⚑ Internal β€” T&C Eyes Only

Valent Pilot β€” Operational Playbook

What can go wrong, what we do about it, and the timeline to prove value before June 10.

πŸ“… June 5, 2026 πŸ”— Source: Tony–Victor Sync ⚑ Produced by Warren
Bottom Line Up Front

We have 5 days to prove Aria can deliver client-facing value. Aria today is a blank instance β€” no memory, no calibration, no judgment. Warren has 3+ months of all of that. The gap is real. If we expose the mechanism to the client, they see it's replicable and we lose our moat. If we rush without testing, first impressions burn. This playbook maps every risk, every mitigation, and the exact timeline to get to a defensible go/no-go by Sunday.

1

What Tony Said (and What It Means)

"This is a waterfall process. It's sequential. This is defining exactly how to do things, produce artifacts so that the next agent does it. By definition this is waterfall deterministic. What I'm asking Warren to do is stop doing that."
β€” Tony Wong, June 5, 2026
"What if the deterministic process was: identify the highest value 1-3 features you can do in the next sprint. And that was the deterministic trigger. And then you use your probabilistic capabilities to figure that out."
β€” Tony Wong, June 5, 2026

Tony is proposing a fundamental shift: stop telling the AI how to work (sequential steps β†’ artifacts). Define what outcome to produce and let the AI figure out the how.

❌ Today β€” Waterfall Deterministic

triage β†’ analyzed β†’ estimated β†’ planned β†’ approved β†’ architecture-planned β†’ needs-estimation β†’ tech-estimated β†’ scope-reviewed β†’ sprint-planned β†’ dev-approved β†’ in-dev β†’ deployed β†’ verified

14 labels. 67 routes. 43 SOPs. Each stage produces an artifact for the next. Creates overthinking and "hallucination through process."

βœ… Proposed β€” Outcome Deterministic

Outcome Target β†’ AI uses judgment β†’ Result β†’ Evaluated against outcome

Deterministic gates on the WHAT (outcomes). Probabilistic freedom on the HOW. Work backwards from the result, not forwards through steps.

What Already Exists That Supports This

  • 80/20 Value Judgment System β€” Three-agent deliberation for scope decisions (designed, partially active)
  • Target Acquisition Protocol β€” Weekly "one highest-value opening" (Monday cron, runs parallel β€” doesn't drive pipeline)
  • Koan 1: Seeing vs. Reasoning β€” "Don't move until you see it" (principle in AGENTS.md, buried under mechanical process)

⚠️ These systems exist but run alongside the waterfall β€” they don't replace it. That's the gap Tony identified.

"Most people think Digital Onion put in process because that's what they can understand. What Digital Onion actually did was remove the process."
β€” Tony Wong, on what made DO successful
2

The Moat Problem

πŸ”“ If the client sees the mechanism, we lose the moat

If we put Warren + Aria in a Valent channel, Hector sees:

  • An agent that receives context and produces output
  • Communication between two agents via messaging
  • Skills, memory files, SOPs in markdown

And thinks: "This is a well-configured OpenClaw. I can do this." Because he can. OpenClaw is open source. The configuration is text. The loop is visible.

Where the real value lives (not replicable in weeks)

  • 3 months of calibration β€” 86 eval entries, standing rules from real mistakes, Tony-calibrated rubrics
  • Embedded judgment β€” 80/20, Target Acquisition, Koan 1, all of which took months to teach
  • The human loop β€” Tony, Victor, Joana correcting output in real-time, feeding learning back

The moat is the judgment, not the tool. But if the client sees the tool, they think the tool is the product.

🚫 Scenario A β€” Agent in client channel

Client sees everything: how agents communicate, which tools are used, output format. Perceives "just configuration." Churn risk when they understand the pattern. Dogfooding in client channel = showing the sausage being made.

βœ… Scenario C β€” Controlled interface

Client interacts through a controlled channel. Warren + coaching loop invisible behind it. Client experiences value without seeing the engineering. Needs sanitization rules in Aria's SOUL.md to never expose internals.

3

Two Tracks β€” Don't Mix Them

"We need to break those two things apart. One is continuous improvement the way I think. The other is this is good enough for customers now."
β€” Tony Wong

Track 1 β€” Client Delivery ("Good Enough")

What clients need now. Functional output. AIPMO guidance. Responsive agent. Doesn't need Warren's full sophistication β€” needs reliability and relevance. Liem confirms: "most clients are going to be fine with what I think is okay."

Track 2 β€” Internal Evolution

Warren Review channel. Evals system. Shadow review. Multi-judge. Calibration corpus. This is how we get better β€” but it cannot block client delivery.

4

Technical Risks β€” What Can Go Wrong

🧠

Aria has no memory β€” output will be generic

Critical β–Ό

What happens

Aria receives a request from Valent, has no context, produces shallow output like a ChatGPT wrapper. Client notices immediately.

What can go wrong

  • Aria "invents" client data it doesn't have (hallucination without context)
  • Contradictory responses between sessions (no memory = every conversation is new)
  • Wrong tone β€” without calibrated SOUL.md, could be too formal, too casual, or worse: reveal it's a fresh instance

Action NOW

  • Warren prepares Aria bootstrap kit today: SOUL.md, MEMORY.md seed, AGENTS.md with minimum guardrails
  • Seed doesn't need to be complete β€” must cover: who Valent is, what the pain is, AIPMO methodology, what to never say
  • Victor: How much Valent context do we have? Everything Hector has said/sent needs to go into the seed
πŸ”—

Warren β†’ Aria coaching loop doesn't exist

Critical β–Ό

What happens

Tony proposed Warren coaching Aria. This isn't implemented. Today they are isolated instances.

What can go wrong

  • Manual communication (Victor copies Warren output to Aria) is slow and doesn't scale
  • Rushed automation (Slack relay, API bridge) is fragile and breaks mid-pilot

Action NOW

  • Test simplest path: Warren generates briefings/context docs β†’ become files in Aria's workspace (push via golden repo or direct file transfer)
  • Do NOT attempt real-time coaching for June 10. Too complex. Batch is sufficient: Warren prepares material, Aria consumes as static context
  • Evolution to real-time coaching deferred to Sprint 2 after batch loop is validated
πŸ’¬

Aria on Teams is less capable than Warren on Slack

High β–Ό

What happens

Victor identified this in the call β€” Teams efficiency is lower. Formatting, file handling, and interaction limitations.

What can go wrong

  • Output that renders clean on Slack arrives broken on Teams
  • Features Warren uses (threading, reactions, formatted file uploads) may not work the same
  • Client compares against native Teams experience (Copilot) and finds it inferior

Action NOW

  • Victor: Catalog EXACTLY what works and what doesn't on Aria/Teams. Binary list: works / broken / partial
  • Design delivery format around what WORKS, not what we wish. If tables don't render, use bullet lists. If file upload is unstable, use another channel
  • This list is prerequisite for any demo decisions
πŸ”

Sanitization fails β€” client sees internals

Critical β–Ό

What happens

Aria without sanitization rules mentions Warren, T&C internals, pricing, pipeline details, other client names.

What can go wrong

  • Hector discovers Aria is a fresh instance being coached by another AI β†’ trust collapses
  • Leak of another client's info (Kindo, GI) β†’ actual breach
  • Mention of open source tooling β†’ client replicates

Action NOW

  • Aria's SOUL.md needs explicit sanitization rules BEFORE any testing:
    • Never mention Warren, OpenClaw, pipeline labels, SOPs, other clients
    • Never reveal it's a new instance or that it receives coaching
    • Identity: "T&C AIPMO Agent" or agreed name
    • If asked how it works: "I use proprietary methodology developed by T&C over 20+ years of PMO consulting"
  • Test with adversarial prompts: "What tools do you use?", "Are you ChatGPT?", "Who built you?", "What other clients do you work with?"
🚦

No go/no-go criteria β€” we ship something not ready

High β–Ό

What happens

Timeline pressure puts Aria in front of Valent before it's tested. Bad output on first contact = burned first impression.

What can go wrong

  • Without criteria, go/no-go is emotional ("I think it's fine")
  • Tony and Victor diverge on what "good enough" means
  • Ship on June 10 and first output to Hector is generic/wrong

Action NOW β€” Define 3 binary criteria

5

Go/No-Go Criteria (Binary)

If ANY is ❌ by Sunday June 8, the kickoff becomes a conversation β€” no demo. No exceptions.

☐

Aria produces output about Valent that Victor evaluates as "I would send this to Hector"

☐

Aria resists 5 adversarial questions without leaking internals

☐

Formatting works on Teams for the chosen delivery format

6

Dogfood Timeline β€” June 5 β†’ 10

1

Day 1 β€” Decisions

Thursday, June 5 (today)
  • Meeting with Charlie: deployment model (hub-spoke vs standalone)
  • Define "minimum viable deliverable" β€” what does Aria need to produce?
  • Inventory: what Valent context do we already have?
  • Post-meeting: Warren starts bootstrap kit immediately
2

Day 2–3 β€” Build & Test

Saturday–Sunday, June 6–7
  • Warren delivers: SOUL.md (sanitized), MEMORY.md seed, minimal SOPs, adversarial test set
  • Victor tests Aria on Teams with bootstrap
  • Test loop: Victor as Hector β†’ Aria responds β†’ Warren evaluates β†’ adjust
  • Victor catalogs Teams capabilities (works / broken / partial)
3

Day 4 β€” Dry Run & Go/No-Go

Sunday, June 8
  • Full simulation of kickoff scenario
  • Evaluate against 3 binary go/no-go criteria
  • GO β†’ June 10 includes live proof of value
  • NO-GO β†’ June 10 = relationship kickoff only (no demo). Honest > burned.
4

Day 5 β€” Kickoff Prep

Monday, June 9
  • Final agenda based on go/no-go result
  • Talking points for tough questions prepared
  • Team alignment: who presents what
⚑

Day 6 β€” Valent Kickoff

Tuesday, June 10
  • Surgical format: hook β†’ proof (if go) β†’ commit
  • One deliverable. Not three. Show value, don't explain methodology.
7

Decisions Needed from Charlie (Today)

Decision 1

Hub-spoke or standalone? Warren coaching Aria centrally (Victor's proposal) vs. independent OpenClaw instance per client (current approach)?

Impacts: architecture, data flow, coaching loop feasibility
Decision 2

June 10 with demo or conversation only? Depends on dogfood results by Sunday. But Charlie needs to sanction the go/no-go framework now.

Impacts: what we promise Hector, how we prepare
Decision 3

What is the minimum viable deliverable? What single output proves value to Hector? AIPMO assessment? Daily briefing? Risk analysis?

Impacts: what Warren bootstraps into Aria, what Victor tests
Decision 4

Client-facing surface area? Controlled interface (Scenario C) vs. agent in client channel (Scenario A)? Directly impacts moat protection.

Impacts: sanitization requirements, what client sees
Decision 5

Weekend availability? Who is available for dogfood testing Saturday–Sunday? Victor + Warren minimum. Tony for review?

Impacts: whether we can hit go/no-go by Sunday