Anti-deception enforcement matrix

Audit deliverable per S-U-012. It lists every SOW-level phase
transition, the evidence it claims to require, and whether that
evidence is actually enforced by code today. Gaps are flagged so a
follow-up commit can close them.

Executing-model context boundary

The architectural invariant the matrix below exists to protect:

> No commit, attestation, or acceptance signoff may be produced by
> the same agent context that wrote the code under review.

The existing cross-model reviewer covers half of this (Claude writes,
Codex reviews). The missing half, cryptographic attribution of
commits, is what internal/stancesign/ now provides the primitives for.
A commit signed by the reviewer stance key cannot have been authored
by a worker stance, even if a supply-chain attacker compromised the
worker model's context, because the signing key never enters the
worker's execution environment.

Matrix

| Transition | Required evidence | Enforcement status | Notes |
|---|---|---|---|
| Task dispatch → task complete | Worker declares tr.Success=true AND reviewer verdict Complete=true | reviewAndFollowupRecursive (sownative.go:3127); LLM reviewer required | Since 866fe57 the LLM verdict is cross-checked by the zombie classifier (3-state) and content-faithfulness judge on already-done cases. |
| Zombie override | Task declared ≥1 file AND 0 writes attributed this dispatch AND ≥1 declared file missing/empty | classifyZombie deterministic check cannot be overridden by the LLM | No LLM can talk its way past the stat-based file check. |
| Declared-files-present but LLM verdict=Complete | Deterministic stub scan + LLM content-faithfulness judge agree it's real | taskOutputsLookComplete + JudgeDeclaredContent both consulted | Fake content must pass both a regex stub scan and a second-opinion LLM. |
| Decomposer abandon | Decomposer returns Abandon=true with reason | ⛔ BLOCKED marker surfaced; no silent accept | Changed in d1872a2: was (neutral); now ⛔ BLOCKED, not complete. |
| Session AC pass | CheckAcceptanceCriteria mechanical command exits 0 OR semantic judge overrides with reasoning | acceptance.go + ac_judge.go | Semantic override requires a stored reasoning string per criterion. |
| Session advance (continueOnFailure) | Upstream session success OR explicit ContinueOnFailure=true | runParallel + legacy Run both enforce | Dependent sessions under a failed upstream are marked BLOCKED, not silently run. |
| Integration review verdict | JSON-structured IntegrationReport OR synthetic reviewer-noncompliant gap | ✅ Prior non-JSON verdicts silently passed as "clean"; now forced to surface a synthetic gap | Fixed in the same commit as the env-var classifier work. |
| Env-var gating | Classifier verdict build-required AND OS env unset | ✅ Runtime-only vars no longer gate builds | envvarclassifier.go + classifier cache. |
| Declared dependency resolution | Every npm/PyPI/crates.io/Go module named in a manifest resolves against its registry | depcheck.Validate runs pre-install | Catches the @nativewind/style hallucination class. |
| Prompt-injection intake scan | Project-supplied skill content passes threat-pattern scan | promptguard in Warn mode at ingestion | Default Warn; policy flag will promote to Strip once telemetry accrues. |
| Secret egress | Log/replay/bus output passes through redact.Writer | logging.Init wraps the output writer | Can be disabled via STOKELOGREDACT=0 for debugging. |
| Dangling task dependency | Every Task.Dependencies entry resolves to an existing task ID | plan.CleanTaskDependencies before dispatch | Prior behavior printed a warning and dispatched anyway. |
| Commit attribution | Author + committer match stance; signature verifies against stance's public key | Gap: primitives exist (stancesign.Identity.ApplyTo), not yet called from the commit sites | Follow-up: thread stance identity into worktree merge commit + any git commit invocation the harness produces. |
| Stance separation at model level | Reviewer model ≠ builder model | modelsource resolves per-role (builder vs reviewer) | Can use Gemini reviewer + Claude builder; OpenRouter any-vendor. |
| Critic approval references evidence | Critic Verdict carries artifact/log hashes of what it examined | Gap: critic.Verdict today carries findings + severity but no explicit evidence-hash field | Follow-up: add EvidenceRefs []string to Verdict struct. |
| Supervisor rule verdicts carry evidence | Supervisor rule decisions record the inputs that justified them | supervisor rules emit bus events with payloads | Verify per-rule that payload is sufficient for replay. |
| Merge to main | Branch is clean OR conflict resolved deterministically | worktree.Merge uses git merge-tree + mergeMu | Pre-merge snapshot + restore on failure. |

Remaining gaps (ordered by blast radius)

1. Commit attribution is not yet stance-signed. The signing
   primitive exists (internal/stancesign/). Wiring it into the
   actual commit sites is a follow-up that touches internal/worktree
   and any direct exec.Command("git", "commit", ...) in
   cmd/r1/sow_native.go. When that lands, a commit with author
   stoke-reviewer and a valid signature from the reviewer key
   becomes a cryptographic attestation; a commit with that author
   but no signature or a mismatching signature is immediately
   rejectable.

2. Critic evidence references are prose, not hashes. The
   existing Verdict.Findings[].Message is a human-readable string.
   Extending to `EvidenceRefs []string{"sha256:<artifact>", "log:<span-id>"}`
   lets an auditor replay exactly what the critic saw. Low-risk
   schema change; callers append a hash list alongside their
   existing prose.

3. Supervisor rule payloads should be schema-validated. Bus
   events carry map[string]any; a drift-detection rule that emits
   an under-populated payload today fails silently at replay. Not
   urgent (the existing rules are well-tested) but worth a single
   schema-validation pass during the next supervisor-rule refactor.
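The schema change proposed in gap 2 might look like the sketch below. The `Finding` layout, the `artifactRef` helper, and the span ID are hypothetical; only the `EvidenceRefs []string` field and the `sha256:` / `log:` reference prefixes come from the text above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Finding mirrors the prose-only shape described above (assumed layout).
type Finding struct {
	Severity string
	Message  string
}

// Verdict gains an EvidenceRefs list alongside the existing findings.
type Verdict struct {
	Findings     []Finding
	EvidenceRefs []string
}

// artifactRef is a hypothetical helper that turns artifact bytes into a
// "sha256:<hex>" reference an auditor can recompute later.
func artifactRef(content []byte) string {
	sum := sha256.Sum256(content)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	artifact := []byte("contents of the file the critic examined")

	v := Verdict{
		Findings: []Finding{{Severity: "info", Message: "output matches the declared file"}},
	}
	// Callers keep their existing prose and append references next to it.
	v.EvidenceRefs = append(v.EvidenceRefs, artifactRef(artifact), "log:span-42")

	for _, ref := range v.EvidenceRefs {
		fmt.Println(ref)
	}
}
```

Since the reference is a content hash, an auditor holding the same artifact can recompute it and confirm the critic examined exactly those bytes.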

Validation procedure

When a follow-up commit closes one of the gaps above:

crafted failure case (e.g. a commit with mismatched stance signature should fail verification).

is traceable: "closes anti-deception matrix row: commit attribution is not yet stance-signed."

What this matrix does not cover

executed by workers). Sandboxing those is an env-backend concern (Firecracker, S-U-010) rather than an evidence-chain one.

scope; the harness does not push to protected refs.

hand. The signing separation catches an attacker who gets inside the harness; it does not catch an operator who removes the harness from the loop.
