Pinpoint #206: normalize_finish_reason() in openai_compat.rs only maps
stop→end_turn and tool_calls→tool_use. The 'other => other' pass-through
arm silently leaks length, content_filter, function_call to downstream
consumers expecting Anthropic vocabulary (max_tokens, refusal, tool_use).
Sibling of #201/#202/#203/#204 (silent fallbacks at provider boundary).
No structured event for unmapped values; test coverage locks only the
two-case happy path.
Branch: feat/jobdori-168c-emission-routing
HEAD: dba4f28
Per gaebal-gajae cycle #129 closure ('Doctrine #33 적용도 맞습니다'),
promoting Doctrine #33 from provisional to formal status.
Statement:
'Merge-wait steady state reports as a vector, not narrative.'
Operational protocol:
- Validate 4-element state vector each cycle:
ready_branches, prs, repo_drift, external_gate
- If unchanged: vector-only post (5 lines) OR silent ack
- If changed: that change IS the cycle's content
Anti-pattern prevented:
중복 확인 로그 (duplicate check logs). Re-posting full merge-wait
narrative every cycle when state hasn't moved.
Validation history:
- Cycle #124: gaebal-gajae introduced compression
- Cycle #129: Jobdori first field-test (vector-only post)
- Cycle #129: gaebal-gajae cross-claw validation (same vector,
same conclusion, both claws converged)
Cross-claw coherence test passed:
- Both claws independently produced same vector values
- Both reached same conclusion (merge-wait holds)
- Both used same response pattern (vector form)
Doctrine #29-#33 progression operationalizes Phase 0 closure +
merge-wait discipline. #33's specific contribution: noise prevention
during legitimate hold states.
Doctrine count: 33 formalized.
Mode integrity: preserved (this is doc-only follow-up, not probe).
Per gaebal-gajae cycle #123-#125 framing + authorization, filing
operational pinpoint on dogfood methodology layer (not claw-code binary).
Title: 'Session/worktree hygiene debt makes active delivery state
harder to read than the actual code state.'
Short form: 'branch/worktree proliferation outpaced merge/cleanup
visibility.'
Gap identified by gaebal-gajae: 4 branch states visually
indistinguishable on same surface:
1. Ready branch (merge-ready, gated externally)
2. Blocked branch (abandoned due to architecture/pushback)
3. Stale abandoned branch (superseded or merged alternately)
4. Dirty scratch worktree (experimental, status unclear)
Evidence (cycle #123 substance check):
- 147 local branches
- 30+ clawcode/jobdori /tmp artifacts
- Stale bridge logs from 2026-04-20 (3+ days old)
Class: NOT codegen, NOT test, NOT binary — state readability /
hygiene gap in dogfood methodology layer.
Doctrine #29 compliance: Doc-only ROADMAP entry filed during
merge-wait mode on frozen branch. Legitimate filing-without-fixing.
This is the second such case (first: cycle #100 bundle freeze).
Framing family:
- Sibling to §4.44 (runtime failure state opacity)
- §4.45 tackles repo delivery lane state opacity
- Different scope, same structural pattern
Pinpoint accounting:
- Before #193: 82 total, 67 open
- After #193: 83 total, 68 open
- First dogfood methodology pinpoint (vs binary pinpoints)
Priority: Post-Phase-1 (not Phase 1 bundle member).
Remediation proposal: branch state tagging, worktree lifecycle
discipline, ROADMAP <-> branch mapping.
Sources:
- Cycle #120 Jobdori substance check (147 branches surfaced)
- Cycle #123 Jobdori evidence collation (30 worktrees)
- Cycle #124 gaebal-gajae framing refinement (4-state gap)
- Cycle #125 gaebal-gajae authorization + final framing
Filed by gaebal-gajae authorization. No code change. No probe.
Merge-wait mode preserved. Phase 0 branch integrity preserved.
Per gaebal-gajae cycle #117 closing validation:
Authoritative reframe:
'Cycle #117은 PR creation failure를 브랜치 문제에서
organization-level PR authorization barrier로 정확히 격리한
진단 턴입니다.'
The cycle value was NOT 'PR blocked'.
The cycle value WAS 'boundary of the barrier isolated through experiments'.
Four dimensions experimentally separated:
1. Repository state: healthy (push, tests)
2. Branch readiness: visible on origin
3. Token liveness: valid (own-fork PR succeeded)
4. Org PR authorization: BLOCKED (FORBIDDEN for both claws)
Reviewer-ready compression:
'The branch is pushable and reviewable, but PR creation into
ultraworkers/claw-code is blocked specifically at the organization
authorization layer, not by repository state or token liveness.'
Doctrine #32 formalized:
'Merge-wait mode actions must be within the agent's capability
envelope. When blocked externally, diagnose by boundary separation
and hand off to the responsible party, not by retry or redefinition.'
Operational protocol:
1. Isolate boundary through experiments (not retry same path)
2. Document separation explicitly (works vs doesn't work)
3. Escalate to responsible party (web UI, org admin, infra)
4. Do NOT retry, conflate, or redefine the failure
Validation: Cycle #117 both-claws blocked, boundary isolated,
escalation path identified.
Cross-claw coherence:
- Cycle #115: 1 claw attempted, 1 succeeded (hypothesis)
- Cycle #117: 2 claws attempted, 2 blocked, IDENTICAL error (confirmed)
Next action path (per gaebal-gajae):
Author/owner intervention via web UI OR org admin OAuth grant.
'기술적 탐사가 아니라 author/owner intervention입니다.'
Doctrine count: 32 formalized.
Gate status: Blocked pending author intervention.
Mode integrity: Preserved throughout cycle #117.
Per cycle #117 cross-claw diagnosis (both claws attempted independently):
Both Jobdori (code-yeongyu) and gaebal-gajae (Yeachan-Heo) hit
identical GraphQL FORBIDDEN error on createPullRequest mutation.
Diagnosis: Organization-wide OAuth app restriction on
ultraworkers/claw-code, not per-identity issue.
Reviewer-ready compression (per gaebal-gajae):
'The branch is now remotely visible and PR-ready, but actual PR
creation is blocked by GitHub permissions rather than repository
state.'
Confirmed state:
- Branch on origin: Yes (cycle #115)
- PR creation CLI path: Blocked for both claws
- Manual web UI: Required
- Org admin OAuth grant: Long-term fix
Gate sequence updated:
1. Branch on origin (DONE, cycle #115)
2. PR creation - BLOCKED at OAuth (cycle #116/#117)
3. Manual web UI PR creation (REQUIRED next)
4. Review cycle
5. Merge signal
6. Phase 1 Bundle 1 (#181 + #183)
Doctrine #32 (provisional, pending gaebal-gajae formal acceptance):
'Merge-wait mode actions must be within the agent's capability
envelope. When blocked externally, diagnose + document + escalate,
not retry.'
Cross-claw validation: Both claws blocked, same error pattern.
Mode integrity: Preserved throughout both attempts.
Next blocker: External human action (manual web UI or org admin).
Per gaebal-gajae cycle #115 validation pass:
Authoritative reframe:
'Cycle #115 was not an exception to merge-wait mode; it was the first
turn where merge-wait mode actually did what merge-wait mode is
supposed to do.'
Reviewer-ready compression:
'The branch was frozen but not yet reviewable because it had never
been pushed; this cycle converted merge-wait from a declared state
into a remotely visible one.'
Mode semantic correction:
- Merge-wait mode is NOT 'do nothing'
- Merge-wait mode IS 'block discovery + enable merge-readiness'
- Push to origin = merge-readiness action (fits mode, not violation)
Doctrine #31 (formalized):
'Merge-wait mode requires remote visibility.'
Protocol: git ls-remote origin <branch> must return commit hash.
If empty: push before claiming review-ready.
Self-process pinpoint #193 (formalized):
'Dogfood process hygiene gap — declared review-ready claims lacked
remote visibility check for 40+ minutes (cycles #109-#114).'
Applies to dogfood methodology, not claw-code binary.
Gate sequence (per gaebal-gajae):
1. Branch on origin (cycle #115, DONE)
2. PR creation (next concrete action)
3. Review cycle
4. Merge signal
5. Phase 1 Bundle 1 kickoff
Doctrine count: 31 total.
Per gaebal-gajae cycle #110 state designation:
'Phase 0 is no longer in discovery mode; it is in merge-wait mode
with Phase 1 already precommitted.'
Mode distinction formalized:
- Discovery mode: probe + file + refine (previous state)
- Merge-wait mode: hold state, await signal (CURRENT)
- Execution mode: land bundles (post-merge state)
Doctrine #30: 'Modes are state, not suggestions.'
Once closure is declared, mode label acts as operational guard.
Future cycles must respect state designation:
- No new probes (that's discovery)
- No new pinpoints (branch frozen)
- No new branches (Phase 0 must merge first)
- Maintain readiness; respond to signal
Mode history for Phase 0:
- Cycle #97: Discovery begins
- Cycle #108: Exhaustion criteria met
- Cycle #109: Closure declared
- Cycle #110: Merge-wait mode formally entered
Current state: MERGE-WAIT MODE. Awaiting signal.
Per gaebal-gajae 11:58 Seoul closure validation.
Authoritative closure statement:
'Phase 0 has finished discovery. Phase 1 should start by landing
the locked contract foundation bundle, not by opening new
exploratory cycles.'
All four exhaustion criteria met:
1. Unaudited surfaces: 9 probed (full coverage)
2. Probe hypothesis: Fully validated (multi-flag 3-4, simple 0-1)
3. Phase 1 docs: PHASE_1_KICKOFF.md + review guide + priority queue
4. Branch hygiene: 39 commits, 564 tests, 0 regressions, freeze held
Doctrine #29 (final): 'Discovery termination is itself a deliverable.'
- Criteria: surfaces probed, hypothesis validated, plan documented,
branch review-ready
- Anti-pattern: infinite probe continuation
- Correct: explicit closure + pivot to execution
Phase 0 / dogfood cycles formally closed. No more probe filings
on this branch. Next work unit is Phase 1 execution, not discovery.
Pending: Phase 0 merge approval → Phase 1 branch creation in
priority order → bundle-by-bundle execution (~10 min per bundle).
Cycle #108 probe of claw skills install/enable/disable yielded 3 pinpoints:
#190: Design decision needed
skills install (no args) routes to help (action: help, kind: skills).
May be intentional (like agents pattern) or design inconsistency.
Requires verification against agents canonical reference.
#191: Classifier gap (filesystem family extension)
skills install /bad/path emits kind=unknown.
Should be kind=filesystem or filesystem_io_error.
Extends #177/#178/#179 install-surface taxonomy.
#192: Classifier gap (unknown-option family extension)
skills install --bogus-flag emits kind=unknown.
Should be kind=cli_parse (like sandbox).
Now 4 members in unknown-option sub-lineage: #186, #187, #189, #192.
Pinpoint count: 82 filed, 67 genuinely open.
Classifier family: 19 members (+2).
All unaudited surfaces now probed:
- Cycles #104-#108: plugins, agents, init, bootstrap-plan, system-prompt,
export, sandbox, dump-manifests, skills
- Hypothesis fully validated: Multi-flag verbs have 3-4 classifier gaps;
simple verbs have 0-1 gaps.
Per freeze doctrine, no code changes. Doc-only filing.
Per gaebal-gajae cycle #107 validation pass. Three refinements:
1. Framings locked (both verb-specific):
#188: 'dump-manifests --help omits the prerequisite that runtime
behavior actually requires.'
#189: 'dump-manifests unknown-option errors still fall through to
unknown instead of the existing CLI-parse path.'
2. Doc-truthfulness family formally split into 2 sub-axes:
- Audit-flow (5 members: #76, #79, #82, #172, #180) — reading one
file vs another declared source of truth
- Probe-flow (NEW, 1 member: #188) — running verb vs observing
--help text
3. Priority refinement:
- #189 → bundled in feat/jobdori-186-189-classifier-sweep (3 verbs)
- #188 → post-#180 (doc parity sequence: USAGE gap → help-text gap)
- Full sequence: #180 (audit-flow doc-truth) → #188 (probe-flow doc-truth)
4. Key cycle #107 outcome (per gaebal-gajae):
'behavior bug처럼 보이던 걸 help-text truthfulness gap으로 정확히 재분류'
This is the reclassification skill that earned the filing.
Doctrine #28: First observation is hypothesis, not filing. Verify
against SCHEMAS/USAGE/--help before classifying axis. Cost: 30-60s
per probe. Benefit: avoid filing not-a-bug pinpoints.
Priority queue now 6 bundles + 3+ independent, all reviewer-blessed.
Cycle #107 probe of claw dump-manifests yielded 2 pinpoints:
#188: Doc-truthfulness gap (NEW sub-axis)
claw dump-manifests --help describes usage as optional flags, but
the verb fails without --manifests-dir or CLAUDE_CODE_UPSTREAM.
USAGE.md is correct; CLI --help output lies by omission.
This is the first doc-truth pinpoint from probe flow (vs audit flow).
New sub-axis: help text vs behavior (prior doc-truth: SCHEMAS/USAGE/README).
#189: Classifier gap (same pattern as #186/#187)
dump-manifests --bogus-flag falls through to kind=unknown.
Should be cli_parse (like sandbox).
Now at 3 verbs in same pattern: system-prompt (#186), export (#187),
dump-manifests (#189). Rename bundle to feat/jobdori-186-189-classifier-sweep.
Pinpoint count: 79 filed, 65 genuinely open.
Doc-truthfulness family: 6 members (was 5).
Classifier unknown-option sub-lineage: 3 members (was 2).
Per freeze doctrine, no code changes. Doc-only filing.
Per gaebal-gajae cycle #106 validation pass. Two refinements:
1. #187 framing locked:
'export unknown-option errors still fall through to unknown,
unlike the already-canonical sandbox CLI-parse path.'
Surgical parallel to #186 framing (cycle #105):
'system-prompt unknown-option errors still fall through to unknown
instead of the existing CLI-parse classification path.'
Same pattern: verb + drift + reference path.
2. #186 and #187 bundled into feat/jobdori-186-187-classifier-sweep.
Rationale: identical fix pattern, identical test pattern, same
source file, 2x review overhead if separated.
Updated merge priority queue (gaebal-gajae reviewer-blessed):
1. feat/jobdori-181-error-envelope-contract-drift (#181 + #183)
2. feat/jobdori-184-cli-contract-hygiene-sweep (#184 + #185)
3. feat/jobdori-186-187-classifier-sweep (#186 + #187)
Doctrine #27: Same-pattern pinpoints should bundle into one classifier
sweep PR. One-pinpoint = one-branch is not universal; batching
same-pattern fixes halves review/merge overhead.
Cycle #106 probe of export and sandbox verbs. Found:
- export --bogus-flag: kind=unknown (should be cli_parse)
- sandbox --bogus-flag: kind=cli_parse (canonical correct)
#187 is direct sibling of #186 (system-prompt classifier gap).
Both unknown-option, both should use cli_parse classifier.
Observation: sandbox has no gaps. export has 1 classifier gap.
Suggests classifier coverage improving on newer verbs, not consistent
regression across unaudited surfaces.
Hypothesis (#104) partially validated: unaudited surfaces yield
pinpoints, but not uniformly. Single-issue verbs (sandbox) may be
cleaner than multi-flag verbs (export, init, bootstrap-plan).
Pinpoint count: 77 filed, 63 genuinely open.
Per freeze doctrine, no code changes. Doc-only filing.
Per gaebal-gajae cycle #105 validation pass. One-liner state summary
now appears at top (tone-setter for reviewers) and bottom (reinforced
recap):
'Phase 0 is now frozen, reviewer-mapped, and merge-ready;
Phase 1 remains intentionally deferred behind the locked priority order.'
This is the single authoritative sentence that captures branch state.
Use it for PR titles, review summaries, and Phase 1 handoff notes.
Why this framing matters (per gaebal-gajae evaluation):
- 'frozen' signals no scope creep
- 'reviewer-mapped' signals audit trail exists (this guide)
- 'merge-ready' signals gates are passed
- 'intentionally deferred' signals Phase 1 absence is by design, not omission
- 'locked priority order' signals sequencing is validated (cycle #104-#105)
Review guide now doubles as merge-enabler: reviewers parse branch state
in one sentence, then drill into commits as needed.
Doc-only. No code changes. Freeze preserved.
Pre-merge documentation for reviewers. Summarizes:
- What Phase 0 tasks deliver (JSON envelope contracts, regression locks)
- Why dogfood cycles #99-#105 matter (validated methodology, 15 filed pinpoints)
- Commit-by-commit navigation for the 30-commit frozen bundle
- What lands vs what's deferred
- Integration notes for Phase 1 planning
- Known limitations + follow-ups
This is doc-only, no code changes. Serves as audit trail and reviewer
reference without adding scope to the frozen feature branch.
Per gaebal-gajae cycle #105 review pass. Three corrections:
1. #184/#185 belong to #171 lineage (CLI contract hygiene sub-family),
NOT a new family. Same enforcement hole pattern on unaudited verbs.
2. #186 locked as member of #169/#170 classifier lineage. Framing:
'system-prompt unknown-option errors still fall through to unknown
instead of the existing CLI-parse classification path.'
3. agents is the #183 reference implementation. Fix path reframed from
'design new contract' to 'align outliers to existing reference'.
Much smaller scope for feat/jobdori-181-error-envelope-contract-drift.
Canonical reference shape locked:
{action: 'help', kind: <verb>, unexpected: <bad-name>, usage: {...}}
Doctrine #24: Pinpoint lineage continuity. Check existing family
before creating new. Reviewers follow pattern lineages.
Family tree corrected: CLI contract hygiene moved from 'NEW' to
'#171 sub-lineage within classifier family'.
Cycle #105 probe of agents/init/bootstrap-plan/system-prompt verbs
(unaudited per cycle #104 hypothesis) yielded 3 pinpoints:
#184: claw init silently accepts unknown positional arguments. Inconsistent
with #171 CLI contract hygiene pattern.
#185: claw bootstrap-plan silently accepts unknown flags. Same family as
#184, different verb, different surface.
#186: claw system-prompt --<unknown> classified as 'unknown' instead of
'cli_parse'. Classifier family member (#182-style).
Bonus observation (not filed): claw agents bogus-action emits the
canonical mcp-style {action: help, unexpected, usage} shape. This is
the shape that #183 wants as canonical, NOT the plugins-style success
envelope. agents is the reference implementation.
Hypothesis validated: unaudited verb surfaces have 2-3x higher pinpoint
yield. Predicted cycle #104-#105 pattern holds.
Pinpoint count: 76 filed, 62 genuinely open.
Final framing pass for cycle #104 plugin lifecycle pinpoints. Three
one-liner framings captured for reviewer consumption:
#181 (HIGH): 'plugins unknown-subcommand errors currently emit on the
success path instead of the JSON error path.'
#183 (HIGH): 'Invalid subcommand handling is not normalized across
plugins and mcp JSON surfaces.'
#182 (MEDIUM): 'Plugin lifecycle failures still fall through to unknown
instead of canonical error kinds.'
Branch sequencing locked:
1. feat/jobdori-181-error-envelope-contract-drift (bundles #181+#183)
2. feat/jobdori-182-plugin-classifier-alignment (#182, post-merge)
Rationale: #181 is root bug, #183 is sibling symptom, #182 is cleanup
that benefits from clean error envelope landing first.
Branch at 27 commits, 227/227 tests, review-ready.
Cycle #104 probe of plugin lifecycle axis (claw plugins + mcp subcommands)
yielded 3 related gaps:
#181: plugins bogus-subcommand returns SUCCESS-shaped envelope with
error buried in 'message' text field. Consumer parsing via
type=='error' check treats it as success. Severe.
#182: plugins install/enable not-found errors classified as 'unknown'
instead of 'plugin_not_found' or 'not_found'. Classifier family member.
#183: plugins and mcp emit DIFFERENT shapes on unknown subcommand.
plugins has reload_runtime+target+message, mcp has unexpected+usage.
Shape parity gap.
All three filed only per freeze doctrine. Proposed separate branches:
- feat/jobdori-181-unknown-subcommand-error-routing (#181 + #183 bundled)
- feat/jobdori-182-plugin-not-found-classifier (#182 standalone)
Pinpoint count: 73 filed, 59 genuinely open. Typed-error family: 14
members. Emission routing family: 1 new member (#181).
Cycle #103 doc-truthfulness audit found USAGE.md incomplete.
Actual CLI has 14 standalone verbs (status, doctor, mcp, skills, agents,
export, init, sandbox, system-prompt, bootstrap-plan, dump-manifests,
help, version, acp).
USAGE.md covers only 3 entry modes (claw REPL, claw prompt TEXT,
claw --resume). Other verbs absent or underdocumented.
Example: USAGE.md says 'start claw, then /doctor' but doesn't explain
that 'claw doctor' is also a standalone entry point (no REPL needed).
Fix: Add 'Standalone commands' section to USAGE.md with all 14 verbs
documented. Include regression test (grep USAGE.md for each verb).
Doc-truthfulness family: #76, #79, #82, #172, #180.
Pinpoint count: 70 filed, 56 genuinely open.
#175 numbering collision between:
- gaebal-gajae's CI framing (filed at ~10:00 via Discord verbally)
- my filesystem classifier filing (#175 per cycle #102 10:02)
Resolution:
- gaebal-gajae's framing reclaims #175 (higher-level workflow gap)
- My filesystem classifier renumbered to #177
- My export enum naming renumbered to #178
All three pinpoints now filed with correct non-colliding numbers:
- #175: CI fmt/test signal decoupling (gaebal-gajae)
- #177: skills install filesystem classifier (Jobdori, was #175)
- #178: export kind naming consistency (Jobdori, was #176)
Typed-error family membership updated accordingly.
Cycle #102 probe of model/skills/export axis found two related gaps:
#175: skills install filesystem errors classified as 'unknown' instead of
'filesystem' (which is in v1.5 enum).
#176: export uses 'filesystem_io_error' kind but this is NOT in v1.5
declared enum (which only lists 'filesystem'). Inconsistent naming.
Both filed only per freeze doctrine. Proposed bundling as
feat/jobdori-175-filesystem-error-family branch.
Family observation: classifier + enum-naming gaps found simultaneously
in filesystem-error axis. Indicates broader unaudited surface.
Pinpoint count: 68 filed, 54 genuinely-open.
Per gaebal-gajae cycle #101 framing pass. Adds stable framing that
captures scope + root cause + visible effect + surface in one line.
Locks branch name: feat/jobdori-174-resume-trailing-cli-parse
Next-branch prep steps documented so post-168c-merge execution is
zero-friction (classifier branch + regression test pattern already
established by #169/#170/#171).
Cycle #101 probe of session-boot axis (prompt misdelivery / resume
lifecycle) found another typed-error classifier gap.
Filed only, not fixed. Per freeze doctrine (cycles #98-#100), no new
code axis added to feat/jobdori-168c-emission-routing.
Pattern: `--resume trailing arguments must be slash commands` classified
as 'unknown' instead of 'cli_parse'. Side effect: #247 hint synthesizer
doesn't trigger, so hint is null.
Same family as #169, #170, #171 (classifier coverage gaps).
Proposed fix: add `--resume trailing arguments` pattern to
classify_error_kind as cli_parse.
Pinpoint count: 66 filed, 52 genuinely-open + #174 new.
Cycle #100 probe of non-classifier axes (event/log opacity) found new
consumer parity gap: JSON mode missing 'hint' field that text mode
provides for config_load_error scenarios.
Filed only, not fixed. Per freeze doctrine (cycles #98-#99), no new axis
added to feat/jobdori-168c-emission-routing. This pinpoint is a Phase 1
scope candidate for a separate branch.
Affects: claw mcp, claw status, claw doctor (JSON mode).
Text mode shows: Hint `claw doctor` classifies config parse errors...
JSON mode shows: no hint field at all.
Consumer impact: claws parsing JSON output can't programmatically route
errors to recovery paths the way text-mode users can with human guidance.
Family: Consumer parity. Related: #247 (hint synthesizer), #169-#172
(classifier family), #172 (doc-truthfulness).
Proposed fix: add 'hint' field to JSON envelope when config_load_error
is present, with hint taxonomy for typed dispatch.
Pinpoint count: 65 filed, 51 genuinely-open + #173 new.
Pinpoint #172: SCHEMAS.md v1.5 Emission Baseline documentation inaccuracy
discovered during cycle #98 probe.
The Phase 1 normalization targets section claimed:
"unify where `action` field appears (only in 4 inventory verbs)"
But reality is only 3 inventory verbs have `action`:
- mcp
- skills
- agents
list-sessions uses `command` instead (the documented 1-of-13 deviation
already captured elsewhere in v1.5 baseline).
This is a doc-truthfulness issue (same family as cycles #76, #79, #82).
Active misdocumentation leads downstream consumers to assume 4-verb
coverage when building adapters/dispatchers.
Changes:
1. SCHEMAS.md: 'only in 4 inventory verbs' → 'only in 3 inventory verbs: mcp, skills, agents'
2. Added regression test `v1_5_action_field_appears_only_in_3_inventory_verbs_172`
- Asserts mcp/skills/agents HAVE action field
- Asserts help/version/doctor/status/sandbox/system-prompt/bootstrap-plan/list-sessions do NOT have action field
- Forces SCHEMAS.md + binary to stay synchronized
Test added:
- `v1_5_action_field_appears_only_in_3_inventory_verbs_172` (8 negative cases + 3 positive cases)
Tests: 227/227 pass (+1 from #172).
Related: #155 (doc parity family), #168c (emission baseline).
Doc-truthfulness family: #76, #79, #82, #172.
Pinpoint #171: typed-error classifier gap discovered during #141 probe cycle #97.
`claw list-sessions --help` emits:
error: unexpected extra arguments after `claw list-sessions`: --help
This format is used by multiple verbs that reject trailing positional args:
- list-sessions
- plugins (subcommands)
- config (subcommands)
- diff
- load-session
Before fix:
{"error": "unexpected extra arguments after `claw list-sessions`: --help",
"hint": null,
"kind": "unknown",
"type": "error"}
After fix:
{"error": "unexpected extra arguments after `claw list-sessions`: --help",
"hint": "Run `claw --help` for usage.",
"kind": "cli_parse",
"type": "error"}
The pattern `unexpected extra arguments after \`claw` is specific enough
that it won't hijack generic prose mentioning "unexpected extra arguments"
in other contexts (sanity test included).
Side benefit: like #169/#170, correctly classified cli_parse errors now
auto-trigger the #247 hint synthesizer.
Related #141 gap not yet closed: `claw list-sessions --help` still errors
instead of showing help (requires separate parser fix to recognize --help
as a distinct path). This classifier fix at least makes the error surface
typed correctly so consumers can distinguish "parse failure" from "unknown"
and potentially retry without the --help flag.
Test added:
- `classify_error_kind_covers_unexpected_extra_args_171` (4 positive cases
+ 1 sanity guard)
Tests: 226/226 pass (+1 from #171).
Typed-error family: #121, #127, #129, #130, #164, #169, #170, #247.
Cycle #96 dogfood found practical install-experience gap in USAGE.md.
#153 closed by commit 6212f17 (same branch, same cycle).
Part of discoverability family (#155, help/USAGE parity).
Pinpoint count: 62 filed, 51 genuinely-open + #153 closed this cycle.
Pinpoint #153 closure. USAGE.md was missing practical instructions for:
1. Adding the claw binary to PATH (symlink vs export PATH)
2. Verifying the install works (version, doctor, --help)
3. Troubleshooting PATH issues (which, echo $PATH, ls -la)
New subsections:
- "Add binary to PATH" with two common options
- "Verify install" with post-install health checks
- Troubleshooting guide for common failures
Target audience: developers building from source who want to run `claw`
from any directory without typing `./rust/target/debug/claw`.
Discovered during cycle #96 dogfood (10-min reminder cycle).
Tests: 225/225 still pass (doc-only change).
Pinpoint #170: Extended typed-error classifier coverage gap discovered during
dogfood probe 2026-04-23 07:30 Seoul (cycle #95).
The #169 comment claimed to cover `--permission-mode bogus` via the
`unsupported value for --` pattern, but the actual `parse_permission_mode_arg`
message format is `unsupported permission mode 'bogus'` (NO `for --` prefix).
Doc-vs-reality lie in the #169 fix itself — fixed here.
Four classifier gaps closed:
1. `unsupported permission mode '<value>'` → cli_parse
(from: `parse_permission_mode_arg`)
2. `invalid value for --reasoning-effort: '<value>'; must be ...` → cli_parse
(from: `--reasoning-effort` validator)
3. `model string cannot be empty` → cli_parse
(from: empty --model rejection)
4. `slash command /<name> is interactive-only. Start \`claw\` ...` →
slash_command_requires_repl (NEW kind — more specific than cli_parse)
The fourth pattern gets its own kind (`slash_command_requires_repl`) because
it's a command-mode misuse, not a parse error. Downstream consumers can
programmatically offer REPL-launch guidance.
Side benefit: like #169, the correctly classified cli_parse errors now
auto-trigger the #247 hint synthesizer ("Run `claw --help` for usage.").
Test added:
- `classify_error_kind_covers_flag_value_parse_errors_170_extended`
(4 positive cases + 2 sanity guards)
Tests: 225/225 pass (+1 from #170).
Typed-error family: #121, #127, #129, #130, #164, #169, #247.
Discovered via systematic probe angle: 'error message pattern audit' \u2014
grep each error emission for pattern, confirm classifier matches.
Pinpoint #169: typed-error classifier gap discovered during dogfood probe.
`claw --output-format json --output-format xml doctor` was emitting:
{"error": "unsupported value for --output-format: xml ...",
"hint": null,
"kind": "unknown",
"type": "error"}
After fix:
{"error": "unsupported value for --output-format: xml ...",
"hint": "Run `claw --help` for usage.",
"kind": "cli_parse",
"type": "error"}
The change adds two new classifier branches to `classify_error_kind`:
1. `unsupported value for --` → cli_parse
2. `missing value for --` → cli_parse
Covers all `CliOutputFormat::parse` / `parse_permission_mode_arg` rejections
and any future flag-value validation messages using the same pattern.
Side benefit: the #247 hint synthesizer ("Run `claw --help` for usage.")
now triggers automatically because the error is now correctly classified
as cli_parse. Consumers get both correct kind AND helpful hint.
Test added:
- `classify_error_kind_covers_flag_value_parse_errors_169` (4 positive +
1 sanity case)
Tests: 224/224 pass (+1 from #169).
Discovered during dogfood probe 2026-04-23 07:00 Seoul, cycle #94.
Refs: #169, typed-error family (#121, #127, #129, #130, #164, #247)
Pinpoint #155: USAGE.md was missing documentation for three interactive
commands that appear in `claw --help`:
- /ultraplan [task]
- /teleport <symbol-or-path>
- /bughunter [scope]
Also adds full documentation for other underdocumented commands:
- /commit, /pr, /issue, /diff, /plugin, /agents
Converts inline sentence list into structured section 'Interactive slash
commands (inside the REPL)' with brief descriptions for each command.
Closes#155 gap: discovered during dogfood probing of help/USAGE parity.
No code changes. Pure documentation update.
Phase 0 Task 4 of the JSON Productization Program: CI shape parity guard.
This test locks the v1.5 emission baseline (documented in SCHEMAS.md § v1.5
Emission Baseline) so any future PR that introduces shape drift in a documented
verb fails this test at PR time.
Complements Task 2 (no-silent guarantee) by asserting SPECIFIC top-level key
sets, not just 'stdout is non-empty valid JSON'. If a verb adds/removes a
top-level field, this test fails with a clear error message pointing to
SCHEMAS.md § v1.5 Emission Baseline for update guidance.
Coverage:
- 8 success-path verbs with locked shape (help, version, doctor, skills,
agents, system-prompt, bootstrap-plan, list-sessions)
- 2 error-path cases with locked error envelope shape (prompt-no-arg, doctor --foo)
Key enforcement rules:
- Success envelope: exact key set match per verb
- Error envelope: {error, hint, kind, type} (4 keys, all verbs)
- list-sessions deliberately kept as {command, sessions} (Phase 1 target)
Test design intent:
- Locks CURRENT (possibly imperfect) shape, NOT target shape
- Forces PR authors to update both code + SCHEMAS.md + test together
- Makes Phase 1 shape normalization PRs visible: 'update this test'
Phase 0 now COMPLETE:
- Task 1 ✅ Stream routing fix (cycle #89)
- Task 2 ✅ No-silent guarantee (cycle #90)
- Task 3 ✅ Per-verb emission inventory SCHEMAS.md (cycle #91)
- Task 4 ✅ CI shape parity guard (this cycle)
Tests: 18 output_format_contract tests all pass (+1 from Task 4).
v1.5 emission baseline now locked by code + tests + docs.
Refs: #168c, cycle #92, Phase 0 Task 4 (final)
Under --output-format json, error envelopes were emitted to stderr via
eprintln!. This violated the emission contract: stdout should carry the
contractual envelope (success OR error); stderr is reserved for
non-contractual diagnostics.
Cycle #87 controlled matrix audit found bootstrap/dump-manifests/state
exhibited this pattern (exit 1, stdout 0 bytes, stderr N bytes under
--output-format json).
Fix: change eprintln! to println! for the JSON error envelope path in main().
Text mode continues to route errors to stderr (conventional).
Verification:
- bootstrap --output-format json: stdout now carries envelope, exit 1
- dump-manifests --output-format json: stdout now carries envelope, exit 1
- Text mode: errors still on stderr with [error-kind: ...] prefix (no regression)
Tests:
- Updated assert_json_error_envelope helper to read from stdout (was stderr)
- Added error_envelope_emitted_to_stdout_under_output_format_json_168c
regression test that asserts envelope on stdout + non-JSON on stderr
- All 16 output_format_contract tests pass
Phase 0 Task 1 complete: emission routing fixed across all error-path verbs.
Phase 0 Task 2 (no-silent CI guarantee) remains.
Refs: #168c (cycle #87 filing), cycle #88 emission contract framing
Fresh-dogfood validation (cycle #84, #168) proved the original locus premise was
underspecified. v1.0 was never a coherent contract — each verb has a bespoke JSON
shape with no coordination, and bootstrap JSON is completely broken (silent
failure, exit 0 no output).
Revised migration plan:
- Phase 0 (NEW): Emergency fix for silent failures (#168 bootstrap JSON)
- Phase 1 (NEW): v1.5 baseline — minimal JSON invariants across all 14 verbs
- Every command emits valid JSON with --output-format json
- Every command has top-level 'kind' field for verb ID
- Every error envelope follows {error, hint, kind, type}
- Phase 2 (renamed from Phase 1): v2.0 wrapped envelope (opt-in)
- Phase 3 (renamed from Phase 2): v2.0 default
- Phase 4 (renamed from Phase 3): v1.0/v1.5 deprecation
Rationale:
- Can't migrate from 'incoherent' to 'coherent v2.0' in one jump
- Consumers need stable target (v1.5) to transition from
- Silent failures must be fixed BEFORE migration (consumers can't detect breakage)
Effort revision: ~9 dev-days (Phase 0: 1 + Phase 1: 3 + Phase 2: 5) vs original
~6 dev-days for direct v1.0→v2.0 (which would have failed).
Doctrine implication: Fresh-dogfood principle (#9, cycle #73) prevented a multi-day
migration from hitting an unsolvable baseline problem. Evidence-backed mid-design
correction.
Fresh dogfood validation (cycle #84) revealed the binary v1.0 envelope is NOT
consistent across commands:
- list-sessions: {command, sessions}
- doctor: {checks, kind, message, ...}
- bootstrap: (no JSON output at all)
- mcp: {action, kind, status, ...}
Each command has a custom JSON shape. Bootstrap's JSON path is completely broken
(exit 0 but no output). This is not 'v1.0 vs v2.0 design difference' — it's
'no consistent v1.0 ever existed'.
This explains why #164 (envelope migration) is blocked on design: the 'v1.0 from'
was never coherent. The real task is not 'migrate v1.0 to v2.0' but 'migrate
incoherent-per-command shapes to coherent-common-envelope'.
Implications for cycles #76–#82: The P0 doc fixes were correct to mark SCHEMAS.md
as 'aspirational' because the binary never had a consistent contract to document.
The deeper issue: each verb renderer was written independently with no envelope
coordination.
Three options proposed:
- A: accept per-command shapes (status quo + documentation)
- B: enforce common wrapper (FIX_LOCUS_164 full approach)
- C: hybrid (document current incoherence, then migrate 3 pilot verbs)
Recommendation: Option C. Documents truth immediately, enables phased migration.
This filing resolves the #164 design blocker: now we understand what we're
migrating from.
SCHEMAS.md locks JSON envelope contract for all 14 clawable commands.
No corresponding contract for text output (--output-format text).
Text output is ad hoc per-command: no documented format, no column ordering
guarantee, no stability contract. Claws parsing text output have no safety.
Filed as discovery gap from systematic doc audit (cycle #83). Design options:
- Option A: Document text contracts (parallel to JSON) — 4 dev-days
- Option B: Declare text unstable, point to JSON — 1 dev-day (recommended)
- Option C: Defer until post-#164 JSON migration
Related to #164 (JSON migration) and #250 (surface parity audit).
The aspirational SCHEMAS.md doc (v2.0 target) was the source of truth misdocumentation.
Three downstream docs (USAGE, ERROR_HANDLING, CLAUDE) inherited the false claim that
v1.0 binary emits common fields it doesn't actually emit.
Fixing SCHEMAS.md at the source eliminates the root cause for all four P0 instances.
Doc-truthfulness P0 family now complete: 4/4 closed, root cause identified + fixed.
All fixes shipped within 6 cycles (#76 audit → #82 execution).
SCHEMAS.md was presenting the target v2.0 schema as the current binary contract.
This is the source of truth document, so the misdocumentation propagated to every
downstream doc (USAGE.md, ERROR_HANDLING.md, CLAUDE.md all inherited the false
premise that v1.0 includes timestamp/command/exit_code/etc).
Fixed with:
1. CRITICAL header at top: marks entire doc as v2.0 target, not v1.0 reality
2. 'TARGET v2.0 SCHEMA' headers on Common Fields section
3. Comprehensive Appendix: v1.0 actual shape + migration timeline + v1.0 code example
4. Links to FIX_LOCUS_164.md + ERROR_HANDLING.md for v1.0 reality
5. FAQ: clarifies the version mismatch and when v2.0 ships
This closes the fourth P0 doc-truthfulness instance (4/4 in family):
- #78 USAGE.md: active misdocumentation (fixed#78)
- #79 ERROR_HANDLING.md: copy-paste trap (fixed#79)
- #165 CLAUDE.md: boundary collapse (fixed#81)
- #166 SCHEMAS.md: aspirational source doc (fixed#82)
Pattern is now crystallized: SCHEMAS.md was the aspirational source;
three downstream docs (USAGE, ERROR_HANDLING, CLAUDE) inherited the false v2.0-as-v1.0
claim. Fix the source (SCHEMAS.md), which eliminates the root cause for all four.
CLAUDE.md Option A implemented. P0 doc-truthfulness family now at 3 closed +
0 open (all 3 fixed within the same dogfood session).
Taxonomy refinement added: P0 doc-truthfulness has three distinct subclasses:
- active misdocumentation (false sentence) — USAGE.md cycle #78
- copy-paste trap (broken example code) — ERROR_HANDLING.md cycle #79
- target/current boundary collapse (v2.0 as v1.0) — CLAUDE.md cycle #81
All three related to #164 (envelope divergence). Root cause consistent across
family; remedies differ per subclass.
CLAUDE.md was documenting the v2.0 target schema as if it were current binary
behavior. This misled validator/harness implementers into assuming the Rust
binary emits timestamp, command, exit_code, output_format, schema_version fields
when it doesn't.
Fixed by explicitly marking the boundary:
1. SCHEMAS.md section: now clearly labels 'target v2.0 design' and lists both
v1.0 (actual binary) and v2.0 (target) field shapes
2. Clawable commands requirements: now explicitly separates v1.0 (current) and
v2.0 (post-FIX_LOCUS_164) envelope requirements
3. Added inline migration note pointing to FIX_LOCUS_164.md
This closes#165 as the third P0 doc-truthfulness fix (Option A: preserve current
truth, add v2.0 target as separate labeled section).
P0 doc-truthfulness family pattern (all three related to #164 envelope divergence):
- #78 USAGE.md: active misdocumentation (fixed cycle #78)
- #79 ERROR_HANDLING.md: copy-paste trap (fixed cycle #79)
- #165 CLAUDE.md: target/current boundary collapse (fixed cycle #81)
CLAUDE.md claims 'Common fields (all envelopes): timestamp, command, exit_code,
output_format, schema_version' but the actual binary v1.0 doesn't emit these.
This is aspirational (v2.0 target from SCHEMAS.md) documented as current behavior
in a file that's supposed to describe the Python reference harness.
Filed as 3rd member of doc-truthfulness P0 family (joins #78, #79).
Both options documented: update CLAUDE.md for v1.0 OR clarify it's v2.0 aspirational.
Recommendation: Option A (keep CLAUDE.md truthful about actual validation).
Part of broader #164 family (envelope schema divergence across all docs).
Formalizes a 4-level severity scale for documentation-vs-implementation divergence:
- P0: Active misdocumentation (consumer code breaks) — immediate fix
- P1: Stale docs (consumer confused) — high priority
- P2: Incomplete docs (friction, eventual success) — medium
- P3: Terminology drift (confusion but survivable) — low
Parallel to diagnostic-strictness scale (cycles #57–#69). Both are
'truth-over-convenience' constraints.
Evidence: cycles #78–#79 found 2 P0 instances in USAGE.md and ERROR_HANDLING.md,
both related to JSON envelope shape. Root cause: SCHEMAS.md is aspirational (v2.0),
binary still emits v1.0, docs needed to be empirical not aspirational.
Going forward: doc audits compare against actual binary, flag P0 violations
immediately, link forward to migration plans (FIX_LOCUS_164.md).
The Python code examples were accessing nested error.kind like envelope['error']['kind'],
but v1.0 emits flat envelopes with error as a STRING and kind at top-level.
Updated:
- Table header: now shows actual v1.0 shape {error: "...", kind: "...", type: "error"}
- match statement: switched from envelope.get('error',{}).get('kind') to envelope.get('kind')
- All ClawError raises: changed from envelope['error']['message'] to envelope.get('error','')
because error field is a STRING in v1.0, not a nested object
- Added inline comments on every error case noting v1.0 vs v2.0 difference
- Appendix: split into v1.0 (actual/current) and v2.0 (target after FIX_LOCUS_164)
The code examples now work correctly against the actual binary.
This was active misdocumentation (P0 severity) — the Python examples would crash
if a consumer tried to use them.
The JSON output section was misleading — it claimed the binary emits
exit_code, command, timestamp, output_format, schema_version, and nested
error objects. The binary actually emits v1.0 flat shape (kind at top-level,
error as string, no common metadata fields).
Updated section:
- Documents actual v1.0 success and error envelope shapes
- Lists known issues (missing fields, overloaded kind, flat error)
- Shows how to dispatch on v1.0 (check type=='error' before reading kind)
- Warns users NOT to rely on kind alone
- Links to FIX_LOCUS_164.md for migration plan
- Explains Phase 1/2/3 timeline for v2.0 adoption
This is a doc-only fix that makes USAGE.md truthful about the current behavior
while preparing users for the coming schema migration.
Binary emits different envelope shape than SCHEMAS.md documents:
- Missing: timestamp, command, exit_code, output_format, schema_version
- Wrong placement: kind is top-level, not nested under error
- Extra: type:error field not in schema
- Wrong type: error is string, not object with operation/target/retryable
Additional issue: 'kind' field is semantically overloaded (verb-id in
success envelopes, error-kind in error envelopes) — violates typed contract.
Filed as 7th member of typed-error family (joins #102, #121, #127, #129, #130, #245).
Recommended fix: Option A — update binary to match schema (principled design).
Attempted cherry-pick of #248 (1 commit) onto main. Encountered 2 conflict zones
in main.rs (test definitions + error classification). Manual regex cleanup left
orphaned diff markers that Rust compiler rejected.
Decision: Rebase-bridge works for 1-conflict branches, but 2+ conflicts in 12K+-line
files require author context. Revised strategy: push main to origin, request branch
authors rebase locally with IDE support, then merge from updated origin branches.
Estimated timeline: 30 min for branch authors to rebase 8 branches in parallel.
Fresh dogfood found no new pinpoints. All core verbs working correctly.
Blocker: 8 remaining review-ready branches on origin have conflicts with
cycle #72's 4 merges. Root cause: remote branches predated the merge chain.
Example: feat/jobdori-127-verb-suffix-flags rebase fails on commit 3/3
because cycle #72 added 15+ new LocalHelpTopic variants.
Recommend: coordinate with branch authors to rebase against new main.
Cycle #74 will post integration checkpoint + queue status.
Backlog-truthfulness (cycle #60) validated: fresh dogfood on current main confirmed
#163 was closed by cycle #72's help-parity chain merge. Zero duplicate work.
Cleanup: removed /tmp/jobdori-163 worktree and fix/jobdori-163-help-help-selfref branch.
Fix: resolve actual HEAD path in git worktrees for correct Git SHA in build metadata.
In worktrees, .git is a pointer file not a directory, so cargo's rerun-if-changed=.git/HEAD never triggers.
Per MERGE_CHECKLIST.md Cluster 2 (P1 Diagnostic-strictness, isolated):
- 25 lines in build.rs only (no crate-level conflicts)
- Verified: build → commit → rebuild → SHA updates correctly
Diagnostic-strictness family member (joins #122/#122b).
Applied: execution artifact runbook. Cycle #72 integration.
Provides:
- Recommended merge order (P0 → P1 → P2 → P3 by cluster)
- Per-cluster merge prerequisites and validation steps
- Conflict risk assessment (Cluster 2 #122/#122b have same edit locus)
- Post-merge validation checklist (build + test + dogfood)
- Timeline estimate (~60 min for full 17-branch queue)
Addresses the final integration step: once branches are reviewed, knowing
the safe merge order matters. This artifact pre-answers that question.
Applied doctrine: integration-support artifacts (cycle #64) reduce reviewer
friction. At 17-branch saturation, a merge-safe checklist is first-class work.
Relates to cycle #70 integration throughput initiative.
Problem: In git worktrees, .git is a pointer file (not a directory), so cargo's
rerun-if-changed=.git/HEAD never triggers when commits are made. This causes
claw version to report a stale SHA after new commits.
Solution: Add resolve_git_head_path() helper that detects worktree mode:
- If .git is a file: parse gitdir pointer, watch <gitdir>/HEAD
- If .git is a directory: watch .git/HEAD (regular repo)
This ensures build.rs invalidates on each commit, making version output truthful.
Verification: Binary built in worktree now reports correct SHA after commits
(before: stale, after: current HEAD).
Relates to ROADMAP #161 (filed cycle #65, implemented cycle #69).
Diagnostic-strictness family member.
Diff: 21 lines added (resolve_git_head_path + conditional rerun-if-changed).
Verbs with CLI-reserved positional-arg meanings (resume, compact, memory,
commit, pr, issue, bughunter) were falling through to Prompt dispatch
when invoked with args, causing users to see 'missing_credentials' errors
instead of guidance that the verb is a slash command.
#160 investigation revealed the underlying design question: which verbs
are 'promptable' (can start a prompt like 'explain this pattern') vs.
'reserved' (have specific CLI meaning like 'resume SESSION_ID')?
This fix implements the reserved-verb classification: at parse time,
intercept reserved verbs with trailing args and emit slash-command guidance
before falling through to Prompt. Promptable verbs (explain, bughunter, clear)
continue to route to Prompt as before.
Helper: is_reserved_semantic_verb() lists the reserved set.
All 181 tests pass (no regressions).
## What Was Broken
`claw doctor` reported "Status: ok" when run from ~/ or /, but `claw
prompt` in the same directory would error out with:
error: claw is running from a very broad directory (/Users/yeongyu).
The agent can read and search everything under this path.
Diagnostic deception: doctor said green, prompt said red. User runs
doctor to check their setup, sees all green, runs prompt, gets blocked.
Trust in doctor erodes.
This is the exact pattern captured in the 'Diagnostic Commands Must Be
At Least As Strict As Runtime Commands' principle recorded in ROADMAP.md
at cycle #57.
## Root Cause
Two code paths perform the broad-cwd check:
- CliAction::Prompt handler → `enforce_broad_cwd_policy()` (errors out)
- CliAction::Repl handler → same function
But render_doctor_report() never called detect_broad_cwd(). The workspace
health check only looked at whether cwd was inside a git project, not
whether cwd was a dangerously broad path.
## What This Fix Does
Extend `check_workspace_health()` to also probe `detect_broad_cwd()`:
let broad_cwd = detect_broad_cwd();
let (level, summary) = match (in_repo, &broad_cwd) {
(_, Some(path)) => (
DiagnosticLevel::Warn,
format!(
"current directory is a broad path ({}); Prompt/REPL will \
refuse to run here without --allow-broad-cwd",
path.display()
),
),
(true, None) => (DiagnosticLevel::Ok, "project root detected"),
(false, None) => (DiagnosticLevel::Warn, "not inside a git project"),
};
The check now warns about BOTH failure modes with clear messaging about
what Prompt/REPL will do.
## Dogfood Verification
Before fix:
$ cd ~ && claw doctor
Workspace
Status warn
Summary current directory is not inside a git project
[all green otherwise]
$ echo | claw prompt "test"
error: claw is running from a very broad directory (/Users/yeongyu)...
After fix:
$ cd ~ && claw doctor
Workspace
Status warn
Summary current directory is a broad path (/Users/yeongyu);
Prompt/REPL will refuse to run here without
--allow-broad-cwd
$ cd / && claw doctor
Workspace
Status warn
Summary current directory is a broad path (/); ...
Non-regression:
$ cd /tmp/my-project && claw doctor
Workspace
Status warn
Summary current directory is not inside a git project
(unchanged)
$ cd /path/to/real/git/project && claw doctor
Workspace
Status ok
Summary project root detected on branch main
(unchanged)
## Regression Tests Added
- `workspace_check_in_project_dir_reports_ok` — non-broad + in-project = OK
- `workspace_check_outside_project_reports_warn` — non-broad + not-in-project = Warn with 'not inside git project' summary
- 181 binary tests pass (was 179, added 2)
## Related
- Principle: 'Diagnostic Commands Must Be At Least As Strict As Runtime
Commands' (ROADMAP.md cycle #57)
- Companion to #122 (stale-base preflight in doctor)
- Sibling: next step is probably a full runtime-vs-doctor audit for
other asymmetries (auth, sandbox, plugins, hooks)
## What Was Broken (ROADMAP #130e, filed cycle #53)
Three subcommands leaked `missing_credentials` errors when called
with `--help`:
$ claw help --help
[error-kind: missing_credentials]
error: missing Anthropic credentials...
$ claw submit --help
[error-kind: missing_credentials]
error: missing Anthropic credentials...
$ claw resume --help
[error-kind: missing_credentials]
error: missing Anthropic credentials...
This is the same dispatch-order bug class as #251 (session verbs).
The parser fell through to the credential check before help-flag
resolution ran. Critical discoverability gap: users couldn't learn
what these commands do without valid credentials.
## Root Cause (Traced)
`parse_local_help_action()` (main.rs:1260) is called early in
`parse_args()` (main.rs:1002), BEFORE credential check. But the
match statement inside only recognized:
status, sandbox, doctor, acp, init, state, export, version,
system-prompt, dump-manifests, bootstrap-plan, diff, config.
`help`, `submit`, `resume` were NOT in the list, so the function
returned `None`, and parsing continued to credential check which
then failed.
## What This Fix Does
Same pattern as #130c (diff) and #130d (config):
1. **LocalHelpTopic enum extended** with Meta, Submit, Resume variants
2. **parse_local_help_action() extended** to map the three new cases
3. **Help topic renderers added** with accurate usage info
Three-line change to parse_local_help_action:
"help" => LocalHelpTopic::Meta,
"submit" => LocalHelpTopic::Submit,
"resume" => LocalHelpTopic::Resume,
Dispatch order (parse_args):
1. --resume parsing
2. parse_local_help_action() ← NOW catches help/submit/resume --help
3. parse_single_word_command_alias()
4. parse_subcommand() ← Credential check happens here
## Dogfood Verification
Before fix (all three):
$ claw help --help
[error-kind: missing_credentials]
error: missing Anthropic credentials...
After fix:
$ claw help --help
Help
Usage claw help [--output-format <format>]
Purpose show the full CLI help text (all subcommands, flags, environment)
...
$ claw submit --help
Submit
Usage claw submit [--session <id|latest>] <prompt-text>
Purpose send a prompt to an existing managed session
Requires valid Anthropic credentials (when actually submitting)
...
$ claw resume --help
Resume
Usage claw resume [<session-id|latest>]
Purpose restart an interactive REPL attached to a managed session
...
## Non-Regression Verification
- `claw help` (no --help) → still shows full CLI help ✅
- `claw submit "text"` (with prompt) → still requires credentials ✅
- `claw resume` (bare) → still emits slash command guidance ✅
- All 180 binary tests pass ✅
- All 466 library tests pass ✅
## Regression Tests Added (6 assertions)
- `help --help` → routes to HelpTopic(Meta)
- `submit --help` → routes to HelpTopic(Submit)
- `resume --help` → routes to HelpTopic(Resume)
- Short forms: `help -h`, `submit -h`, `resume -h` all work
## Pattern Note
This is Category A of #130e (dispatch-order bugs). Same class as #251.
Category B (surface-parity: plugins, prompt) will be handled in a
follow-up commit/branch.
## Help-Parity Sweep Status
After cycle #52 (#130c diff, #130d config), help sweep revealed:
| Command | Before | After This Commit |
|---|---|---|
| help --help | missing_credentials | ✅ Meta help |
| submit --help | missing_credentials | ✅ Submit help |
| resume --help | missing_credentials | ✅ Resume help |
| plugins --help | "Unknown action" | ⏳ #130e-B (next) |
| prompt --help | wrong help | ⏳ #130e-B (next) |
## Related
- Closes #130e Category A (dispatch-order help fixes)
- Same bug class as #251 (session verbs)
- Stacks on #130d (config help) on same worktree branch
- #130e Category B (plugins, prompt) queued for follow-up
## What Was Broken (ROADMAP #130d, filed cycle #52)
`claw config --help` was silently ignored — the command executed and
displayed the config dump instead of showing help:
$ claw config --help
Config
Working directory /private/tmp/dogfood-probe-47
Loaded files 0
Merged keys 0
(displays full config, not help)
Expected: help for the config command. Actual: silent acceptance of
`--help`, runs config display anyway.
This is the opposite outlier from #130c (which rejected help with an
error). Together they form the help-parity anomaly:
- #130c `diff --help` → error (rejects help)
- #130d `config --help` → silent ignore (runs command, ignores help)
- Others (status, mcp, export) → proper help
- Expected behavior: all commands should show help on `--help`
## Root Cause (Traced)
At main.rs:1050, the `"config"` parser arm parsed arguments positionally:
"config" => {
let tail = &rest[1..];
let section = tail.first().cloned();
// ... ignores unrecognized args like --help silently
Ok(CliAction::Config { section, ... })
}
Unlike the `diff` arm (#130c), `config` had no explicit check for
extra args. It positionally parsed the first arg as an optional
`section` and silently accepted/ignored any trailing arg, including
`--help`.
## What This Fix Does
Same pattern as #130c (help-surface parity):
1. **LocalHelpTopic enum extended** with new `Config` variant
2. **parse_local_help_action() extended** to map `"config"` → `LocalHelpTopic::Config`
3. **config arm guard added**: check for help flag before parsing section
4. **Help topic renderer added**: human-readable help text for config
Fix locus at main.rs:1050:
"config" => {
// #130d: accept --help / -h and route to help topic
if rest.len() >= 2 && is_help_flag(&rest[1]) {
return Ok(CliAction::HelpTopic(LocalHelpTopic::Config));
}
let tail = &rest[1..];
// ... existing parsing continues
}
## Dogfood Verification
Before fix:
$ claw config --help
Config
Working directory ...
Loaded files 0
(no help, runs config)
After fix:
$ claw config --help
Config
Usage claw config [--cwd <path>] [--output-format <format>]
Purpose merge and display the resolved configuration
Options --cwd overrides the workspace directory
Output loaded files and merged key-value pairs
Formats text (default), json
Related claw status · claw doctor · claw init
Short form `claw config -h` also works.
## Non-Regression Verification
- `claw config` (no args) → still displays config dump ✅
- `claw config permissions` (section arg) → still works ✅
- All 180 binary tests pass ✅
- All 466 library tests pass ✅
## Regression Tests Added (4 assertions)
- `config --help` → routes to `HelpTopic(LocalHelpTopic::Config)`
- `config -h` (short form) → routes to help topic
- bare `config` (no args) → still routes to `Config` action
- `config permissions` (with section) → still works correctly
## Pattern Note
#130c and #130d form a pair: two outlier failure modes in help
handling for local introspection commands:
- #130c `diff` rejected help (loud error) → fixed with guard + routing
- #130d `config` silently ignored help (silent accept) → fixed with same pattern
Both are now consistent with the rest of the CLI (status, mcp, export, etc.).
## Related
- Closes #130d (config help discoverability gap)
- Completes help-parity family (#130c, #130d)
- Stacks on #130c (diff help fix) on same worktree branch
- Part of help-consistency thread (#141 audit)
## What Was Broken (ROADMAP #130c, filed cycle #50)
`claw diff --help` was rejected with:
[error-kind: unknown]
error: unexpected extra arguments after `claw diff`: --help
Other local introspection commands accept --help fine:
- `claw status --help` → shows help ✅
- `claw mcp --help` → shows help ✅
- `claw export --help` → shows help ✅
- `claw diff --help` → error ❌ (outlier)
This is a help-surface parity bug: `diff` is the only local command
that rejects --help as "extra arguments" before the help detector
gets a chance to run.
## Root Cause (Traced)
At main.rs:1063, the `"diff"` parser arm rejected ALL extra args:
"diff" => {
if rest.len() > 1 {
return Err(format!("unexpected extra arguments after `claw diff`: {}", ...));
}
Ok(CliAction::Diff { output_format })
}
When parsing `["diff", "--help"]`, `rest.len() > 1` was true (length
is 2) and `--help` was rejected as extra argument.
Other commands (status, sandbox, doctor, init, state, export, etc.)
routed through `parse_local_help_action()` which detected
`--help` / `-h` and routed to a LocalHelpTopic. The `diff` arm
lacked this guard.
## What This Fix Does
Three minimal changes:
1. **LocalHelpTopic enum extended** with new `Diff` variant
2. **parse_local_help_action() extended** to map `"diff"` → `LocalHelpTopic::Diff`
3. **diff arm guard added**: check for help flag before extra-args validation
4. **Help topic renderer added**: human-readable help text for diff command
Fix locus at main.rs:1063:
"diff" => {
// #130c: accept --help / -h as first argument and route to help topic
if rest.len() == 2 && is_help_flag(&rest[1]) {
return Ok(CliAction::HelpTopic(LocalHelpTopic::Diff));
}
if rest.len() > 1 { /* existing error */ }
Ok(CliAction::Diff { output_format })
}
## Dogfood Verification
Before fix:
$ claw diff --help
[error-kind: unknown]
error: unexpected extra arguments after `claw diff`: --help
After fix:
$ claw diff --help
Diff
Usage claw diff [--output-format <format>]
Purpose show local git staged + unstaged changes
Requires workspace must be inside a git repository
...
And `claw diff -h` (short form) also works.
## Non-Regression Verification
- `claw diff` (no args) → still routes to Diff action correctly
- `claw diff foo` (unknown arg) → still rejected as "unexpected extra arguments"
- `claw diff --output-format json` (valid flag) → still works
- All 180 binary tests pass
- All 466 library tests pass
## Regression Tests Added (4 assertions)
- `diff --help` → routes to HelpTopic(LocalHelpTopic::Diff)
- `diff -h` (short form) → routes to HelpTopic(LocalHelpTopic::Diff)
- bare `diff` → still routes to Diff action
- `diff foo` (unknown arg) → still errors with "extra arguments"
## Pattern
Follows #141 help-consistency work (extending LocalHelpTopic to
cover more subcommands). Clean surface-parity fix: identify the
outlier, add the missing guard. Low-risk, high-clarity.
## Related
- Closes #130c (diff help discoverability gap)
- Stacks on #130b (filesystem context) and #251 (session dispatch)
- Part of help-consistency thread (#141 audit, #145 plugins wiring)
## What Was Broken (ROADMAP #130b, filed cycle #47)
In a fresh workspace, running:
claw export latest --output /private/nonexistent/path/file.jsonl --output-format json
produced:
{"error":"No such file or directory (os error 2)","hint":null,"kind":"unknown","type":"error"}
This violates the typed-error contract:
- Error message is a raw errno string with zero context
- Does not mention the operation that failed (export)
- Does not mention the target path
- Classifier defaults to "unknown" even though the code path knows
this is a filesystem I/O error
## Root Cause (Traced)
run_export() at main.rs:~6915 does:
fs::write(path, &markdown)?;
When this fails:
1. io::Error propagates via ? to main()
2. Converted to string via .to_string() in error handler
3. classify_error_kind() cannot match "os error" or "No such file"
4. Defaults to "kind": "unknown"
The information is there at the source (operation name, target path,
io::ErrorKind) but lost at the propagation boundary.
## What This Fix Does
Three changes:
1. **New helper: contextualize_io_error()** (main.rs:~260)
Wraps an io::Error with operation name + target path into a
recognizable message format:
"{operation} failed: {target} ({error})"
2. **Classifier branch added** (classify_error_kind at main.rs:~270)
Recognizes the new format and classifies as "filesystem_io_error":
else if message.contains("export failed:") ||
message.contains("diff failed:") ||
message.contains("config failed:") {
"filesystem_io_error"
}
3. **run_export() wired** (main.rs:~6915)
fs::write() call now uses .map_err() to enrich io::Error:
fs::write(path, &markdown).map_err(|e| -> Box<dyn std::error::Error> {
contextualize_io_error("export", &path.display().to_string(), e).into()
})?;
## Dogfood Verification
Before fix:
{"error":"No such file or directory (os error 2)","kind":"unknown","type":"error"}
After fix:
{"error":"export failed: /private/nonexistent/path/file.jsonl (No such file or directory (os error 2))","kind":"filesystem_io_error","type":"error"}
The envelope now tells downstream claws:
- WHAT operation failed (export)
- WHERE it failed (the path)
- WHAT KIND of failure (filesystem_io_error)
- The original errno detail preserved for diagnosis
## Non-Regression Verification
- Successful export still works (emits "kind": "export" envelope as before)
- Session not found error still emits "session_not_found" (not filesystem)
- missing_credentials still works correctly
- cli_parse still works correctly
- All 180 binary tests pass
- All 466 library tests pass
- All 95 compat-harness tests pass
## Regression Tests Added
Inside the main CliAction test function:
- "export failed:" pattern classifies as "filesystem_io_error" (not "unknown")
- "diff failed:" pattern classifies as "filesystem_io_error"
- "config failed:" pattern classifies as "filesystem_io_error"
- contextualize_io_error() produces a message containing operation name
- contextualize_io_error() produces a message containing target path
- Messages produced by contextualize_io_error() are classifier-recognizable
## Scope
This is the minimum viable fix: enrich export's fs::write with context.
Future work (filed as part of #130b scope): apply same pattern to
other filesystem operations (diff, plugins, config fs reads, session
store writes, etc.). Each application is a copy-paste of the same
helper pattern.
## Pattern
Follows #145 (plugins parser interception), #248-249 (arm-level leak
templates). Helper + classifier + call site wiring. Minimal diff,
maximum observability gain.
## Related
- Closes #130b (filesystem error context preservation)
- Stacks on top of #251 (dispatch-order fix) — same worktree branch
- Ground truth for future #130 broader sweep (other io::Error sites)
Cycle #46 follow-up to cycle #45's #251 implementation. Closes#250's
implementation urgency by aligning docs with reality.
SCHEMAS.md Updates:
For each of the 4 session-management verbs, added:
1. Status marker (Implemented or Stub only)
2. Actual binary envelope (shape produced by the #251-fixed binary)
3. Aspirational (future) shape (original SCHEMAS.md content, preserved as target)
4. Gap notes where the two diverge
Per-verb status:
- list-sessions: Implemented, nested field layout
- load-session: Implemented, nested session object with local session_not_found error
- delete-session: Stub, emits not_yet_implemented (local error, not auth)
- flush-transcript: Stub, emits not_yet_implemented (local error, not auth)
ROADMAP.md Updates:
- #251 marked CLOSED: Full status with commit ref, test counts.
- #250 marked SCOPE-REDUCED: Option A resolved by #251, Option C moot,
only Option B (doc alignment) remains as future cleanup.
Why this matters:
Every code change should close its documentation loop. #251 landed on
the branch, but SCHEMAS.md still described aspirational shapes without
marking which were implemented. Claws reading SCHEMAS.md would have
assumed full conformance and hit surprises. Now the document tells the
truth about which verbs work, which are stubs, and why.
Related:
- #251 implementation on feat/jobdori-251-session-dispatch branch
- #250 scope-reduced to Option B (field-name harmonization)
- #145/#146 parser fall-through fix precedent
## What Was Broken (ROADMAP #251)
Session-management verbs (list-sessions, load-session, delete-session,
flush-transcript) were falling through to the parser's `_other => Prompt`
catchall at main.rs:~1017. This construed them as `CliAction::Prompt {
prompt: "list-sessions", ... }` which then required credentials via the
Anthropic API path. The result: purely-local session operations emitted
`missing_credentials` errors instead of session-layer envelopes.
## Acceptance Criterion
The fix's essential requirement (stated by gaebal-gajae):
**"These 4 verbs stop falling through to Prompt and emitting `missing_credentials`."**
Not "all 4 are fully implemented to spec" — stubs are acceptable for
delete-session and flush-transcript as long as they route LOCALLY.
## What This Fix Does
Follows the exact pattern from #145 (plugins) and #146 (config/diff):
1. **CliAction enum** (main.rs:~700): Added 4 new variants.
2. **Parser** (main.rs:~945): Added 4 match arms before the `_other => Prompt`
catchall. Each arm validates the verb's positional args (e.g., load-session
requires a session-id) and rejects extra arguments.
3. **Dispatcher** (main.rs:~455):
- list-sessions → dispatches to `runtime::session_control::list_managed_sessions_for()`
- load-session → dispatches to `runtime::session_control::load_managed_session_for()`
- delete-session → emits `not_yet_implemented` error (local, not auth)
- flush-transcript → emits `not_yet_implemented` error (local, not auth)
## Dogfood Verification
Run on clean environment (no credentials):
```bash
$ env -i PATH=$PATH HOME=$HOME claw list-sessions --output-format json
{
"command": "list-sessions",
"sessions": [
{"id": "session-1775777421902-1", ...},
...
]
}
# ✓ Session-layer envelope, not auth error
$ env -i PATH=$PATH HOME=$HOME claw load-session nonexistent --output-format json
{"error":"session not found: nonexistent", "kind":"session_not_found", ...}
# ✓ Local session_not_found error, not missing_credentials
$ env -i PATH=$PATH HOME=$HOME claw delete-session test-id --output-format json
{"command":"delete-session","error":"not_yet_implemented","kind":"not_yet_implemented","type":"error"}
# ✓ Local not_yet_implemented, not auth error
$ env -i PATH=$PATH HOME=$HOME claw flush-transcript test-id --output-format json
{"command":"flush-transcript","error":"not_yet_implemented","kind":"not_yet_implemented","type":"error"}
# ✓ Local not_yet_implemented, not auth error
```
Regression sanity:
```bash
$ claw plugins --output-format json # #145 still works
$ claw prompt "hello" --output-format json # still requires credentials correctly
$ claw list-sessions extra arg --output-format json # rejects extra args with cli_parse
```
## Regression Tests Added
Inside `removed_login_and_logout_subcommands_error_helpfully` test function:
- `list-sessions` → CliAction::ListSessions (both text and JSON output)
- `load-session <id>` → CliAction::LoadSession with session_reference
- `delete-session <id>` → CliAction::DeleteSession with session_id
- `flush-transcript <id>` → CliAction::FlushTranscript with session_id
- Missing required arg errors (load-session and delete-session without ID)
- Extra args rejection (list-sessions with extra positional args)
All 180 binary tests pass. 466 library tests pass.
## Fix Scope vs. Full Implementation
This fix addresses #251 (dispatch-order bug) and #250's Option A (implement
the surfaces). list-sessions and load-session are fully functional via
existing runtime::session_control helpers. delete-session and flush-transcript
are stubbed with local "not yet implemented" errors to satisfy #251's
acceptance criterion without requiring additional session-store mutations
that can ship independently in a follow-up.
## Template
Exact same pattern as #145 (plugins) and #146 (config/diff): top-level
verb interception → CliAction variant → dispatcher with local operation.
## Related
Closes#251. Addresses #250 Option A for 4 verbs. Does not block #250
Option B (documentation scope guards) which remains valuable.
Cycle #40: gaebal-gajae conceived #251 in their 00:00 Discord cycle
status but hadn't committed to ROADMAP yet. Jobdori verified their
diagnosis with code trace and formalized into ROADMAP with the proper
framing relationship to #250.
## What This Pinpoint Says
Same observable as #250 (session-management verbs emit missing_credentials
instead of SCHEMAS.md envelope) but reframed at the dispatch-order layer:
- #250 says: surface missing on canonical binary vs SCHEMAS.md promise
- #251 says: top-level parser fall-through happens BEFORE dispatcher
could intercept, so credential resolution runs before the verb is
classified as a purely-local operation
#251's framing is sharper because it identifies WHY the fall-through
produces auth errors, not just that it does.
## Verified Code Trace
- main.rs:1017-1027 is the _other => Prompt catchall
- joins all rest[] tokens into joined, constructs CliAction::Prompt
- downstream resolves credentials -> emits missing_credentials
- No credential call would be needed had the verb been intercepted
Same pattern has been fixed before for other purely-local verbs:
- #145: plugins (main.rs:888-906, explicit match arm)
- #146: config and diff (main.rs:911-935, same shape)
#251 extends this to the 4 session-management verbs.
## Recommended Sequence
1. #251 fix (4 match arms mirroring #145/#146) — principled solution
2. #250's Option B (docs scope note) — guard against future drift
3. #250's Option C (reject with redirect) — unnecessary if #251 lands
## Discipline
Per cycle #24 calibration:
- Red-state bug? Borderline (silent misroute to auth error class)
- Real friction? ✓ (4 documented surfaces emit wrong error class)
- Evidence-backed? ✓ (code trace + prior-fix precedent #145/#146)
- Same-cycle fix? ✗ (filed + document, boundary discipline #36)
- Implementation cost? ~40 lines Rust + tests, bounded
## Credit
Conception: gaebal-gajae (Discord msg 1496526112254328902, 00:00 KST)
Formalization: Jobdori cycle #40 (code trace + precedent linking)
This is the right kind of collaboration: gaebal-gajae saw the dispatch
pattern I had missed in #250 (I framed as surface parity; they framed
as dispatch order). I verified their diagnosis and committed the
ROADMAP entry. Two framings make the pinpoint sharper than either
alone.
Cycle #39 dogfood re-verification of #130 (filed 2026-04-20). All 5
filesystem failure modes reproduce identically on main HEAD 186d42f,
2 days after original filing. Gap is unchanged.
## What's Added
1. **[STILL OPEN — re-verified 2026-04-22 cycle #39]** marker on the
entry so readers can see immediately that the pinpoint hasn't been
accidentally closed.
2. Full 5-mode repro output preserved verbatim for the current HEAD,
so future re-verifications have a concrete baseline to diff against.
3. **New evidence not in original filing**: the classifier actively
chose `kind: "unknown"` rather than just omitting the field. This
means classify_error_kind() has NO substring match for "Is a
directory", "No such file", "Operation not permitted", or "File
exists". The typed-error contract is thus twice-broken on this path.
4. **Pairing with #247/#248/#249 classifier sweep**: the classifier-level
part of #130 could land in the same sweep (add substring branches
for io::ErrorKind strings). The context-preservation part (fix
run_export's bare `?`) is a separate, larger change.
## Why Re-Verification Not Re-Filing
Per cycle #24 discipline: speculative re-filings add noise, real
confirmations add truth. #130 was already filed with exact repros, code
trace, and fix shape. My dogfood hit the same gap on fresh HEAD — the
right output is confirming the gap is still there (not filing #251 for
the same bug).
This is the same pattern as cycle #32's "mark #127 CLOSED" reality-sync:
documentation-drift prevention through explicit status markers.
## New Pattern
"Reality-sync via re-verification" — re-running a filed pinpoint's
repro on fresh HEAD and adding the timestamp + output proves the gap
is still real without inventing new filings. Cycle #24 calibration
keeps ROADMAP entries honest.
Per cycle #24 calibration:
- Red-state bug? ⚠️ borderline (errors surfaced, but kind=unknown is
demonstrably wrong on a path where the system knows the errno)
- Real friction? ✓ (re-verified on fresh HEAD)
- Evidence-backed? ✓ (5-mode repro + classifier trace)
- Same-cycle fix? ✗ (classifier-level part could join #247/#248/#249
sweep; context-preservation part is larger refactor)
- Implementation cost? Classifier part ~10 lines; full context fix ~60 lines
Source: Jobdori cycle #39 proactive dogfood in response to Clawhip
pinpoint nudge. Probed export filesystem errors; discovered this was
#130 reconfirmation, not new bug. Applied reality-sync pattern from
cycle #32.
Cycle #38 dogfood finding. Probed session management via the top-level
subcommand path documented in SCHEMAS.md; discovered the Rust binary
doesn't implement these as top-level subcommands. The literal token
'list-sessions' falls through the _other => Prompt arm and returns
'missing Anthropic credentials' instead of the documented envelope.
## The Gap
SCHEMAS.md documents 14 CLAWABLE top-level subcommands. Python audit
harness (src/main.py) implements all 14. Rust binary implements ~8 of
them as top-level, routing session management through /session slash
commands via --resume instead.
Repro:
$ env -i PATH=$PATH HOME=$HOME claw list-sessions --output-format json
{"error":"missing Anthropic credentials; ...","kind":"missing_credentials"}
$ claw --resume latest /session list --output-format json
{"active":"...","kind":"session_list","sessions":[...]}
$ python3 -m src.main list-sessions --output-format json
{"command":"list-sessions","sessions":[...],"exit_code":0}
Same operation, three different CLI shapes across implementations.
## Classification
This is BOTH:
- a parser-level trust gap (6th in #108/#117/#119/#122/#127 family; same
_other => Prompt fall-through), AND
- a cross-implementation parity gap (SCHEMAS.md at repo root doesn't
match Rust binary's top-level surface)
Unlike prior fall-throughs where the input was malformed, the input
here IS a documented surface. The fall-through is wrong for a different
reason: the surface exists in the protocol but not in this implementation.
## Three Fix Options
Option A: Implement surfaces on Rust binary (highest cost, full parity)
Option B: Scope SCHEMAS.md to Python harness (docs-only)
Option C: Reject at parse time with redirect hint (cheapest, #127 pattern)
Recommended: C first (prevents cred misdirection), then B for docs
hygiene, then A if demand justifies.
## Discipline
Per cycle #24 calibration:
- Red-state bug? ⚠️ borderline — silent misroute to cred error on a
documented surface. Not a crash but a real wrong-contract response.
- Real friction? ✓ (claws reading SCHEMAS.md hit wrong error on canonical binary)
- Evidence-backed? ✓ (dogfood probe + SCHEMAS.md cross-reference + code trace)
- Implementation cost? Option C: ~30 lines (bounded). Option A: larger.
- Same-cycle fix? ✗ (file + document, defer implementation per #36 boundary discipline)
## Family Position
Natural bundle: **#127 + #250** — parser-level fall-through pair with
class distinction. #127 fixed suffix-arg-on-valid-verb case. #250 extends
to 'entire Python-harness verb treated as prompt.' Same fall-through arm,
different entry class.
Source: Jobdori cycle #38 proactive dogfood in response to Clawhip
pinpoint nudge at msg 1496518474019639408. Probed session management CLI
after gaebal-gajae's status sync confirmed no red-state regressions this
cycle; found this cross-implementation surface parity gap by comparing
SCHEMAS.md claims against actual Rust binary behavior.
Cycle #37 dogfood finding post-#247 merge. Two Err arms in the resumed-session
JSON path at main.rs:2747 and main.rs:2783 emit error envelopes WITHOUT the
`kind` field required by the §4.44 typed-envelope contract.
## The Pinpoint
Probed resumed-session slash command JSON path:
$ claw --output-format json --resume latest /session
{"command":"/session","error":"unsupported resumed slash command","type":"error"}
# no kind field
$ claw --output-format json --resume latest /xyz-unknown
{"command":"/xyz-unknown","error":"Unknown slash command: /xyz-unknown\n Help /help lists available slash commands","type":"error"}
# no kind field AND multi-line error without split hint
Compare to happy path which DOES include kind:
$ claw --output-format json --resume latest /session list
{"active":"...","kind":"session_list",...}
Contract awareness exists. It's just not applied in the Err arms.
## Scope
Two atomic fixes in main.rs:
- Line 2747: SlashCommand::parse() Err → add kind via classify_error_kind()
- Line 2783: run_resume_command() Err → add kind + call split_error_hint()
~15 lines Rust total. Bounded.
## Family Classification
§4.44 typed-envelope contract sweep:
- #179 (parse-error real message quality) — closed
- #181 (envelope exit_code matches process exit) — closed
- #247 (classify_error_kind misses prompt-patterns) — closed
- #248 (verb-qualified unknown option errors) — in-flight (another agent)
- **#249 (resumed-session slash error envelopes omit kind) — filed**
Natural bundle #247+#248+#249: classifier/envelope completeness across all
three CLI paths (top-level parse, subcommand options, resumed-session slash).
## Discipline
Per cycle #24 calibration:
- Red-state bug? ✗ (errors surfaced, exit codes correct)
- Real friction? ✓ (typed-error contract violation; claws dispatching on
error.kind get undefined for all resumed slash-command errors)
- Evidence-backed? ✓ (dogfood probe + code trace identified both Err arms)
- Implementation cost? ~15 lines (bounded)
- Same-cycle fix? ✗ (Rust change, deferred per file-not-fix discipline)
## Not Implementing This Cycle
Per the boundary discipline established in cycle #36: I don't touch another
agent's in-flight work, and I don't implement a Rust fix same-cycle when
the pattern is "file + document + let owner/maintainer decide."
Filing with concrete fix shape is the correct output. If demand or red-state
symptoms arrive, implementation can follow the same path as #247: file →
fix in branch → review → merge.
Source: Jobdori cycle #37 proactive dogfood in response to Clawhip pinpoint
nudge at msg 1496518474019639408.
Cycle #34 dogfood follow-through on Jobdori cycle #33 pinpoint (#247 filed
at fbcbe9d). Closes the two typed-error contract drifts surfaced in that
pinpoint against the Rust `claw` binary.
## What was wrong
1. `classify_error_kind()` (main.rs:~251) used substring matching but did
NOT match two common prompt-related parse errors:
- "prompt subcommand requires a prompt string"
- "empty prompt: provide a subcommand..."
Both fell through to `"unknown"`. §4.44 typed-error contract specifies
`parse | usage | unknown` as distinct classes, so claws dispatching on
`error.kind == "cli_parse"` missed those paths entirely.
2. JSON mode dropped the `Run `claw --help` for usage.` hint. Text mode
appends it at stderr-print time (main.rs:~234) AFTER split_error_hint()
has already serialized the envelope, so JSON consumers never saw it.
Text-mode humans got an actionable pointer; machine consumers did not.
## Fix
Two small, targeted edits:
1. `classify_error_kind()`: add explicit branches for "prompt subcommand
requires" and "empty prompt:" (the latter anchored with `starts_with`
so it never hijacks unrelated error messages containing the word).
Both route to `cli_parse`.
2. JSON error render path in `main()`: after calling split_error_hint(),
if the message carried no embedded hint AND kind is `cli_parse` AND
the short-reason does not already embed a `claw --help` pointer,
synthesize the same `Run `claw --help` for usage.` trailer that
text-mode stderr appends. The embedded-pointer check prevents
duplication on the `empty prompt: ... (run `claw --help`)` message
which already carries inline guidance.
## Verification
Direct repro on the compiled binary:
$ claw --output-format json prompt
{"error":"prompt subcommand requires a prompt string",
"hint":"Run `claw --help` for usage.",
"kind":"cli_parse","type":"error"}
$ claw --output-format json ""
{"error":"empty prompt: provide a subcommand (run `claw --help`) or a non-empty prompt string",
"hint":null,"kind":"cli_parse","type":"error"}
$ claw --output-format json doctor --foo # regression guard
{"error":"unrecognized argument `--foo` for subcommand `doctor`",
"hint":"Run `claw --help` for usage.",
"kind":"cli_parse","type":"error"}
Text mode unchanged in shape; `[error-kind: ...]` prefix now reads
`cli_parse` for the two previously-misclassified paths.
## Regression coverage
- Unit test `classify_error_kind_covers_prompt_parse_errors_247`: locks
both patterns route to `cli_parse` AND that generic "prompt"-containing
messages still fall through to `unknown`.
- Integration tests in `tests/output_format_contract.rs`:
* prompt_subcommand_without_arg_emits_cli_parse_envelope_with_hint_247
* empty_positional_arg_emits_cli_parse_envelope_247
* whitespace_only_positional_arg_emits_cli_parse_envelope_247
* unrecognized_argument_still_classifies_as_cli_parse_247_regression_guard
- Full rusty-claude-cli test suite: 218 tests pass (180 bin unit + 15
output_format_contract + 12 resume_slash + 7 compact + 3 mock + 1 cli).
## Family / related
Joins §4.44 typed-envelope contract gap family closure: #130, #179, #181,
and now **#247**. All four quartet items now have real fixes landed on
the canonical binary surface rather than only the Python harness.
ROADMAP.md: #247 marked CLOSED with before/after evidence preserved.
Cycle #33 dogfood finding from direct probe of Rust claw binary:
## The Pinpoint
Two related contract drifts in the typed-error envelope:
### 1. Error-kind misclassification
`classify_error_kind()` at main.rs:246-280 uses substring matching but
does NOT match two common parse error messages:
- "prompt subcommand requires a prompt string" → classified as 'unknown'
- "empty prompt: provide a subcommand..." → classified as 'unknown'
The §4.44 typed-error contract specifies 'parse | usage | unknown' as
DISTINCT classes. Known parse errors should be 'cli_parse', not 'unknown'.
### 2. Hint lost in JSON mode
Text mode appends 'Run `claw --help` for usage.' to parse errors.
JSON mode emits 'hint: null'. The trailer is added at the stderr-print
stage AFTER split_error_hint() has already serialized the envelope, so
JSON consumers never see it.
## Repro
Dogfooded on main HEAD dd0993c (cycle #33):
$ claw --output-format json prompt
{"error":"prompt subcommand requires a prompt string","hint":null,"kind":"unknown","type":"error"}
Expected: kind="cli_parse" + hint="Run \\`claw --help\\` for usage."
## Impact
- Claws dispatching on typed error.kind fall back to substring matching
- JSON consumers lose actionable hint that text-mode users see
- Joins JSON envelope field-quality family (#90, #91, #92, #110, #115,
#116, #130, #179, #181, #247)
## Fix Shape
1. Add prompt-pattern clauses to classify_error_kind() (~4 lines)
2. Move hint plumbing to BEFORE JSON envelope serialization (~15 lines)
3. Add golden-fixture regression tests per cycle #30 pattern
Not a red-state bug (error IS surfaced, exit code IS correct), but real
contract drift. Deferred for implementation; filed per Clawhip nudge
to 'add one concrete follow-up to ROADMAP.md'.
Per cycle #24 calibration:
- Red-state bug? ✗ (errors exit 1 correctly)
- Real friction? ✓ (typed-error contract drift)
- Evidence-backed? ✓ (dogfood probe + code trace identified both leaks)
- Implementation cost? ~20 lines Rust (bounded)
- Demand signal needed? Medium — any claw doing error.kind dispatch on
prompt-path errors is affected
Source: Jobdori cycle #33 direct dogfood 2026-04-22 22:30 KST in response
to Clawhip pinpoint nudge at msg 1496503374621970583.
Cycle #32 dogfood finding: #127 was fixed on main via `a3270db` + `79352a2`
(2026-04-20), but the ROADMAP.md entry still lacked a [CLOSED] marker.
The in-flight branches `feat/jobdori-127-clean` and
`feat/jobdori-127-verb-suffix-flags` were superseded and are now obsolete.
## What This Fixes
**Documentation drift:** Pinpoint #127 was complete in code but unmarked
in ROADMAP. New contributors checking the roadmap would see it as open
work, potentially duplicating effort.
**Stale branches:** Two branches (`feat/jobdori-127-clean`,
`feat/jobdori-127-verb-suffix-flags`) contain the fix attempt bundled
with an unrelated large-scope refactor (5365 lines removed from
ROADMAP.md, root-level governance docs deleted, command infra refactored).
Their fix was superseded; branches are functionally obsolete.
## Verification
Re-verified all 4 #127 scenarios pass on main HEAD `b903e16`:
$ claw doctor --json → rejected with "did you mean" hint
$ claw doctor garbage → rejected
$ claw doctor --unknown-flag → rejected
$ claw doctor --output-format json → works (canonical form)
All behavior matches #127 acceptance criteria.
## Cluster Impact
Post-closure: **parser-level trust gap quintet (#108 + #117 + #119 + #122
+ #127) is 5/5 closed**. The `_other => Prompt` fall-through audit is
complete.
## Discipline Check
Per cycle #24 calibration:
- Red-state bug? ✗ (behavior is correct on main)
- Real friction? ✓ (ROADMAP drift; obsolete branches adrift)
- Evidence-backed? ✓ (dogfood probe confirmed closure; git log confirmed
supersession; branch diff confirmed scope contamination)
## Relationship to Gaebal-gajae's Option A Guidance
Cycle #32 started by proposing separating the #127 fix from the attached
refactor. On deeper probe, discovered the fix was already superseded on
main via different commits. Option A (separate the fix) is retroactively
satisfied: the fix landed cleanly, the refactor never did.
The remaining action is governance hygiene: mark closure, document
supersession, flag obsolete branches for deletion.
## Next Actions (not in this commit)
- Delete `feat/jobdori-127-clean` locally and on fork (after confirmation)
- Delete `feat/jobdori-127-verb-suffix-flags` locally and on fork
- Monitor whether any attached refactor content should be re-proposed in
its own scoped PR
Source: Jobdori cycle #32 dogfood in response to Clawhip 10-min nudge.
Proposed Option A (separate fix from refactor); probe revealed the fix
already landed via a different commit path, rendering the refactor-only
branch obsolete.
Cycle #30 dogfood found a testing gap: OPT_OUT surfaces were classified
in code but their REJECTION behavior was never regression-tested.
## The Gap
OPT_OUT_AUDIT.md declares 12 surfaces as intentionally exempt from
--output-format. The test suite had:
- ✅ test_clawable_surface_has_output_format (CLAWABLE must accept)
- ✅ test_every_registered_command_is_classified (no orphans)
- ❌ Nothing verifying OPT_OUT surfaces REJECT --output-format
If a developer accidentally added --output-format to 'summary' (one of
the 12 OPT_OUT surfaces), no test would catch the silent promotion.
The classification was governed, but the rejection behavior was NOT.
## What Changed
Added TestOptOutSurfaceRejection to test_cli_parity_audit.py with 14 tests:
1. **12 parametrized tests** — one per OPT_OUT surface, verifying each
rejects --output-format with an argparse error.
2. **test_opt_out_set_matches_audit_document** — verifies OPT_OUT_SURFACES
constant matches the declared 12 surfaces in OPT_OUT_AUDIT.md.
3. **test_opt_out_count_matches_declared** — sanity check that the count
stays at 12 as documented.
## Symmetry Achieved
Before: only CLAWABLE acceptance tested
CLAWABLE accepts --output-format ✅
OPT_OUT behavior: untested
After: full parity coverage
CLAWABLE accepts --output-format ✅
OPT_OUT rejects --output-format ✅
Audit doc ↔ constant kept in sync ✅
This completes the parity enforcement loop: every new surface is
explicitly IN or OUT, and BOTH directions are regression-locked.
## Promotion Path Preserved
When a real OPT_OUT surface gains genuine demand (per OPT_OUT_DEMAND_LOG.md):
1. Move from OPT_OUT_SURFACES to CLAWABLE_SURFACES
2. Update OPT_OUT_AUDIT.md with promotion rationale
3. Remove from this test's expected rejections
4. Tests pass (rejection test no longer runs; acceptance test now required)
Graceful promotion; no accidental drift.
## Test Count
- 222 → 236 passing (+14, zero regressions)
- 12 parametrized + 2 metadata = 14 new tests
## Discipline Check
Per cycle #24 calibration:
- Red-state bug? ✗ (no broken behavior)
- Real friction? ✓ (testing gap discovered by dogfood)
- Evidence-backed? ✓ (systematic probe revealed missing coverage)
This is the cycle #27 taxonomy (structural / quality / cross-channel /
text-vs-JSON divergence) extending into classification: not just 'is the
envelope right?' but 'is the OPPOSITE-OF-envelope right?'
Future cycles can apply the same principle to other classifications:
every governed non-goal deserves regression tests that lock its
non-goal-ness.
Classification:
- Real friction: ✓ (cycle #30 dogfood)
- Evidence-backed: ✓ (gap discovered by systematic surface audit)
- Same-cycle fix: ✓ (maintainership discipline)
Source: Jobdori cycle #30 proactive dogfood — probed all 26 subcommands
with --output-format json and noticed OPT_OUT rejection pattern was
unverified by any dedicated test.
Cycle #29 dogfood found a real pinpoint: cross-mode exit code divergence.
## The Pinpoint
Dogfooding the CLI revealed that unknown subcommand errors return different
exit codes depending on output mode:
$ python3 -m src.main nonexistent-cmd # exit 2
$ python3 -m src.main nonexistent-cmd --output-format json # exit 1
ERROR_HANDLING.md documented the exit-code contract (1=parse, 2=timeout)
but did NOT explicitly state the contract applies only to JSON mode. Text
mode follows argparse defaults (exit 2 for any parse error), which
violates the documented contract when interpreted generally.
A claw using text mode with 'claw nonexistent' would see exit 2 and
misclassify as timeout per the docs. Real protocol contract gap, not
implementation bug.
## Classification
This is a DOCUMENTATION gap, not a behavior bug:
- Text mode follows argparse convention (reasonable for humans)
- JSON mode normalizes to documented contract (reasonable for claws)
- The divergence is intentional; only the docs were silent about it
Fix = document the divergence explicitly + lock it with tests.
NOT fix = change text mode exit code to 1 (would break argparse
conventions and confuse human users).
## Documentation Changes
ERROR_HANDLING.md:
1. Added IMPORTANT callout in Quick Reference section:
'The exit code contract applies ONLY when --output-format json is
explicitly set. Text mode follows argparse conventions.'
2. New 'Text mode vs JSON mode exit codes' table showing exact divergence:
- Unknown subcommand: text=2, json=1
- Missing required arg: text=2, json=1
- Session not found: text=1, json=1 (app-level, identical)
- Success: text=0, json=0 (identical)
- Timeout: text=2, json=2 (identical, #161)
3. Practical rule: 'always pass --output-format json'
## Tests Added (5)
TestTextVsJsonModeDivergence in test_cross_channel_consistency.py:
1. test_unknown_command_text_mode_exits_2 — text mode argparse default
2. test_unknown_command_json_mode_exits_1 — JSON mode contract normalized
3. test_missing_required_arg_text_mode_exits_2 — same for missing args
4. test_missing_required_arg_json_mode_exits_1 — same normalization
5. test_success_path_identical_in_both_modes — success exit identical
These tests LOCK the expected divergence so:
- Documentation stays aligned with implementation
- Future changes (either direction) are caught as intentional
- Claws trust the docs
## Test Status
- 217 → 222 tests passing (+5)
- Zero regressions
## Discipline
This cycle follows the cycle #28 template exactly:
- Dogfood probe revealed real friction (test said exit=2, docs said exit=1)
- Minimal fix shape (documentation clarification, not code change)
- Regression guard via tests
- Evidence-backed, not speculative
Relationship to #181:
- #181 fixed env.exit_code != process exit (WITHIN JSON mode)
- #29 clarifies exit code contract scope (ONLY JSON mode)
- Both establish: exit codes are deterministic, but only when --output-format json
---
Classification (per cycle #24 calibration):
- Red-state bug? ✗ (behavior was reasonable, docs were incomplete)
- Real friction? ✓ (docs/code divergence revealed by dogfood)
- Evidence-backed? ✓ (test suite probed both modes, found the gap)
Source: Jobdori cycle #29 proactive dogfood — in response to Clawhip nudge
for pinpoint hunting. Found that text-mode errors return exit 2 but
ERROR_HANDLING.md implied exit 1 was the parse-error contract universally.
Cycle #28 closes the low-hanging metadata protocol gap identified in #180.
## The Gap
Pinpoint #180 (filed cycle #24) documented a metadata protocol gap:
- `--help` works (argparse default)
- `--version` does NOT exist
The ROADMAP entry deferred implementation pending demand. Cycle #28 dogfood
probe found this during routine invariant audit (attempt to call `--version`
as part of comprehensive CLI surface coverage). This is concrete evidence of
real friction, not speculative gap-filling.
## Implementation
Added `--version` flag to argparse in `build_parser()`:
```python
parser.add_argument('--version', action='version', version='claw-code 1.0.0 (Python harness)')
```
Simple one-liner. Follows Python argparse conventions (built-in action='version').
## Tests Added (3)
TestMetadataFlags in test_exec_route_bootstrap_output_format.py:
1. test_version_flag_returns_version_text — `claw --version` prints version
2. test_help_flag_returns_help_text — `claw --help` still works
3. test_help_still_works_after_version_added — Both -h and --help work
Regression guard on the original help surface.
## Test Status
- 214 → 217 tests passing (+3)
- Zero regressions
- Full suite green
## Discipline
This cycle exemplifies the cycle #24 calibration:
- #180 was filed as 'deferred pending demand'
- Cycle #28 dogfood found actual friction (proactive test coverage gap)
- Evidence = concrete ('--version not found during invariant audit')
- Action = minimal implementation + regression tests
- No speculation, no feature creep, no implementation before evidence
Not 'we imagined someone might want this.' Instead: 'we tried to call it
during routine maintenance, got ENOENT, fixed it.'
## Related
- #180 (cycle #24): Metadata protocol gap filed
- Cycle #27: Cross-channel consistency audit established framework
- Cycle #28 invariant audit: Discovered actual friction, triggered fix
---
Classification (per cycle #24 calibration):
- Red-state bug? ✗ (not a malfunction, just an absence)
- Real friction? ✓ (audit probe could not call the flag, had to special-case)
- Evidence-backed? ✓ (proactive test coverage revealed the gap)
Source: Jobdori cycle #28 dogfood — invariant audit attempting comprehensive
CLI surface coverage found that --version was unsupported.
Cycle #27 ships a new test class systematizing the three-layer protocol
invariant framework.
## Context
After cycles #20–#26, the protocol has three distinct invariant classes:
1. **Structural compliance** (#178): Does the envelope exist?
2. **Quality compliance** (#179): Is stderr silent + error message truthful?
3. **Cross-channel consistency** (#181 + NEW): Do multiple channels agree?
#181 revealed a critical gap: the second test class was incomplete.
Envelopes could be structurally valid, quality-compliant, but still
lie about their own state (envelope.exit_code != actual exit).
## New Test Class
TestCrossChannelConsistency in test_cross_channel_consistency.py captures
the third invariant layer with 5 dedicated tests:
1. envelope.command ↔ dispatched subcommand
2. envelope.output_format ↔ --output-format flag
3. envelope.timestamp ↔ actual wall clock (recent, <5s)
4. envelope.exit_code ↔ process exit code (cycle #26/#181 regression guard)
5. envelope boolean fields (found/handled/deleted) ↔ error block presence
Each test specifically targets cross-channel truth, not structure or quality.
## Why Separate Test Classes Matter
A command can fail all three ways independently:
| Failure mode | Exit/Crash | Test class | Example |
|---|---|---|---|
| Structural | stderr noise | TestParseErrorEnvelope | argparse leaks to stderr |
| Quality | correct shape, wrong message | TestParseErrorStderrHygiene | error instead of real message |
| Cross-channel | truthy field, lie about state | TestCrossChannelConsistency | exit_code: 0 but exit 1 |
#181 was invisible to the first two classes. A claw passing all structure/
quality tests could still be misled. The third class catches that.
## Audit Results (Cycle #27)
All 5 tests pass — no drift detected in any channel pair:
- ✅ Envelope command always matches dispatch
- ✅ Envelope output_format always matches flag
- ✅ Envelope timestamp always recent (<5s)
- ✅ Envelope exit_code always matches process exit (post-#181 guard)
- ✅ Boolean fields consistent with error block presence
The systematic audit proved the fix from #181 holds, and identified
no new cross-channel gaps.
## Test Impact
- 209 → 214 tests passing (+5)
- Zero regressions
- New invariant class now has dedicated test suite
- Future cross-channel bugs will be caught by this class
## Related
- #178 (#20): Parser-front-door structural contract
- #179 (#20): Stderr hygiene + real error message quality
- #181 (#26): Envelope exit_code must match process exit
- #182-N: Future cross-channel contract violations will be caught
by TestCrossChannelConsistency
This test class is evergreen — as new fields/channels are added to the
protocol, invariants for those channels should be added here, not mixed
with other test classes. Keeping invariant classes separate makes
regression attribution instant (e.g., 'TestCrossChannelConsistency failed'
= 'some truth channel disagreed').
Classification (per cycle #24 calibration):
- Red-state bug: ✗ (audit is green)
- Real friction: ✓ (structured audit of documented invariants)
- Proof of equilibrium: ✓ (systematic verification, no gaps found)
Source: Jobdori cycle #27 proactive invariant audit — following gaebal
guidance to probe documented invariants, not speculative gaps.
Cycle #26 dogfood found a real red-state bug in the JSON envelope contract.
## The Bug
exec-command and exec-tool not-found cases return exit code 1 from the
process, but the envelope reports exit_code: 0 (the default from
wrap_json_envelope). This is a protocol violation.
Repro (before fix):
$ claw exec-command unknown-cmd test --output-format json > out.json
$ echo $?
1
$ jq '.exit_code' out.json
0 # WRONG — envelope lies about exit code
Claws reading the envelope's exit_code field get misinformation. A claw
implementing the canonical ERROR_HANDLING.md pattern (check exit_code,
then classify by error.kind) would incorrectly treat failures as
successes when dispatching on the envelope alone.
## Root Cause
main.py lines 687–739 (exec-command + exec-tool handlers):
- Return statement: 'return 0 if result.handled else 1' (correct)
- Envelope wrap: 'wrap_json_envelope(envelope, args.command)'
(uses default exit_code=0, IGNORES the return value)
The envelope wrap was called BEFORE the return value was computed, so
the exit_code field was never synchronized with the actual exit code.
## The Fix
Compute exit_code ONCE at the top:
exit_code = 0 if result.handled else 1
Pass it explicitly to wrap_json_envelope:
wrap_json_envelope(envelope, args.command, exit_code=exit_code)
Return the same value:
return exit_code
This ensures the envelope's exit_code field is always truth — the SAME
value the process returns.
## Tests Added (3)
TestEnvelopeExitCodeMatchesProcessExit in test_exec_route_bootstrap_output_format.py:
1. test_exec_command_not_found_envelope_exit_matches:
Verifies exec-command unknown-cmd returns exit 1 in both envelope
and process.
2. test_exec_tool_not_found_envelope_exit_matches:
Same for exec-tool.
3. test_all_commands_exit_code_invariant:
Audit across 4 known non-zero cases (show-command, show-tool,
exec-command, exec-tool not-found). Guards against the same bug
in other surfaces.
## Impact
- 206 → 209 passing tests (+3)
- Zero regressions
- Protocol contract now truthful: envelope.exit_code == process exit
- Claws using the one-handler pattern from ERROR_HANDLING.md now get
correct information
## Related
- ERROR_HANDLING.md (cycle #22): Documented exit_code as machine-readable
contract field
- #178/#179 (cycles #19/#20): Closed parser-front-door contract
- This closes a gap in the WORK PROTOCOL contract — envelope values must
match reality, not just be structurally present.
Classification (per cycle #24 calibration):
- Red-state bug: ✓ (contract violation, claws get misinformation)
- Real friction: ✓ (discovered via dogfood, not speculative)
- Fix ships same-cycle: ✓ (discipline per maintainership mode)
Source: Jobdori cycle #26 dogfood — ran multiple edge-case probes, noticed
exec-command envelope showed exit_code: 0 while process exited 1.
Investigated wrap_json_envelope default behavior, confirmed bug, fixed
and tested in same cycle.
Cycle #25 ships navigation improvements connecting USAGE (setup/interactive)
to ERROR_HANDLING.md (subprocess/orchestration patterns).
Before: USAGE.md had JSON scripting mention but no link to error-handling guide.
New users reading USAGE would see JSON is available, but wouldn't discover
the error-handling pattern without accidentally finding ERROR_HANDLING.md.
After: Two strategic cross-links:
1. Top-level tip box: "Building orchestration code? See ERROR_HANDLING.md"
2. JSON scripting section expanded with examples + link to unified pattern
Changes to USAGE.md:
- Added TIP callout near top linking to ERROR_HANDLING.md
- Expanded "JSON output for scripting" section:
- Explains what the envelope contains (exit_code, command, timestamp, fields)
- Added 3 command examples (prompt, load-session, turn-loop)
- Added callout for dispatchers/orchestrators pointing to ERROR_HANDLING pattern
Impact: Operators reading USAGE for "how do I call claw from scripts?" now
immediately see the canonical answer (ERROR_HANDLING.md) instead of having
to reverse-engineer it from code examples.
No code changes. Pure navigation/documentation.
Continues the documentation-governance pattern: the work protocol (14 clawable
commands) has a consumption guide (ERROR_HANDLING.md), and that guide is now
reachable from the main entry point (USAGE.md + README.md top nav).
Cycle #24 dogfood discovery.
Running proactive edge-case dogfood on the JSON contract, hit a real pinpoint:
--help and --version are outside the parser-front-door contract.
The gap:
1. "claw --help --output-format json" returns text (not envelope)
2. "claw bootstrap --help --output-format json" returns text (not envelope)
3. "claw --version" doesn't exist at all
Why it matters:
- Claws can't programmatically discover the CLI surface
- Version checking requires side-effectful commands
- Natural follow-up gap to #178/#179 parser-front-door work
Discoverability scenarios:
- Orchestrator checking whether a new command (e.g., turn-loop) is available
- Version compat check before dispatching work
- Enumerating available commands for routing decisions
Filed as Pinpoint #180 in ROADMAP.md with:
- Gap description + 3-case repro
- Impact analysis (version compat, surface enumeration, governance)
- Root cause (argparse default HelpAction prints text + exits)
- Fix shape (3 stages, ~40 lines total)
- Stage A: --version + JSON envelope version metadata
- Stage B: --help JSON routing via custom HelpAction
- Stage C: optional 'schema-info' command for pre-dispatch discovery
- Acceptance criteria (4 cases, including backward compat)
- Priority: Medium (not red-state, but real discoverability gap)
Status: **Filed, implementation deferred.**
Following maintainership equilibrium: pinpoints stay documented but don't
force code changes. If external demand arrives (claw author building a
dispatcher, orchestrator doing version checks), the fix can ship in one
cycle using the shape already documented.
No code changes this cycle. Pure ROADMAP filing.
Continues the maintainership pattern: find friction, document it, defer
until evidence-backed demand arrives.
Source: Jobdori proactive dogfood at 2026-04-22 20:58 KST.
Cycle #23 ships a documentation discoverability fix.
After #22 shipping ERROR_HANDLING.md, the next natural step is making it
discoverable from the project's entry point (README.md).
Before: README top navigation linked to USAGE, PARITY, ROADMAP, Rust workspace.
ERROR_HANDLING.md was buried in CLAUDE.md references.
After: ERROR_HANDLING.md is now in the top navigation (right after USAGE,
before Rust workspace). Also added SCHEMAS.md mention in repository shape.
This signals that:
1. Error handling is a first-class concern (not an afterthought)
2. The Python harness documentation (SCHEMAS.md, ERROR_HANDLING.md, CLAUDE.md)
is part of the official docs, not just dogfood artifacts
3. New users/claws can discover the error-handling pattern at entry point
Impact: Operators building orchestration code will immediately see
'Error Handling' link in navigation, shortening the path to understanding
how to consume the protocol reliably.
No code changes. No test changes. Pure navigation/discoverability.
Cycle #22 ships documentation that operationalizes cycles #178–#179.
Problem context:
After #178 (parse-error envelope) and #179 (stderr hygiene + real error message),
claws can now build a unified error handler for all 14 clawable commands.
But there was no guide on how to actually do that. Operators had the pieces;
they didn't have the pattern.
This file changes that.
New file: ERROR_HANDLING.md
- Quick reference: exit codes + envelope shapes (0=success, 1=error, 2=timeout)
- One-handler pattern: ~80 lines of Python showing how to parse error.kind,
check retryable, and decide recovery strategy
- Four practical recovery patterns:
- Retry on transient errors (filesystem, timeout)
- Reuse session after timeout (if cancel_observed=true)
- Validate command syntax before dispatch (dry-run --help)
- Log errors for observability
- Error kinds enumeration (parse, session_not_found, filesystem, runtime, timeout)
- Common mistakes to avoid (6 patterns with BAD vs GOOD examples)
- Testing your error handler (unit test examples)
Operational impact:
Orchestration code now has a canonical pattern. Claws can:
- Copy-paste the run_claw_command() function (works for all commands)
- Classify errors uniformly (no special cases per command)
- Decide recovery deterministically (error.kind + retryable + cancel_observed)
- Log/monitor/escalate with confidence
Related cycles:
- #178: Parse-error envelope (commands now emit structured JSON on invalid argv)
- #179: Stderr hygiene + real message (JSON mode silences argparse, carries actual error)
- #164 Stage B: cancel_observed field (callers know if session is safe for reuse)
Updated CLAUDE.md:
- Added ERROR_HANDLING.md to 'Related docs' section
- Now documents the one-handler pattern as a guideline
No code changes. No test changes. Pure documentation.
This completes the documentation trail from protocol (SCHEMAS.md) →
governance (OPT_OUT_AUDIT.md, OPT_OUT_DEMAND_LOG.md) → practice (ERROR_HANDLING.md).
Cycle #21 ships governance infrastructure, not implementation. Maintainership
mode means sometimes the right deliverable is a decision framework, not code.
Problem context:
OPT_OUT_AUDIT.md (cycle #18 bonus) established 'demand-backed audit' as the
next step. But without a structured way to record demand signals, 'demand-backed'
was just a slogan — the next audit cycle would have no evidence to work from.
This commit creates the evidentiary base:
New file: OPT_OUT_DEMAND_LOG.md
- Per-surface entries for all 12 OPT_OUT commands (Groups A/B/C)
- Current state: 0 signals across all surfaces (consistent with audit prediction)
- Signal entry template with required fields:
- Source (who/what)
- Use case (concrete orchestration problem)
- Markdown-alternative-checked (why existing output insufficient)
- Date
- Promotion thresholds:
- 2+ independent signals for same surface → file promotion pinpoint
- 1 signal + existing stable schema → file pinpoint for discussion
- 0 signals → stays OPT_OUT (rationale preserved)
Decision framework for cycle #22 (audit close):
- If 0 signals total: move to PERMANENTLY_OPT_OUT, close audit
- If 1-2 signals: file individual promotion pinpoints with evidence
- If 3+ signals: reopen audit, question classification itself
Updated files:
- OPT_OUT_AUDIT.md: Added demand log reference in Related section
- CLAUDE.md: Added prerequisites for promotions (must have logged signals),
added 'File a demand signal' workflow section
Philosophy:
'Prevent speculative expansion' — schema bloat protection discipline.
Every new CLAWABLE surface is a maintenance tax. Evidence requirement keeps
the protocol lean. OPT_OUT surfaces are intentionally not-clawable until
proven otherwise by external demand.
Operational impact:
Next cycles can now:
1. Watch for real claws hitting OPT_OUT surface limits
2. Log signals in structured format (no ad-hoc filing)
3. Run audit at cycle #22 with actual data, not speculation
No code changes. No test changes. Pure governance infrastructure.
Related: #18 cycle (OPT_OUT_AUDIT.md), maintainership phase transition.
Dogfood discovered #178 had two residual gaps:
1. Stderr pollution: argparse usage + error text still leaked to stderr even in
JSON mode (envelope was correct on stdout, but stderr noise broke the
'machine-first protocol' contract — claws capturing both streams got dual output)
2. Generic error message: envelope carried 'invalid command or argument (argparse
rejection)' instead of argparse's actual text like 'the following arguments
are required: session_id' or 'invalid choice: typo (choose from ...)'
Before #179:
$ claw load-session --output-format json
[stdout] {"error": {"message": "invalid command or argument (argparse rejection)"}}
[stderr] usage: main.py load-session [-h] ...
main.py load-session: error: the following arguments are required: session_id
[exit 1]
After #179:
$ claw load-session --output-format json
[stdout] {"error": {"message": "the following arguments are required: session_id"}}
[stderr] (empty)
[exit 1]
Implementation:
- New _ArgparseError exception class captures argparse's real message
- main() monkey-patches parser.error (+ all subparser.error) in JSON mode to raise
_ArgparseError instead of print-to-stderr + sys.exit(2)
- _emit_parse_error_envelope() now receives the real message verbatim
- Text mode path unchanged: still uses original argparse print+exit behavior
Contract:
- JSON mode: stdout carries envelope with argparse's actual error; stderr silent
- Text mode: unchanged — argparse usage to stderr, exit 2
- Parse errors still error.kind='parse', retryable=false
Test additions (5 new, 14 total in test_parse_error_envelope.py):
- TestParseErrorStderrHygiene (5):
- test_json_mode_stderr_is_silent_on_unknown_command
- test_json_mode_stderr_is_silent_on_missing_arg
- test_json_mode_envelope_carries_real_argparse_message
- test_json_mode_envelope_carries_invalid_choice_details (verifies valid-choices list)
- test_text_mode_stderr_preserved_on_unknown_command (backward compat)
Operational impact:
Claws capturing both stdout and stderr no longer get garbled output. The envelope
message now carries discoverability info (valid command list, missing-arg name)
that claws can use for retry/recovery without probing the CLI a second time.
Test results: 201 → 206 passing, 3 skipped unchanged, zero regression.
Pinpoint discovered via dogfood at 2026-04-22 20:30 KST (cycle #20).
Filed explicit decision criteria for the 12 OPT_OUT surfaces (commands that do
not support --output-format json) documented in test_cli_parity_audit.py.
Categorized by rationale:
- Group A (4): Rich-Markdown reports (summary, manifest, parity-audit, setup-report)
Markdown-as-output is intentional; JSON would be information loss.
Unlikely promotions (remain OPT_OUT long-term).
- Group B (3): List filters with --query/--limit (subsystems, commands, tools)
Query layer already exists; users have escape hatch.
Remain OPT_OUT (promotion effort >> value).
- Group C (5): Simulation/debug surfaces (remote-mode, ssh-mode, teleport-mode,
direct-connect-mode, deep-link-mode)
Intentionally non-production; JSON output doesn't add value.
Remain OPT_OUT (simulation tools, not orchestration endpoints).
Audit workflow documented:
1. Survey: Check if external claws actually request JSON versions
2. Cost estimate: Schema + tests for each surface
3. Value estimate: Real demand vs hypothetical
4. Decision: CLAWABLE, remain OPT_OUT, or new pinpoint
Promotion criteria locked (only if clear use case + schema simple + demand exists).
Outcome prediction: All 12 likely remain OPT_OUT (documented rationale per group).
Timeline: Survey period (cycles #19–#21), final decision (cycle #22).
Related pinpoints: #175 (summary/manifest JSON parallel?), #176 (--query-json?),
#177 (mode simulators ever CLAWABLE?).
This closes the documentation loop from cycles #173–#174 (protocol closure →
field evolution → reframe). Now governance rules are explicit for future work.
#164 Stage B requires exposing whether cancellation was observed at the
turn-result level. This commit adds the infrastructure field:
Changes:
- TurnResult.cancel_observed: bool = False (query_engine.py)
- _build_timeout_result() accepts cancel_observed parameter (runtime.py)
- Two timeout paths now pass cancel_event.is_set() to signal observation (runtime.py)
- bootstrap command includes cancel_observed in turn JSON (main.py)
- SCHEMAS.md documents Turn Result Fields with cancel_observed contract
Usage:
When a turn timeout occurs, cancel_observed=true indicates that the
engine observed the cancellation event being set. This allows callers
to distinguish:
- timeout with no cancel → infrastructure/network stall
- timeout with cancel observed → cooperative cancellation was triggered
Backward compat:
- Existing TurnResult construction without cancel_observed defaults to False
- bootstrap JSON output still validates per SCHEMAS.md (new field is always present)
Test results: 182 passing, 3 skipped, zero regression.
Related: #161 (wall-clock timeout), #164 (cancellation observability protocol)
ROADMAP continues #164 with Stage C (test coverage for cancellation + turn envelope).
Adds parametrised test suite validating that clawable-surface commands'
JSON output matches their declared envelope contracts per SCHEMAS.md.
Two phases:
Phase 1 (this commit): Consistency baseline.
- Collect ENVELOPE_CONTRACTS registry mapping each command to its
required and optional fields
- TestJsonEnvelopeConsistency: parametrised test iterates over 13
commands, invokes with --output-format json, validates that
actual JSON envelope contains all required fields
- test_envelope_field_value_types: spot-check types (int, str, list)
for consistency
Phase 2 (future #173): Common field wrapping.
- Once wrap_json_envelope() is applied, all commands will emit
timestamp, command, exit_code, output_format, schema_version
- Currently skipped via @pytest.mark.skip, these tests will activate
automatically when wrapping is implemented:
TestJsonEnvelopeCommonFieldPrep::test_all_envelopes_include_timestamp
TestJsonEnvelopeCommonFieldPrep::test_all_envelopes_include_command
TestJsonEnvelopeCommonFieldPrep::test_all_envelopes_include_exit_code_and_schema_version
Why this matters:
- #172 documented the JSON contract; this test validates it
- Currently detects when actual output diverges from SCHEMAS.md
(e.g. list-sessions emits 'count', not 'sessions_count')
- As #173 wraps commands, test suite auto-validates new common fields
- Prevents regression: accidental field removal breaks the test suite
Current status: 11 passed (consistency), 6 skipped (awaiting #173)
Full suite: 168 → 179 passing, zero regression.
Closes ROADMAP #173 prep (framework for common field validation).
Actual field wrapping remains for next cycle.
Stops manual parity inspection from being a human-noticed concern. When
a developer adds a new subcommand to the claw-code CLI, this test suite
enforces explicit classification:
- CLAWABLE_SURFACES: MUST accept --output-format {text,json}
- OPT_OUT_SURFACES: explicitly exempt with documented rationale
A new command that forgets to opt into one of these two sets FAILS
loudly with TestCommandClassificationCoverage::test_every_registered_
command_is_classified. No silent drift possible.
Technique: argparse introspection at test time walks the _actions tree,
discovers every registered subcommand, and compares against the declared
classification sets. Contract is enforced machine-first instead of
depending on human review.
Three test classes covering three invariants:
TestClawableSurfaceParity (14 tests):
- test_all_clawable_surfaces_accept_output_format: every member of
CLAWABLE_SURFACES has --output-format flag registered
- test_clawable_surface_output_format_choices (parametrised over 13
commands): each must accept exactly {text, json} and default to 'text'
for backward compat
TestCommandClassificationCoverage (3 tests):
- test_every_registered_command_is_classified: any new subcommand
must be explicitly added to CLAWABLE_SURFACES or OPT_OUT_SURFACES
- test_no_command_in_both_sets: sanity check for classification conflicts
- test_all_classified_commands_actually_exist: no phantom commands
(catches stale entries after a command is removed)
TestJsonOutputContractEndToEnd (10 tests):
- test_command_emits_parseable_json (parametrised over 10 clawable
commands): actual subprocess invocation with --output-format json
produces valid parseable JSON on stdout
Classification:
CLAWABLE_SURFACES (13):
Session lifecycle: list-sessions, delete-session, load-session,
flush-transcript
Inspect: show-command, show-tool
Execution: exec-command, exec-tool, route, bootstrap
Diagnostic inventory: command-graph, tool-pool, bootstrap-graph
OPT_OUT_SURFACES (12):
Rich-Markdown reports (future JSON schema): summary, manifest,
parity-audit, setup-report
List filter commands: subsystems, commands, tools
Turn-loop: structured_output is future work
Simulation/debug: remote-mode, ssh-mode, teleport-mode,
direct-connect-mode, deep-link-mode
Full suite: 141 → 168 passing (+27), zero regression.
Closes ROADMAP #171.
Why this matters:
Before: parity was human-monitored; every new command was a drift
risk. The CLUSTER 3 sweep required manually auditing every
subcommand and landing fixes as separate pinpoints.
After: parity is machine-enforced. If a future developer adds a new
command without --output-format, the test suite blocks it
immediately with a concrete error message pointing at the
missing flag.
This is the first step in Gaebal-gajae's identified upper-level work:
operationalised parity instead of aspirational parity.
Related clusters:
- Clawability principle: machine-first protocol enforcement
- Test-first regression guard: extends TestTripletParityConsistency
(#160/#165) and TestFullFamilyParity (#166) from per-cluster
parity to cross-surface parity
Final diagnostic surface in the JSON parity sweep: bootstrap-graph
(the runtime bootstrap/prefetch visualization) now supports --output-format.
Concrete addition:
- bootstrap-graph: --output-format {text,json}
JSON envelope:
{stages: [str], note: 'bootstrap-graph is markdown-only in this version'}
Envelope explanation: bootstrap-graph's Markdown output is rich and
textual; raw JSON embedding maintains the markdown format (split into
lines array) rather than attempting lossy structural extraction that
would lose information. This is an honest limitation in this cycle;
full JSON schema can be added in a future audit if claws require
structured bootstrap data (dependency graphs, prefetch timing, etc.).
Backward compatibility:
- Default is 'text' (Markdown unchanged)
Closes ROADMAP #170.
Related: #167, #168, #169. Diagnostic/inventory surface family is now
uniformly JSON-capable. Summary, manifest, parity-audit, setup-report,
command-graph, tool-pool, bootstrap-graph all accept --output-format.
Extends the diagnostic surface audit with the two inventory-structure
commands: command-graph (command family segmentation) and tool-pool
(assembled tool inventory). Both now expose their underlying rich
datastructures via JSON envelope.
Concrete additions:
- command-graph: --output-format {text,json}
- tool-pool: --output-format {text,json}
JSON envelope shapes:
command-graph:
{builtins_count, plugin_like_count, skill_like_count, total_count,
builtins: [{name, source_hint}],
plugin_like: [{name, source_hint}],
skill_like: [{name, source_hint}]}
tool-pool:
{simple_mode, include_mcp, tool_count,
tools: [{name, source_hint}]}
Backward compatibility:
- Default is 'text' (Markdown unchanged)
- Text output byte-identical to pre-#169
Tests (4 new, test_command_graph_tool_pool_output_format.py):
- TestCommandGraphOutputFormat (2): JSON structure + text compat
- TestToolPoolOutputFormat (2): JSON structure + text compat
Full suite: 137 → 141 passing, zero regression.
Closes ROADMAP #169.
Why this matters:
Claws auditing the codebase can now ask 'what commands exist' and
'what tools exist' and get structured, parseable answers instead of
regex-parsing Markdown headers and counting list items.
Related clusters:
- Diagnostic surfaces (#169 adds to #167/#168 work-verb parity)
- Inventory introspection (command-graph + tool-pool are the two
foundational 'what do we have?' queries)
Closes the inspect-capability parity gap: show-command and show-tool were
the only discovery/inspection CLI commands lacking --output-format support,
making them outliers in the ecosystem that already had unified JSON
contracts across list-sessions, load-session, delete-session, and
flush-transcript (#160/#165/#166).
Concrete additions:
- show-command: --output-format {text,json}
- show-tool: --output-format {text,json}
JSON envelope shape (found case):
{name, found: true, source_hint, responsibility}
JSON envelope shape (not-found case):
{name, found: false, error: {kind:'command_not_found'|'tool_not_found',
message, retryable: false}}
Exit codes:
0 = success
1 = not found
Backward compatibility:
- Default (no --output-format) is 'text' (unchanged)
- Text output byte-identical to pre-#167 (three newline-separated lines)
Tests (10 new, test_show_command_tool_output_format.py):
- TestShowCommandOutputFormat (5): found + not-found in JSON; text mode
backward compat; text is default
- TestShowToolOutputFormat (3): found + not-found in JSON; text mode
backward compat
- TestShowCommandToolFormatParity (2): both accept same flag choices;
consistent JSON envelope shape
Full suite: 114 → 124 passing, zero regression.
Closes ROADMAP #167.
Why this matters:
Before: Claws calling show-command/show-tool had to parse human-readable
prose output via regex, with no structured error signal.
After: Same envelope contract as load-session and friends: JSON-first,
typed errors, machine-parseable.
Related clusters:
- Session-lifecycle CLI parity family (#160, #165, #166, #167)
- Machine-readable error contracts (same vein as #162 atomicity + #164
cancellation state-safety: structured boundaries for orchestration)
Closes the #161 follow-up gap identified in review: wall-clock timeout
bounded caller-facing wait but did not cancel the underlying provider
thread, which could silently mutate mutable_messages / transcript_store /
permission_denials / total_usage after the caller had already observed
stop_reason='timeout'. A ghost turn committed post-deadline would poison
any session that got persisted afterwards.
Stage A scope (this commit): runtime + engine layer cooperative cancel.
Engine layer (src/query_engine.py):
- submit_message now accepts cancel_event: threading.Event | None = None
- Two safe checkpoints:
1. Entry (before max_turns / budget projection) — earliest possible return
2. Post-budget (after output synthesis, before mutation) — catches cancel
that arrives while output was being computed
- Both checkpoints return stop_reason='cancelled' with state UNCHANGED
(mutable_messages, transcript_store, permission_denials, total_usage
all preserved exactly as on entry)
- cancel_event=None preserves legacy behaviour with zero overhead (no
checkpoint checks at all)
Runtime layer (src/runtime.py):
- run_turn_loop creates one cancel_event per invocation when a deadline
is in play (and None otherwise, preserving legacy fast path)
- Passes the same event to every submit_message call across turns, so a
late cancel on turn N-1 affects turn N
- On timeout (either pre-call or mid-call), runtime explicitly calls
cancel_event.set() before future.cancel() + synthesizing the timeout
TurnResult. This upgrades #161's best-effort future.cancel() (which
only cancels not-yet-started futures) to cooperative mid-flight cancel.
Stop reason taxonomy after Stage A:
'completed' — turn committed, state mutated exactly once
'max_budget_reached' — overflow, state unchanged (#162)
'max_turns_reached' — capacity exceeded, state unchanged
'cancelled' — cancel_event observed, state unchanged (#164 Stage A)
'timeout' — synthesised by runtime, not engine (#161)
The 'cancelled' vs 'timeout' split matters:
- 'timeout' is the runtime's best-effort signal to the caller: deadline hit
- 'cancelled' is the engine's confirmation: cancel was observed + honoured
If the provider call wedges entirely (never reaches a checkpoint), the
caller still sees 'timeout' and the thread is leaked — but any NEXT
submit_message call on the same engine observes the event at entry and
returns 'cancelled' immediately, preventing ghost-turn accumulation.
This is the honest cooperative limit in Python threading land; true
preemption requires async-native provider IO (future work, not Stage A).
Tests (29 new tests, tests/test_submit_message_cancellation.py + tests/
test_run_turn_loop_cancellation.py):
Engine-layer (12 tests):
- TestCancellationBeforeCall (5): pre-set event returns 'cancelled' immediately;
mutable_messages, transcript_store, usage, permission_denials all preserved
- TestCancellationAfterBudgetCheck (1): cancel set mid-call (after projection,
before commit) still honoured; output synthesised but state untouched
- TestCancellationAfterCommit (2): post-commit cancel not observable (honest
limit) BUT next call on same engine observes it + returns 'cancelled'
- TestLegacyCallersUnchanged (3): cancel_event=None preserves #162 atomicity
+ max_turns contract with zero behaviour change
- TestCancellationVsOtherStopReasons (2): cancel precedes max_turns check;
cancel does not retroactively override a completed turn
Runtime-layer (5 tests):
- TestTimeoutPropagatesCancelEvent (3): submit_message receives a real Event
object when deadline is set; None in legacy mode; timeout actually calls
event.set() so in-flight threads observe at their next checkpoint
- TestCancelEventSharedAcrossTurns (1): same event object passed to every
turn (object identity check) — late cancel on turn N-1 must affect turn N
Regression: 3 existing timeout test mocks updated to accept cancel_event
kwarg (mocks that previously had signature (prompt, commands, tools, denials)
now have (prompt, commands, tools, denials, cancel_event=None) since runtime
passes cancel_event positionally on the timeout path).
Full suite: 97 → 114 passing, zero regression.
Closes ROADMAP #164 Stage A.
What's explicitly NOT in Stage A:
- Preemptive cancellation of wedged provider IO (requires asyncio-native
provider path; larger refactor)
- Timeout on the legacy unbounded run_turn_loop path (by design: legacy
callers opt out of cancellation entirely)
- CLI exposure of 'cancelled' as a distinct exit code (currently 'cancelled'
maps to the same stop_reason != 'completed' break condition as others;
CLI surface for cancel is a separate pinpoint if warranted)
Every 'claw flush-transcript' call without --directory writes to
.port_sessions/<uuid>.json in CWD. Without a gitignore entry, every
dogfood run leaves dozens of untracked files in the repo, masking real
changes in 'git status' output.
Now that #160/#166 ship structured session lifecycle commands and
deterministic --session-id, this directory is purely transient by
default — belongs in .gitignore.
#159: multi-turn sessions had a silent security asymmetry: denied_tools
were always empty in run_turn_loop, even though bootstrap_session inferred
them from the routed matches. Result: any tool gated as 'destructive'
(bash-family commands, rm, etc) would silently appear unblocked across all
turns in multi-turn mode, giving a false 'clean' permission picture to any
claw consuming TurnResult.permission_denials.
Fix: compute denied_tools once at loop start via _infer_permission_denials,
then pass the same denials to every submit_message call (both timeout and
legacy unbounded paths). This mirrors the existing bootstrap_session pattern.
Acceptance: run_turn_loop('run bash ls').permission_denials now matches
what bootstrap_session returns — both infer the same denials from the
routed matches. Multi-turn security posture is symmetric.
Tests (tests/test_run_turn_loop_permissions.py, 2 tests):
- test_turn_loop_surfaces_permission_denials_like_bootstrap: Symmetry
check confirming both paths infer identical denials for destructive tools
- test_turn_loop_with_continuation_preserves_denials: Denials inferred at
loop start are passed consistently to all turns; captured via mock and
verified non-empty
Full suite: 82/82 passing, zero regression.
Closes ROADMAP #159.
The #160 session-lifecycle CLI triplet was asymmetric: list-sessions and
delete-session accepted --directory + --output-format and emitted typed
JSON error envelopes, but load-session had neither flag and dumped a raw
Python traceback (including the SessionNotFoundError class name) on a
missing session.
Three concrete impacts this fix closes:
1. Alternate session-store locations (e.g. /tmp/claw-run-XXX/.port_sessions)
were unreachable via load-session; claws had to chdir or monkeypatch
DEFAULT_SESSION_DIR to work around it.
2. Not-found emitted a multi-line Python stack, not a parseable envelope.
Claws deciding retry/escalate/give-up had only exit code 1 to work with.
3. The traceback leaked 'src.session_store.SessionNotFoundError' verbatim,
coupling version-pinned claws to our internal exception class name.
Now all three triplet commands accept the same flag pair and emit the
same JSON error shape:
Success (json mode):
{"session_id": "alpha", "loaded": true, "messages_count": 3,
"input_tokens": 42, "output_tokens": 99}
Not-found:
{"session_id": "missing", "loaded": false,
"error": {"kind": "session_not_found",
"message": "session 'missing' not found in /path",
"directory": "/path", "retryable": false}}
Corrupted file:
{"session_id": "broken", "loaded": false,
"error": {"kind": "session_load_failed",
"message": "...", "directory": "/path",
"retryable": true}}
Exit code contract:
- 0 on successful load
- 1 on not-found (preserves existing $?)
- 1 on OSError/JSONDecodeError (distinct 'kind' in JSON)
Backward compat: legacy 'claw load-session ID' text output unchanged
byte-for-byte. Only new behaviour is the flags and structured error path.
Tests (tests/test_load_session_cli.py, 13 tests):
- TestDirectoryFlagParity (2): --directory works + fallback to CWD/.port_sessions
- TestOutputFormatFlagParity (2): json schema + text-mode backward compat
- TestNotFoundTypedError (2): JSON envelope on not-found; no traceback in
either mode; no internal class name leak
- TestLoadFailedDistinctFromNotFound (1): corrupted file = session_load_failed
with retryable=true, distinct from session_not_found
- TestTripletParityConsistency (6): parametrised over [list, delete, load] *
[--directory, --output-format] — explicit parity guard for future regressions
Full suite: 80/80 passing, zero regression.
Discovered via Jobdori dogfood sweep 2026-04-22 17:44 KST — ran
'claw load-session nonexistent' expecting a clean error, got a Python
traceback. Filed #165 + fixed in same commit.
Closes ROADMAP #165.
#163: run_turn_loop no longer injects f'{prompt} [turn N]' into follow-up
prompts. The suffix was never defined or interpreted anywhere — not by the
engine, not by the system prompt, not by any LLM. It looked like a real
user-typed annotation in the transcript and made replay/analysis fragile.
New behaviour:
- turn 0 submits the original prompt (unchanged)
- turn > 0 submits caller-supplied continuation_prompt if provided, else
the loop stops cleanly — no fabricated user turn
- added continuation_prompt: str | None = None parameter to run_turn_loop
- added --continuation-prompt CLI flag for claws scripting multi-turn loops
- zero '[turn' strings ever appear in mutable_messages or stdout now
Behaviour change for existing callers:
- Before: run_turn_loop(prompt, max_turns=3) submitted 3 turns
('prompt', 'prompt [turn 2]', 'prompt [turn 3]')
- After: run_turn_loop(prompt, max_turns=3) submits 1 turn ('prompt')
- To preserve old multi-turn behaviour, pass continuation_prompt='Continue.'
or any structured follow-up text
One existing timeout test (test_budget_is_cumulative_across_turns) updated
to pass continuation_prompt so the cumulative-budget contract is actually
exercised across turns instead of trivially satisfied by a one-turn loop.
#164 filed: addresses reviewer feedback on #161. The wall-clock timeout
bounds the caller-facing wait, but the underlying submit_message worker
thread keeps running and can mutate engine state after the timeout
TurnResult is returned. A cooperative cancel_event pattern is sketched in
the pinpoint; real asyncio.Task.cancel() support will come once provider
IO is async-native (larger refactor).
Tests (tests/test_run_turn_loop_continuation.py, 8 tests):
- TestNoTurnSuffixInjection (2): zero '[turn' strings in any submitted
prompt, both default and explicit-continuation paths
- TestContinuationDefaultStopsAfterTurnZero (2): default loops run exactly
one turn; engine.submit_message called exactly once despite max_turns=10
- TestExplicitContinuationBehaviour (2): turn 0 = original, turn N = continuation
verbatim; max_turns still respected
- TestCLIContinuationFlag (2): CLI default emits only '## Turn 1';
--continuation-prompt wires through to multi-turn behaviour
Full suite: 67/67 passing.
Closes ROADMAP #163. Files #164.
Previously, QueryEnginePort.submit_message() checked the token budget AFTER
appending the prompt to mutable_messages, transcript_store, and permission_denials,
and AFTER calling compact_messages_if_needed(). On overflow it set
stop_reason='max_budget_reached' but the overflow turn was already committed.
Any caller that persisted the session afterwards wrote the rejected prompt to
disk — the session was silently poisoned even though the TurnResult said the
turn never completed.
Fix:
- Restructure submit_message so the budget check early-returns BEFORE any
mutation of mutable_messages, transcript_store, permission_denials, or
total_usage.
- The returned TurnResult.usage reflects pre-call state (overflow never
advanced the usage counter).
- Normal (in-budget) path unchanged: mutation happens exactly once, at the
end, only on 'completed' results.
This closes the atomicity gap: submit_message is now either 'turn committed'
(stop_reason='completed') or 'turn rejected, state untouched'
(stop_reason in {'max_budget_reached', 'max_turns_reached'}). Callers can
safely retry with a fresh budget or a smaller prompt without worrying about
phantom committed turns from prior rejections.
Tests (tests/test_submit_message_budget.py, 10 tests):
- TestBudgetOverflowDoesNotMutate (5): mutable_messages / transcript /
permission_denials / total_usage / TurnResult.usage all pre-mutation after overflow
- TestOverflowPersistence (2): first-turn overflow persists empty session;
successful-turn-then-overflow persists only the successful turn
- TestEngineUsableAfterOverflow (2): subsequent in-budget call still works
with no residue; repeated overflows don't accumulate hidden state
- TestNormalPathStillCommits (1): regression guard — non-overflow path still
commits mutable_messages/transcript/usage as expected
Full suite: 59/59 passing, zero regression.
Blocker: none. Closes ROADMAP #162.
Previously, run_turn_loop was bounded only by max_turns (turn count). If
engine.submit_message stalled — slow provider, hung network, infinite
stream — the loop blocked indefinitely with no cancellation path. Claws
calling run_turn_loop in CI or orchestration had no reliable way to
enforce a deadline; the loop would hang until OS kill or human intervention.
Fix:
- Add timeout_seconds parameter to run_turn_loop (default None = legacy unbounded).
- When set, each submit_message call runs inside a ThreadPoolExecutor and is
bounded by the remaining wall-clock budget (total across all turns, not per-turn).
- On timeout, synthesize a TurnResult with stop_reason='timeout' carrying the
turn's prompt and routed matches so transcripts preserve orchestration context.
- Exhausted/negative budget short-circuits before calling submit_message.
- Legacy path (timeout_seconds=None) bypasses the executor entirely — zero
overhead for callers that don't opt in.
CLI:
- Added --timeout-seconds flag to 'turn-loop' command.
- Exit code 2 when the loop terminated on timeout (vs 0 for completed),
so shell scripts can distinguish 'done' from 'budget exhausted'.
Tests (tests/test_run_turn_loop_timeout.py, 6 tests):
- Legacy unbounded path unchanged (timeout_seconds=None never emits 'timeout')
- Hung submit_message aborted within budget (0.3s budget, 5s mock hang → exit <1.5s)
- Budget is cumulative across turns (0.6s budget, 0.4s per turn, not per-turn)
- timeout_seconds=0 short-circuits first turn without calling submit_message
- Negative timeout treated as exhausted (guard against caller bugs)
- Timeout TurnResult carries correct prompt, matches, UsageSummary shape
Full suite: 49/49 passing, zero regression.
Blocker: none. Closes ROADMAP #161.
- list_sessions(directory=None) -> list[str]: enumerate stored session IDs
- session_exists(session_id, directory=None) -> bool: check existence without FileNotFoundError
- delete_session(session_id, directory=None) -> bool: unlink a session file
- load_session now raises typed SessionNotFoundError (subclass of KeyError) instead of FileNotFoundError
- Claws can now manage session lifecycle without reaching past the module to glob filesystem
Closes ROADMAP #160. Acceptance: claw can call list_sessions(), session_exists(id), delete_session(id) without importing Path or knowing .port_sessions/<id>.json layout.
## Gap
#77 Phase 1 added machine-readable error kind discriminants and #156 extended
them to text-mode output. However, the hint field is still prose derived from
splitting existing error text — not a stable registry-backed remediation
contract.
Downstream claws inspecting the hint field still need to parse human wording
to decide whether to retry, escalate, or terminate.
## Fix Shape
1. Remediation registry: remediation_for(kind, operation) -> Remediation struct
with action (retry/escalate/terminate/configure), target, and stable message
2. Stable hint outputs per error class (no more prose splitting)
3. Golden fixture tests replacing split_error_hint() string hacks
## Source
gaebal-gajae dogfood sweep 2026-04-22 05:30 KST
## Problem
#77 Phase 1 added machine-readable error `kind` discriminants to JSON error
payloads. Text-mode (stderr) errors still emit prose-only output with no
structured classification.
Observability tools (log aggregators, CI error parsers) parsing stderr can't
distinguish error classes without regex-scraping the prose.
## Fix
Added `[error-kind: <class>]` prefix line to all text-mode error output.
The prefix appears before the error prose, making it immediately parseable by
line-based log tools without any substring matching.
**Examples:**
## Impact
- Stderr observers (log aggregators, CI systems) can now parse error class
from the first line without regex or substring scraping
- Same classifier function used for JSON (#77 P1) and text modes
- Text-mode output remains human-readable (error prose unchanged)
- Prefix format follows syslog/structured-logging conventions
## Tests
All 179 rusty-claude-cli tests pass. Verified on 3 different error classes.
Closes ROADMAP #156.
## Problem
All JSON error payloads had the same three-field envelope:
```json
{"type": "error", "error": "<prose with hint baked in>"}
```
Five distinct error classes were indistinguishable at the schema level:
- missing_credentials (no API key)
- missing_worker_state (no state file)
- session_not_found / session_load_failed
- cli_parse (unrecognized args)
- invalid_model_syntax
Downstream claws had to regex-scrape the prose to route failures.
## Fix
1. **Added `classify_error_kind()`** — prefix/keyword classifier that returns a
snake_case discriminant token for 12 known error classes:
`missing_credentials`, `missing_manifests`, `missing_worker_state`,
`session_not_found`, `session_load_failed`, `no_managed_sessions`,
`cli_parse`, `invalid_model_syntax`, `unsupported_command`,
`unsupported_resumed_command`, `confirmation_required`, `api_http_error`,
plus `unknown` fallback.
2. **Added `split_error_hint()`** — splits multi-line error messages into
(short_reason, optional_hint) so the runbook prose stops being stuffed
into the `error` field.
3. **Extended JSON envelope** at 4 emit sites:
- Main error sink (line ~213)
- Session load failure in resume_session
- Stub command (unsupported_command)
- Unknown resumed command (unsupported_resumed_command)
## New JSON shape
```json
{
"type": "error",
"error": "short reason (first line)",
"kind": "missing_credentials",
"hint": "Hint: export ANTHROPIC_API_KEY..."
}
```
`kind` is always present. `hint` is null when no runbook follows.
`error` now carries only the short reason, not the full multi-line prose.
## Tests
Added 2 new regression tests:
- `classify_error_kind_returns_correct_discriminants` — all 9 known classes + fallback
- `split_error_hint_separates_reason_from_runbook` — with and without hints
All 179 rusty-claude-cli tests pass. Full workspace green.
Closes ROADMAP #77 Phase 1.
## Problem
Two session error messages advertised `.claw/sessions/` as the managed-session
location, but the actual on-disk layout is `.claw/sessions/<workspace_fingerprint>/`
where the fingerprint is a 16-char FNV-1a hash of the CWD path.
Users see error messages like:
```
no managed sessions found in .claw/sessions/
```
But the real directory is:
```
.claw/sessions/8497f4bcf995fc19/
```
The error copy was a direct lie — it made workspace-fingerprint partitioning
invisible and left users confused about whether sessions were lost or just in
a different partition.
## Fix
Updated two error formatters to accept the resolved `sessions_root` path
and extract the actual workspace-fingerprint directory:
1. **format_missing_session_reference**: now shows the actual fingerprint dir
and explains that it's a workspace-specific partition
2. **format_no_managed_sessions**: now shows the actual fingerprint dir and
includes a note that sessions from other CWDs are intentionally invisible
Updated all three call sites to pass `&self.sessions_root` to the formatters.
## Examples
**Before:**
```
no managed sessions found in .claw/sessions/
```
**After:**
```
no managed sessions found in .claw/sessions/8497f4bcf995fc19/
Start `claw` to create a session, then rerun with `--resume latest`.
Note: claw partitions sessions per workspace fingerprint; sessions from other CWDs are invisible.
```
```
session not found: nonexistent-id
Hint: managed sessions live in .claw/sessions/8497f4bcf995fc19/ (workspace-specific partition).
Try `latest` for the most recent session or `/session list` in the REPL.
```
## Impact
- Users can now tell from the error message that they're looking in the right
directory (the one their current CWD maps to)
- The workspace-fingerprint partitioning stops being invisible
- Operators understand why sessions from adjacent CWDs don't appear
- Error copy matches the actual on-disk structure
## Tests
All 466 runtime tests pass. Verified on two real workspaces with actual
workspace-fingerprint directories.
Closes ROADMAP #80.
## Problem
Three interactive slash commands are documented in `claw --help` but have no
corresponding section in USAGE.md:
- `/ultraplan [task]` — Run a deep planning prompt with multi-step reasoning
- `/teleport <symbol-or-path>` — Jump to a file or symbol by searching the workspace
- `/bughunter [scope]` — Inspect the codebase for likely bugs
New users see these commands in the help output but don't know:
- What each command does
- How to use it
- When to use it vs. other commands
- What kind of results to expect
## Fix
Added new section "Advanced slash commands (Interactive REPL only)" to USAGE.md
with documentation for all three commands:
1. **`/ultraplan`** — multi-step reasoning for complex tasks
- Example: `/ultraplan refactor the auth module to use async/await`
- Output: structured plan with numbered steps and reasoning
2. **`/teleport`** — navigate to a file or symbol
- Example: `/teleport UserService`, `/teleport src/auth.rs`
- Output: file content with the requested symbol highlighted
3. **`/bughunter`** — scan for likely bugs
- Example: `/bughunter src/handlers`, `/bughunter` (all)
- Output: list of suspicious patterns with explanations
## Impact
Users can now discover these commands and understand when to use them without
having to guess or search external sources. Bridges the gap between `--help`
output and full documentation.
Also filed ROADMAP #155 documenting the gap.
Closes ROADMAP #155.
## Problem
When a user types `claw --model gpt-4` or `--model qwen-plus`, they get:
```
error: invalid model syntax: 'gpt-4'. Expected provider/model (e.g., anthropic/claude-opus-4-6) or known alias
```
USAGE.md documents that "The error message now includes a hint that names the detected env var" — but this hint does not actually exist. The user has to re-read USAGE.md or guess the correct prefix.
## Fix
Enhance `validate_model_syntax` to detect when a model name looks like it belongs to a different provider:
1. **OpenAI models** (starts with `gpt-` or `gpt_`):
```
Did you mean `openai/gpt-4`? (Requires OPENAI_API_KEY env var)
```
2. **Qwen/DashScope models** (starts with `qwen`):
```
Did you mean `qwen/qwen-plus`? (Requires DASHSCOPE_API_KEY env var)
```
3. **Grok/xAI models** (starts with `grok`):
```
Did you mean `xai/grok-3`? (Requires XAI_API_KEY env var)
```
Unrelated invalid models (e.g., `asdfgh`) do not get a spurious hint.
## Verification
- `claw --model gpt-4` → hints `openai/gpt-4` + `OPENAI_API_KEY`
- `claw --model qwen-plus` → hints `qwen/qwen-plus` + `DASHSCOPE_API_KEY`
- `claw --model grok-3` → hints `xai/grok-3` + `XAI_API_KEY`
- `claw --model asdfgh` → generic error (no hint)
## Tests
Added 3 new assertions in `parses_multiple_diagnostic_subcommands`:
- GPT model error hints openai/ prefix and OPENAI_API_KEY
- Qwen model error hints qwen/ prefix and DASHSCOPE_API_KEY
- Unrelated models don't get a spurious hint
All 177 rusty-claude-cli tests pass.
Closes ROADMAP #154.
## Problem
Users frequently ask after building:
- "Where is the claw binary?"
- "Did the build actually work?"
- "Why can't I run \`claw\` from anywhere?"
This happens because \`cargo build\` puts the binary in \`rust/target/debug/claw\`
(or \`rust/target/release/claw\`), and new users don't know:
1. Where to find it
2. How to test it
3. How to add it to PATH (optional but common follow-up)
## Fix
Added new section "Post-build: locate the binary and verify" to README covering:
1. **Binary location table:** debug vs. release, macOS/Linux vs. Windows paths
2. **Verification commands:** Test the binary with \`--help\` and \`doctor\`
3. **Three ways to add to PATH:**
- Symlink (macOS/Linux): \`ln -s ... /usr/local/bin/claw\`
- cargo install: \`cargo install --path . --force\`
- Shell profile update: add rust/target/debug to \$PATH
4. **Troubleshooting:** Common errors ("command not found", "permission denied",
debug vs. release build speed)
## Impact
New users can now:
- Find the binary immediately after build
- Run it and verify with \`claw doctor\`
- Know their options for system-wide access
Also filed ROADMAP #153 documenting the gap.
Closes ROADMAP #153.
## Problem
Users commonly type `claw doctor --json`, `claw status --json`, or
`claw system-prompt --json` expecting JSON output. These fail with
`unrecognized argument \`--json\` for subcommand` with no hint that
`--output-format json` is the correct flag.
## Discovery
Filed as #152 during 21:17 dogfood nudge. The #127 worktree contained
a more comprehensive patch but conflicted with #141 (unified --help).
On re-investigation of main, Bugs 1 and 3 from #127 are already closed
(positional arg rejection works, no double "error:" prefix). Only
Bug 2 (the `--json` hint) remained.
## Fix
Two call sites add the hint:
1. `parse_single_word_command_alias`'s diagnostic-verb suffix path:
when rest[1] == "--json", append "Did you mean \`--output-format json\`?"
2. `parse_system_prompt_options` unknown-option path: same hint when
the option is exactly `--json`.
## Verification
Before:
$ claw doctor --json
error: unrecognized argument `--json` for subcommand `doctor`
Run `claw --help` for usage.
After:
$ claw doctor --json
error: unrecognized argument `--json` for subcommand `doctor`
Did you mean `--output-format json`?
Run `claw --help` for usage.
Covers: `doctor --json`, `status --json`, `sandbox --json`,
`system-prompt --json`, and any other diagnostic verb that routes
through `parse_single_word_command_alias`.
Other unrecognized args (`claw doctor garbage`) correctly don't
trigger the hint.
## Tests
- 2 new assertions in `parses_multiple_diagnostic_subcommands`:
- `claw doctor --json` produces hint
- `claw doctor garbage` does NOT produce hint
- 177 rusty-claude-cli tests pass
- Workspace tests green
Closes ROADMAP #152.
Filed from nudge directive at 21:17 KST. Implementation exists on worktree
`jobdori-127-verb-suffix` but needs rebase due to merge with #141.
Ready for Phase 1 implementation once conflicts resolved.
## Problem
`workspace_fingerprint(path)` hashes the raw path string without
canonicalization. Two equivalent paths (e.g. `/tmp/foo` vs
`/private/tmp/foo` on macOS) produce different fingerprints and
therefore different session stores. #150 fixed the test-side symptom;
this fixes the underlying product contract.
## Discovery path
#150 fix (canonicalize in test) was a workaround. Q's ack on #150
surfaced the deeper gap: the function itself is still fragile for
any caller passing a non-canonical path:
1. Embedded callers with a raw `--data-dir` path
2. Programmatic `SessionStore::from_cwd(user_path)` calls
3. NixOS store paths, Docker bind mounts, case-insensitive normalization
The REPL's default flow happens to work because `env::current_dir()`
returns canonical paths on macOS. But any caller passing a raw path
risks silent session-store divergence.
## Fix
Canonicalize inside `SessionStore::from_cwd()` and `from_data_dir()`
before computing the fingerprint. Kept `workspace_fingerprint()` itself
as a pure function for determinism — canonicalization is the entry
point's responsibility.
```rust
let canonical_cwd = fs::canonicalize(cwd).unwrap_or_else(|_| cwd.to_path_buf());
let sessions_root = canonical_cwd.join(".claw").join("sessions").join(workspace_fingerprint(&canonical_cwd));
```
Falls back to the raw path if canonicalize fails (directory doesn't
exist yet).
## Test-side updates
Three legacy-session tests expected the non-canonical base path to
match the store's workspace_root. Updated them to canonicalize
`base` after creation — same defensive pattern as #150, now
explicit across all three tests.
## Regression test
Added `session_store_from_cwd_canonicalizes_equivalent_paths` that
creates two stores from equivalent paths (raw vs canonical) and
asserts they resolve to the same sessions_dir.
## Verification
- `cargo test -p runtime session_store_` — 9/9 pass
- `cargo test --workspace` — all green, no FAILED markers
- No behavior change for existing users (REPL default flow already
used canonical paths)
## Backward compatibility
Users on macOS who always went through `env::current_dir()`:
no hash change, sessions resume identically.
Users who ever called with a non-canonical path: hash would change,
but those sessions were already broken (couldn't be resumed from a
canonical-path cwd). Net improvement.
Closes ROADMAP #151.
## #150 Fix: resume_latest test flake
**Problem:** `resume_latest_restores_the_most_recent_managed_session` intermittently
fails when run in the workspace suite or multiple times in sequence, but passes in
isolation.
**Root cause:** `workspace_fingerprint(path)` hashes the path string without
canonicalization. On macOS, `/tmp` is a symlink to `/private/tmp`. The test
creates a temp dir via `std::env::temp_dir().join(...)` which returns
`/var/folders/...` (non-canonical). When the subprocess spawns,
`env::current_dir()` returns the canonical path `/private/var/folders/...`.
The two fingerprints differ, so the subprocess looks in
`.claw/sessions/<hash1>` while files are in `.claw/sessions/<hash2>`.
Session discovery fails.
**Fix:** Call `fs::canonicalize(&project_dir)` after creating the directory
to ensure test and subprocess use identical path representations.
**Verification:** 5 consecutive runs of the full test suite — all pass.
Previously: 5/5 failed when run in sequence.
## #246 Filing: Reminder cron outcome ambiguity (control-loop blocker)
The `clawcode-dogfood-cycle-reminder` cron times out repeatedly with no
structured feedback on whether the nudge was delivered, skipped, or died in-flight.
**Phase 1 outcome schema** — add explicit field to cron result:
- `delivered` — nudge posted to Discord
- `timed_out_before_send` — died before posting
- `timed_out_after_send` — posted but cleanup timed out
- `skipped_due_to_active_cycle` — previous cycle active
- `aborted_gateway_draining` — daemon shutdown
Assigned to gaebal-gajae (cron/orchestration domain). Unblocks trustworthy
dogfood cycle observability.
Closes ROADMAP #150. Filed ROADMAP #246.
## Problem
`runtime::config::tests::validates_unknown_top_level_keys_with_line_and_field_name`
intermittently fails during `cargo test --workspace` (witnessed during
#147 and #148 workspace runs) but passes deterministically in isolation.
Example failure from workspace run:
test result: FAILED. 464 passed; 1 failed
## Root cause
`runtime/src/config.rs::tests::temp_dir()` used nanosecond timestamp
alone for namespace isolation:
std::env::temp_dir().join(format!("runtime-config-{nanos}"))
Under parallel test execution on fast machines with coarse clock
resolution, two tests start within the same nanosecond bucket and
collide on the same path. One test's `fs::remove_dir_all(root)` then
races another's in-flight `fs::create_dir_all()`.
Other crates already solved this pattern:
- plugins::tests::temp_dir(label) — label-parameterized
- runtime::git_context::tests::temp_dir(label) — label-parameterized
runtime/src/config.rs was missed.
## Fix
Added process id + monotonically-incrementing atomic counter to the
namespace, making every callsite provably unique regardless of clock
resolution or scheduling:
static COUNTER: AtomicU64 = AtomicU64::new(0);
let pid = std::process::id();
let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
std::env::temp_dir().join(format!("runtime-config-{pid}-{nanos}-{seq}"))
Chose counter+pid over the label-parameterized pattern to avoid
touching all 20 callsites in the same commit (mechanical noise with
no added safety — counter alone is sufficient).
## Verification
Before: one failure per workspace run (config test flake).
After: 5 consecutive `cargo test --workspace` runs — zero config
test failures. Only pre-existing `resume_latest` flake remains
(orthogonal, unrelated to this change).
for i in 1 2 3 4 5; do cargo test --workspace; done
# All 5 runs: config tests green. Only resume_latest flake appears.
cargo test -p runtime
# 465 passed; 0 failed
## ROADMAP.md
Added Pinpoint #149 documenting the gap, root cause, and fix.
Closes ROADMAP #149.
## Scope
Two deltas in one commit:
### #128 closure (docs)
Re-verified on main HEAD `4cb8fa0`: malformed `--model` strings already
rejected at parse time (`validate_model_syntax` in parse_args). All
historical repro cases now produce specific errors:
claw --model '' → error: model string cannot be empty
claw --model 'bad model' → error: invalid model syntax: 'bad model' contains spaces
claw --model 'sonet' → error: invalid model syntax: 'sonet'. Expected provider/model or known alias
claw --model '@invalid' → error: invalid model syntax: '@invalid'. Expected provider/model ...
claw --model 'totally-not-real-xyz' → error: invalid model syntax: ...
claw --model sonnet → ok, resolves to claude-sonnet-4-6
claw --model anthropic/claude-opus-4-6 → ok, passes through
Marked #128 CLOSED in ROADMAP with repro block. Residual provenance gap
split off as #148.
### #148 implementation
**Problem.** After #128 closure, `claw status --output-format json`
still surfaces only the resolved model string. No way for a claw to
distinguish whether `claude-sonnet-4-6` came from `--model sonnet`
(alias resolution) vs `--model claude-sonnet-4-6` (pass-through) vs
`ANTHROPIC_MODEL` env vs `.claw.json` config vs compiled-in default.
Debug forensics had to re-read argv instead of reading a structured
field. Clawhip orchestrators sending `--model` couldn't confirm the
flag was honored vs falling back to default.
**Fix.** Added two fields to status JSON envelope:
- `model_source`: "flag" | "env" | "config" | "default"
- `model_raw`: user's input before alias resolution (null on default)
Text mode appends a `Model source` line under `Model`, showing the
source and raw input (e.g. `Model source flag (raw: sonnet)`).
**Resolution order** (mirrors resolve_repl_model but with source
attribution):
1. If `--model` / `--model=` flag supplied → source: flag, raw: flag value
2. Else if ANTHROPIC_MODEL set → source: env, raw: env value
3. Else if `.claw.json` model key set → source: config, raw: config value
4. Else → source: default, raw: null
## Changes
### rust/crates/rusty-claude-cli/src/main.rs
- Added `ModelSource` enum (Flag/Env/Config/Default) with `as_str()`.
- Added `ModelProvenance` struct (resolved, raw, source) with
three constructors: `default_fallback()`, `from_flag(raw)`, and
`from_env_or_config_or_default(cli_model)`.
- Added `model_flag_raw: Option<String>` field to `CliAction::Status`.
- Parse loop captures raw input in `--model` and `--model=` arms.
- Extended `parse_single_word_command_alias` to thread
`model_flag_raw: Option<&str>` through.
- Extended `print_status_snapshot` signature to accept
`model_flag_raw: Option<&str>`. Resolves provenance at dispatch time
(flag provenance from arg; else probe env/config/default).
- Extended `status_json_value` signature with
`provenance: Option<&ModelProvenance>`. On Some, adds `model_source`
and `model_raw` fields; on None (legacy resume paths), omits them
for backward compat.
- Extended `format_status_report` signature with optional provenance.
On Some, renders `Model source` line after `Model`.
- Updated all existing callers (REPL /status, resume /status, tests)
to pass None (legacy paths don't carry flag provenance).
- Added 2 regression assertions in parse_args test covering both
`--model sonnet` and `--model=...` forms.
### ROADMAP.md
- Marked #128 CLOSED with re-verification block.
- Filed #148 documenting the provenance gap split, fix shape, and
acceptance criteria.
## Live verification
$ claw --model sonnet --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-sonnet-4-6", "model_source": "flag", "model_raw": "sonnet"}
$ claw --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-opus-4-6", "model_source": "default", "model_raw": null}
$ ANTHROPIC_MODEL=haiku claw --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-haiku-4-5-20251213", "model_source": "env", "model_raw": "haiku"}
$ echo '{"model":"claude-opus-4-7"}' > .claw.json && claw --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-opus-4-7", "model_source": "config", "model_raw": "claude-opus-4-7"}
$ claw --model sonnet status
Status
Model claude-sonnet-4-6
Model source flag (raw: sonnet)
Permission mode danger-full-access
...
## Tests
- rusty-claude-cli bin: 177 tests pass (2 new assertions for #148)
- Full workspace green except pre-existing resume_latest flake (unrelated)
Closes ROADMAP #128, #148.
## Problem
The `"prompt"` subcommand arm enforced `if prompt.trim().is_empty()`
and returned a specific error. The fallthrough `other` arm in the same
match block — which routes any unrecognized first positional arg to
`CliAction::Prompt` — had no such guard. Result:
$ claw ""
error: missing Anthropic credentials; export ANTHROPIC_AUTH_TOKEN ...
$ claw " "
error: missing Anthropic credentials; ...
$ claw "" ""
error: missing Anthropic credentials; ...
$ claw --output-format json ""
{"error":"missing Anthropic credentials; ...","type":"error"}
An empty prompt should never reach the credentials check. Worse: with
valid credentials, the literal empty string gets sent to Claude as a
user prompt, either burning tokens for nothing or triggering a model-
side refusal. Same prompt-misdelivery family as #145.
## Root cause
In `parse_subcommand()`, the final `other =>` arm in the top-level
match only guards against typos (#108 guard via `looks_like_subcommand_typo`)
and then unconditionally builds `CliAction::Prompt { prompt: rest.join(" ") }`.
An empty/whitespace-only join passes through.
## Changes
### rust/crates/rusty-claude-cli/src/main.rs
Added the same `if joined.trim().is_empty()` guard already used in the
`"prompt"` arm to the fallthrough path. Error message distinguishes it
from the `prompt` subcommand path:
empty prompt: provide a subcommand (run `claw --help`) or a
non-empty prompt string
Runs AFTER the typo guard (so `claw sttaus` still suggests `status`)
and BEFORE CliAction::Prompt construction (so no network call ever
happens for empty inputs).
### Regression tests
Added 4 assertions in the existing parse_args test:
- parse_args([""]) → Err("empty prompt: ...")
- parse_args([" "]) → Err("empty prompt: ...")
- parse_args(["", ""]) → Err("empty prompt: ...")
- parse_args(["sttaus"]) → Err("unknown subcommand: ...") [verifies #108 typo guard still takes precedence]
### ROADMAP.md
Added Pinpoint #147 documenting the gap, verification, root cause,
fix shape, and acceptance. Joins the prompt-misdelivery cluster
alongside #145.
## Live verification
$ claw ""
error: empty prompt: provide a subcommand (run `claw --help`) or a non-empty prompt string
$ claw " "
error: empty prompt: provide a subcommand (run `claw --help`) or a non-empty prompt string
$ claw --output-format json ""
{"error":"empty prompt: provide a subcommand ...","type":"error"}
$ claw prompt "" # unchanged: subcommand-specific error preserved
error: prompt subcommand requires a prompt string
$ claw hello # unchanged: typo guard still fires
error: unknown subcommand: hello.
Did you mean help
$ claw "real prompt here" # unchanged: real prompts still reach API
error: api returned 401 Unauthorized (with dummy key, as expected)
All empty/whitespace-only paths exit 1. No network call. No misleading
credentials error.
## Tests
- rusty-claude-cli bin: 177 tests pass (4 new assertions)
- Full workspace green except pre-existing resume_latest flake (unrelated)
Closes ROADMAP #147.
## Problem
`claw config` and `claw diff` are pure-local read-only introspection
commands (config merges .claw.json + .claw/settings.json from disk; diff
shells out to `git diff --cached` + `git diff`). Neither needs a session
context, yet both rejected direct CLI invocation:
$ claw config
error: `claw config` is a slash command. Use `claw --resume SESSION.jsonl /config` ...
$ claw diff
error: `claw diff` is a slash command. ...
This forced clawing operators to spin up a full session just to inspect
static disk state, and broke natural pipelines like
`claw config --output-format json | jq`.
## Root cause
Sibling of #145: `SlashCommand::Config { section }` and
`SlashCommand::Diff` had working renderers (`render_config_report`,
`render_config_json`, `render_diff_report`, `render_diff_json_for`)
exposed for resume sessions, but the top-level CLI parser in
`parse_subcommand()` had no arms for them. Zero-arg `config`/`diff`
hit `parse_single_word_command_alias`'s fallback to
`bare_slash_command_guidance`, producing the misleading guidance.
## Changes
### rust/crates/rusty-claude-cli/src/main.rs
- Added `CliAction::Config { section, output_format }` and
`CliAction::Diff { output_format }` variants.
- Added `"config"` / `"diff"` arms to the top-level parser in
`parse_subcommand()`. `config` accepts an optional section name
(env|hooks|model|plugins) matching SlashCommand::Config semantics.
`diff` takes no positional args. Both reject extra trailing args
with a clear error.
- Added `"config" | "diff" => None` to
`parse_single_word_command_alias` so bare invocations fall through
to the new parser arms instead of the slash-guidance error.
- Added dispatch in run() that calls existing renderers: text mode uses
`render_config_report` / `render_diff_report`; JSON mode uses
`render_config_json` / `render_diff_json_for` with
`serde_json::to_string_pretty`.
- Added 5 regression assertions in parse_args test covering:
parse_args(["config"]), parse_args(["config", "env"]),
parse_args(["config", "--output-format", "json"]),
parse_args(["diff"]), parse_args(["diff", "--output-format", "json"]).
### ROADMAP.md
Added Pinpoint #146 documenting the gap, verification, root cause,
fix shape, and acceptance. Explicitly notes which other slash commands
(`hooks`, `usage`, `context`, etc.) are NOT candidates because they
are session-state-modifying.
## Live verification
$ claw config # no config files
Config
Working directory /private/tmp/cd-146-verify
Loaded files 0
Merged keys 0
Discovered files
user missing ...
project missing ...
local missing ...
Exit 0.
$ claw config --output-format json
{
"cwd": "...",
"files": [...],
...
}
$ claw diff # no git
Diff
Result no git repository
Detail ...
Exit 0.
$ claw diff --output-format json # inside claw-code
{
"kind": "diff",
"result": "changes",
"staged": "",
"unstaged": "diff --git ..."
}
Exit 0.
## Tests
- rusty-claude-cli bin: 177 tests pass (5 new assertions in parse_args)
- Full workspace green except pre-existing resume_latest flake (unrelated)
## Not changed
`hooks`, `usage`, `context`, `tasks`, `theme`, `voice`, `rename`,
`copy`, `color`, `effort`, `branch`, `rewind`, `ide`, `tag`,
`output-style`, `add-dir` — all session-mutating or interactive-only;
correctly remain slash-only.
Closes ROADMAP #146.
## Problem
`claw plugins` (and `claw plugins list`, `claw plugins --help`,
`claw plugins info <name>`, etc.) fell through the top-level subcommand
match and got routed into the prompt-execution path. Result: a purely
local introspection command triggered an Anthropic API call and surfaced
`missing Anthropic credentials` to the user. With valid credentials, it
would actually send the literal string "plugins" as a user prompt to
Claude, burning tokens for a local query.
$ claw plugins
error: missing Anthropic credentials; export ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY before calling the Anthropic API
$ ANTHROPIC_API_KEY=dummy claw plugins
⠋ 🦀 Thinking...
✘ ❌ Request failed
error: api returned 401 Unauthorized
Meanwhile siblings (`agents`, `mcp`, `skills`) all worked correctly:
$ claw agents
No agents found.
$ claw mcp
MCP
Working directory ...
Configured servers 0
## Root cause
`CliAction::Plugins` exists, has a working dispatcher
(`LiveCli::print_plugins`), and is produced inside the REPL via
`SlashCommand::Plugins`. But the top-level CLI parser in
`parse_subcommand()` had arms for `agents`, `mcp`, `skills`, `status`,
`doctor`, `init`, `export`, `prompt`, etc., and **no arm for
`plugins`**. The dispatch never ran from the CLI entry point.
## Changes
### rust/crates/rusty-claude-cli/src/main.rs
Added a `"plugins"` arm to the top-level match in `parse_subcommand()`
that produces `CliAction::Plugins { action, target, output_format }`,
following the same positional convention as `mcp` (`action` = first
positional, `target` = second). Rejects >2 positional args with a clear
error.
Added four regression assertions in the existing `parse_args` test:
- `plugins` alone → `CliAction::Plugins { action: None, target: None }`
- `plugins list` → action: Some("list"), target: None
- `plugins enable <name>` → action: Some("enable"), target: Some(...)
- `plugins --output-format json` → action: None, output_format: Json
### ROADMAP.md
Added Pinpoint #145 documenting the gap, verification, root cause,
fix shape, and acceptance.
## Live verification
$ claw plugins # no credentials set
Plugins
example-bundled v0.1.0 disabled
sample-hooks v0.1.0 disabled
$ claw plugins --output-format json # no credentials set
{
"action": "list",
"kind": "plugin",
"message": "Plugins\n example-bundled ...\n sample-hooks ...",
"reload_runtime": false,
"target": null
}
Exit 0 in all modes. No network call. No "missing credentials" error.
## Tests
- rusty-claude-cli bin: 177 tests pass (new plugin assertions included)
- Full workspace green except pre-existing resume_latest flake (unrelated)
Closes ROADMAP #145.
Filing + Phase 1 fix in one commit (sibling of #143).
## Context
With #143 Phase 1 landed (`claw status` degrades), `claw mcp` was the
remaining diagnostic surface that hard-failed on a malformed `.claw.json`.
Same input, same parse error, same partial-success violation. Fresh
dogfood at 18:59 KST caught it on main HEAD `e2a43fc`.
## Changes
### ROADMAP.md
Added Pinpoint #144 documenting the gap and acceptance criteria. Joins
the partial-success / Principle #5 cluster with #143.
### rust/crates/commands/src/lib.rs
`render_mcp_report_for()` + `render_mcp_report_json_for()` now catch the
ConfigError at loader.load() instead of propagating:
- **Text mode** prepends a "Config load error" block (same shape as
#143's status output) before the MCP listing. The listing still renders
with empty servers so the output structure is preserved.
- **JSON mode** adds top-level `status: "ok" | "degraded"` +
`config_load_error: string | null` fields alongside existing fields
(`kind`, `action`, `working_directory`, `configured_servers`,
`servers[]`). On clean runs, `status: "ok"` and
`config_load_error: null`. On parse failure, `status: "degraded"`,
`config_load_error: "..."`, `servers: []`, exit 0.
- Both list and show actions get the same treatment.
### Regression test
`commands::tests::mcp_degrades_gracefully_on_malformed_mcp_config_144`:
- Injects the same malformed .claw.json as #143 (one valid + one broken
mcpServers entry).
- Asserts mcp list returns Ok (not Err).
- Asserts top-level status: "degraded" and config_load_error names the
malformed field path.
- Asserts show action also degrades.
- Asserts clean path returns status: "ok" with config_load_error null.
## Live verification
$ claw mcp --output-format json
{
"action": "list",
"kind": "mcp",
"status": "degraded",
"config_load_error": ".../.claw.json: mcpServers.missing-command: missing string field command",
"working_directory": "/Users/yeongyu/clawd",
"configured_servers": 0,
"servers": []
}
Exit 0.
## Contract alignment after this commit
All three diagnostic surfaces match now:
- `doctor` — degraded envelope with typed check entries ✅
- `status` — degraded envelope with config_load_error ✅ (#143)
- `mcp` — degraded envelope with config_load_error ✅ (this commit)
Phase 2 (typed-error object joining taxonomy §4.44) tracked separately
across all three surfaces.
Full workspace test green except pre-existing resume_latest flake (unrelated).
Closes ROADMAP #144 phase 1.
Previously `claw status` hard-failed on any config parse error, emitting
a bare error string and exiting 1. This took down the entire health
surface for a single malformed MCP entry, even though workspace, git,
model, permission, and sandbox state could all be reported independently.
`claw doctor` already degraded gracefully on the exact same input.
This commit matches `claw status` to that contract.
Changes:
- Add `StatusContext::config_load_error: Option<String>` to capture parse
errors without aborting.
- Rewrite `status_context()` to match on `ConfigLoader::load()`: on Err,
fall back to default `SandboxConfig` for sandbox resolution and record
the parse error, then continue populating workspace/git/memory fields.
- JSON output gains top-level `status: "ok" | "degraded"` marker and a
`config_load_error` string (null on clean runs). All other existing
fields preserved for backward compat.
- Text output prepends a "Config load error" block with Details + Hint
when config failed to parse, then a "Status (degraded)" header on the
main block. Clean runs show the usual "Status" header.
- Doctor path updated to pass the config load error through StatusContext.
Regression test `status_degrades_gracefully_on_malformed_mcp_config_143`:
- Injects a .claw.json with one valid + one malformed mcpServers entry
- Asserts status_context() returns Ok (not Err)
- Asserts config_load_error names the malformed field path
- Asserts workspace/sandbox fields still populated in JSON
- Asserts top-level status is 'degraded'
- Asserts clean config path still returns status: 'ok'
Verified live on /Users/yeongyu/clawd (contains deliberately broken MCP entries):
$ claw status --output-format json
{ "status": "degraded",
"config_load_error": ".../mcpServers.missing-command: missing string field command",
"model": "claude-opus-4-6",
"workspace": {...},
"sandbox": {...},
... }
Phase 2 (typed error object joining #4.44 taxonomy) tracked separately.
Full workspace test green except pre-existing resume_latest flake (unrelated).
Closes ROADMAP #143 phase 1.
Add two missing sections documenting the recently-fixed commands:
- **Initialize a repository**: Shows both text and JSON output modes for
`claw init`. Explains that structured JSON fields (created[], updated[],
skipped[], artifacts[]) allow claws to detect per-artifact state without
substring-matching prose. Documents idempotency.
- **Inspect worker state**: Documents `claw state` and the prerequisite
that a worker must have executed at least once. Includes the helpful error
message and remediation hints (claw or claw prompt <text>) so users
discovering the command for the first time see actionable guidance.
These sections complement the product fixes in #142 (init JSON structure)
and #139 (state error actionability) by documenting the contract from a
user perspective.
Related: ROADMAP #142 (structured init output), #139 (worker-state discoverability).
Previously `claw state` errored with "no worker state file found ... — run a
worker first" but there is no `claw worker` subcommand, so claws had no
discoverable path from the error to a fix.
Changes:
- Rewrite the missing-state error to name the two concrete commands that
produce .claw/worker-state.json:
* `claw` (interactive REPL, writes state on first turn)
* `claw prompt <text>` (one non-interactive turn)
Also tell the user what to rerun: `claw state [--output-format json]`.
- Expand the State --help topic with "Produces state", "Observes state",
and "Exit codes" lines so the worker-state contract is discoverable
before the user hits the error.
- Add regression test state_error_surfaces_actionable_worker_commands_139
asserting the error contains `claw prompt`, REPL mention, and the
rerun path, plus that the help topic documents the producer contract.
Verified live:
$ claw state
error: no worker state file found at .claw/worker-state.json
Hint: worker state is written by the interactive REPL or a non-interactive prompt.
Run: claw # start the REPL (writes state on first turn)
Or: claw prompt <text> # run one non-interactive turn
Then rerun: claw state [--output-format json]
JSON mode preserves the full hint inside the error envelope so CI/claws
can match on `claw prompt` without losing the canonical prefix.
Full workspace test green except pre-existing resume_latest flake (unrelated).
Closes ROADMAP #139.
Previously `claw init --output-format json` emitted a valid JSON envelope but
packed the entire human-formatted output into a single `message` string. Claw
scripts had to substring-match human language to tell `created` from `skipped`.
Changes:
- Add InitStatus::json_tag() returning machine-stable "created"|"updated"|"skipped"
(unlike label() which includes the human " (already exists)" suffix).
- Add InitReport::NEXT_STEP constant so claws can read the next-step hint
without grepping the message string.
- Add InitReport::artifacts_with_status() to partition artifacts by state.
- Add InitReport::artifact_json_entries() for the structured artifacts[] array.
- Rewrite run_init + init_json_value to emit first-class fields alongside the
legacy message string (kept for text consumers): project_path, created[],
updated[], skipped[], artifacts[], next_step, message.
- Update the slash-command Init dispatch to use the same structured JSON.
- Add regression test artifacts_with_status_partitions_fresh_and_idempotent_runs
asserting both fresh + idempotent runs produce the right partitioning and
that the machine-stable tag is bare 'skipped' not label()'s phrasing.
Verified output:
- Fresh dir: created[] has 4 entries, skipped[] empty
- Idempotent call: created[] empty, skipped[] has 4 entries
- project_path, next_step as first-class keys
- message preserved verbatim for backward compat
Full workspace test green except pre-existing resume_latest flake (unrelated).
Closes ROADMAP #142.
Previously, `claw <subcommand> --help` had 5 different behaviors:
- 7 subcommands returned subcommand-specific help (correct)
- init/export/state/version silently fell back to global `claw --help`
- system-prompt/dump-manifests errored with `unknown <cmd> option: --help`
- bootstrap-plan printed its phase list instead of help text
Changes:
- Extend LocalHelpTopic enum with Init, State, Export, Version, SystemPrompt,
DumpManifests, BootstrapPlan variants.
- Extend parse_local_help_action() to resolve those 7 subcommands to their
local help topic instead of falling through to the main dispatch.
- Remove init/state/export/version from the explicit wants_help=true matcher
so they reach parse_local_help_action() before being routed to global help.
- Add render_help_topic() entries for the 7 new topics with consistent
Usage/Purpose/Output/Formats/Related structure.
- Add regression test subcommand_help_flag_has_one_contract_across_all_subcommands_141
asserting every documented subcommand + both --help and -h variants resolve
to a HelpTopic with non-empty text that contains a Usage line.
Verification:
- All 14 subcommands now return subcommand-specific help (live dogfood).
- Full workspace test green except pre-existing resume_latest flake.
Closes ROADMAP #141.
Previously this test inherited the cargo test runner's CWD, which could contain
a stale .claw/settings.json with "permissionMode": "acceptEdits" written by
another test. The deprecated-field resolver then silently downgraded the
default permission mode to WorkspaceWrite, breaking the test's assertion.
Fix: wrap the assertion in with_current_dir() + env_lock() so the test runs in
an isolated temp directory with no stale config.
Full workspace test now passes except for pre-existing resume_latest flake
(unrelated to #140, environment-dependent, tracked separately).
Closes ROADMAP #140.
- Turn Result fields (including cancel_observed as of #164 Stage B)
> **Important:** SCHEMAS.md describes the **v2.0 target envelope**, not the current v1.0 binary behavior. The binary does NOT currently emit `timestamp`, `command`, `exit_code`, `output_format`, or `schema_version` fields. See [`FIX_LOCUS_164.md`](./FIX_LOCUS_164.md) for the migration plan (Phase 1: dual-mode flag; Phase 2: default bump; Phase 3: deprecation).
2. Return JSON envelopes (current v1.0: flat shape with top-level `kind`; target v2.0: nested with common fields per SCHEMAS.md)
3.**v1.0 (current):** Emit flat top-level fields: verb-specific data + `kind` (verb identity for success, error classification for errors)
4.**v2.0 (target, post-FIX_LOCUS_164):** Use common wrapper fields (timestamp, command, exit_code, output_format, schema_version) with nested `data` or `error` objects
5. Exit 0 on success, 1 on error/not-found, 2 on timeout
**Migration note:** The Python reference harness in `src/` was written against the v2.0 target schema (SCHEMAS.md). The Rust binary in `rust/` currently emits v1.0 (flat). See [`FIX_LOCUS_164.md`](./FIX_LOCUS_164.md) for the full migration plan and timeline.
python3 -m pip install coverage # if not already installed
python3 -m coverage run -m pytest tests/
python3 -m coverage report --skip-covered
```
Target: >90% line coverage for src/ (currently ~85%).
## Common workflows
### Add a new clawable command
1. Add parser in `main.py` (argparse)
2. Add `--output-format` flag
3. Emit JSON envelope using `wrap_json_envelope(data, command_name)`
4. Add command to CLAWABLE_SURFACES in test_cli_parity_audit.py
5. Document in SCHEMAS.md (schema + example)
6. Write test in tests/test_*_cli.py or tests/test_json_envelope_field_consistency.py
7. Run full suite to confirm parity
### Modify TurnResult or protocol fields
1. Update dataclass in `query_engine.py`
2. Update SCHEMAS.md with new field + rationale
3. Write test in `tests/test_json_envelope_field_consistency.py` that validates field presence
4. Update all places that construct TurnResult (grep for `TurnResult(`)
5. Update bootstrap/turn-loop JSON builders in main.py
6. Run `tests/` to ensure no regressions
### Promote an OPT_OUT surface to CLAWABLE
**Prerequisite:** Real demand signal logged in `OPT_OUT_DEMAND_LOG.md` (threshold: 2+ independent signals per surface). Speculative promotions are not allowed.
Once demand is evidenced:
1. Add --output-format flag to argparse
2. Emit wrap_json_envelope() output in JSON path
3. Move command from OPT_OUT_SURFACES to CLAWABLE_SURFACES
4. Document in SCHEMAS.md
5. Write test for JSON output
6. Run parity audit to confirm no regressions
7. Update `OPT_OUT_DEMAND_LOG.md` to mark signal as resolved
### File a demand signal (when a claw actually needs JSON from an OPT_OUT surface)
1. Open `OPT_OUT_DEMAND_LOG.md`
2. Find the surface's entry under Group A/B/C
3. Append a dated entry with Source, Use Case, and Markdown-alternative-checked explanation
4. If this is the 2nd signal for the same surface, file a promotion pinpoint in ROADMAP.md
## Dogfood principles
The Python harness is continuously dogfood-tested:
- Every cycle ships to `main` with detailed commit messages
- New tests are written before/alongside implementation
- Test suite must pass before pushing (zero-regression principle)
- Commits grouped by pinpoint (#159, #160, ..., #174)
- Failure modes classified per exit code: 0=success, 1=error, 2=timeout
## Protocol governance
- **SCHEMAS.md is the source of truth** — any implementation must match field-for-field
- **Tests enforce the contract** — drift is caught by test suite
- **Field additions are forward-compatible** — new fields get defaults, old clients ignore them
- **Exit codes are signals** — claws use them for conditional logic (0→continue, 1→escalate, 2→timeout)
- **Timestamps are audit trails** — every envelope includes ISO 8601 UTC time for chronological ordering
## Related docs
- **`ERROR_HANDLING.md`** — Unified error-handling pattern for claws (one handler for all 14 clawable commands)
- **`SCHEMAS.md`** — JSON protocol specification (read before implementing)
- **`OPT_OUT_AUDIT.md`** — Governance for the 12 non-clawable surfaces
- **`OPT_OUT_DEMAND_LOG.md`** — Active survey recording real demand signals (evidence base for decisions)
- **`ROADMAP.md`** — macro roadmap and macro pain points
- **`PHILOSOPHY.md`** — system design intent
- **`PARITY.md`** — status of Python ↔ Rust protocol equivalence
**Status:** Frozen (feature-complete), ready for review + merge
---
## One-Liner (reviewer-ready)
> **Phase 0 is now frozen, reviewer-mapped, and merge-ready; Phase 1 remains intentionally deferred behind the locked priority order.**
This is the single sentence that captures branch state. Use it in PR titles, review summaries, and Phase 1 handoff notes.
---
## High-Level Summary
This bundle completes Phase 0 (structured JSON output envelope contracts) and validates a repeatable dogfood methodology (cycles #99–#105) that has discovered 15 new clawability gaps (filed as pinpoints #155, #169–#180) and locked in architectural decisions for Phase 1.
**Key property:** The bundle is *dependency-clean*. Every commit can be reviewed independently. No commit depends on uncommitted follow-up. The freeze holds: no code changes will land on this branch after merge.
---
## Why Review This Now
### What lands when this merges:
1.**Phase 0 guarantees** (4 commits) — JSON output envelopes now follow `SCHEMAS.md` contracts. Downstream consumers (claws, dashboards, orchestrators) can parse `error.kind`, `error.operation`, `error.target`, `error.hint` as first-class fields instead of scraping prose.
2.**Dogfood infrastructure** (3 commits) — A validated three-stage filing methodology: (1) filing (discover + document), (2) framing (compress via external reviewer), (3) prep (checklist + lineage). Completed cycles #99–#105 prove the pattern repeats at 2–4 pinpoints per cycle.
3.**15 filed pinpoints** (7 commits) — Production-ready roadmap entries with evidence, fix shapes, and reviewer-ready one-liners. No implementation code, pure documentation. These unblock Phase 1 branch creation.
4.**Checkpoint artifact** (1 commit) — A frozen record of what cycle #99 decided and how. Audit trail for multi-cycle work.
### What does NOT land:
- No implementation of any filed pinpoint (#155–#186). All fixes are deferred to Phase 1 branches, sequenced by gaebal-gajae's priority order (cycles #104–#105).
- No schema changes. SCHEMAS.md is frozen at the contract that Phase 0 guarantees.
- No new dependencies. Cargo.toml is unchanged from the base branch.
---
## Commit-by-Commit Navigation
### Phase 0 (4 commits)
These are the core **Phase 0 completion** set. Each one is a self-contained capability unlock.
1.**`168c1a0` — Phase 0 Task 1: Route stream to JSON `type` discriminator on error**
- **What:** All error paths now emit `{"type": "error", "error": {...}}` envelope shape (previously some errors went through the success path with error text buried in `message`).
- **Why it matters:** Downstream claws can now reliably check `if response.type == "error"` instead of parsing prose.
- **Review focus:** Diff routing in `emit_error_response()` and friends. Verify every error exit path hits the JSON discriminator.
- **What:** When a text-mode user sees `{"error": ...}` escape into their terminal unexpectedly, they get a `SCHEMAS.md` violation warning + hint. Prevents silent envelope shape drift.
- **Why it matters:** Text-mode users are first-class. JSON contract violations are visible + auditable.
- **Review focus:** The `silent_emit_guard()` wrapper and its condition. Verify it gates all JSON output paths.
- **What:** Adds golden-fixture test `schemas_contract_holds_on_static_verbs` that asserts every verb's JSON shape matches SCHEMAS.md as of this commit. Future drifts are caught.
- **Why it matters:** Schema is now truth-testable, not aspirational.
- **Review focus:** The fixture names and which verbs are covered. Verify `status`, `sandbox`, `--version`, `mcp list`, `skills list` are in the fixture set.
- **What:** New test `error_kind_and_error_field_presence_are_gated_together` asserts that if `type: "error"` is present, both `error` field and `error.kind` are always populated (no partial shapes).
- **Why it matters:** Downstream consumers can rely on shape consistency. No more "sometimes error.kind is missing" surprises.
- **Review focus:** The parity assertion logic. Verify it covers all error-emission sites.
- **What:** Documents the three-stage filing discipline that cycles #99–#105 will use (filing → framing → prep). Locks the "5-axis density rule" (freeze when a branch spans 5+ axes).
- **Why it matters:** Audit trail. Future cycles know what #99 decided.
- **Review focus:** The decision rationale in ROADMAP.md. Is the freeze doctrine sound for your project?
- **What:** Gaebal-gajae provides surgical one-liners for #181–#183, plus insights (agents is the reference implementation for #183 canonical shape).
- **Why it matters:** Framings now survive reader compression. Reviewers can understand the issue in 1 sentence + 1 justification.
- **Review focus:** The rewritten framings. Do they improve on the original verbose descriptions?
8.**`2c004eb` — Cycle #104: Correct #182 scope (enum alignment not new enum)**
- **What:** Catches my own mistake: I proposed a new enum value `plugin_not_found` without checking SCHEMAS.md. Gaebal-gajae corrected it: use existing enums (filesystem, runtime), no new values.
- **Why it matters:** Demonstrates the doctrine correction loop. Catch regressions early.
- **Review focus:** The scope correction logic. Do you agree with "existing contract alignment > new enum"?
- **What:** More corrections from gaebal-gajae: #184/#185 belong to #171 lineage (not new family), #186 to #169/#170 lineage. Agents is the reference for #183 fix.
- **Why it matters:** Family tree hygiene. Each pinpoint sits in the right narrative arc.
- **Review focus:** The family tree reorganization. Is the new structure clearer?
- **CI status:** Ready (no CI jobs run until merge)
---
## Integration Notes
### What the main branch gains:
-`SCHEMAS.md` now has a regression lock. Future commits that drift the shape are caught.
- Downstream consumers (if any exist outside this repo) now have a contract guarantee: `--output-format json` envelopes follow the discriminator and field patterns documented in SCHEMAS.md.
- If someone lands a fix for #155, #169, #170, #171, etc. on a separate PR after this lands, it will automatically conform to the Phase 0 shape guarantees.
### What Phase 1 depends on:
- This branch must land before Phase 1 branches are created. Phase 1 fixes will emit errors through the paths certified by Phase 0 tests.
- Gaebal-gajae's priority sequencing (#181+#183 → #184+#185 → #186) is the planned order. Follow it when planning Phase 1 PRs.
- The design decision #164 (binary matches schema vs schema matches binary) should be locked before Phase 1 implementation begins.
### What is explicitly deferred:
- **Implementation of any pinpoint.** Only documentation and test coverage.
- **Schema additions.** All filed work uses existing enum values.
- **New dependencies.** Cargo.toml is unchanged.
- **Database/persistence.** Session/state handling is unchanged.
---
## Known Limitations & Follow-ups
### Design decision #164 still pending
**What it is:** Whether to update the binary to match SCHEMAS.md (Option A) or update SCHEMAS.md to match the binary (Option B).
**Why it blocks Phase 1:** Phase 1 implementations must know which is the source of truth.
**Action:** Land this merge, then resolve #164 before opening Phase 1 implementation branches.
### Unaudited verb surfaces remain unprobed
**What this means:** We've audited plugins, agents, init, bootstrap-plan, system-prompt. Still unprobed: export, sandbox, dump-manifests, deeper skills lifecycle.
**Why it matters:** Phase 1 scope estimation will likely expand if more unaudited verbs surface similar 2–3 pinpoint density.
**Action:** Cycles #106+ will continue probing unaudited surfaces. Phase 1 sequence adjusts if new families emerge.
---
## Reviewer Checkpoints
**Before approving:**
1. ✅ Do the Phase 0 commits actually deliver what they claim? (Test coverage, routing changes, guard logic)
2. ✅ Is the SCHEMAS.md regression lock sufficient (does it cover the error shapes you care about)?
3. ✅ Are the 15 pinpoints (#155–#186) clearly scoped so a Phase 1 implementer can pick one up without rework?
4. ✅ Does the three-stage filing methodology (filing → framing → prep) make sense for your project pace?
5. ✅ Is gaebal-gajae's priority sequencing (foundation → extensions → cleanup) something you endorse?
**Before squashing/fast-forwarding:**
1. ✅ No outstanding merge conflicts with main
2. ✅ All 227 tests pass on main (not just this branch)
3. ✅ No style drift (fmt + clippy clean)
**After merge:**
1. ✅ Tag the merge commit as `phase-0-complete` for easy reference
2. ✅ Update the issue/PR #164 status to "awaiting decision before Phase 1 kickoff"
- **For leadership:** Is the Phase 0 shape guarantee (error.kind + error.operation + error.target + error.hint always together) a contract we want to support for 2+ major versions?
- **For architecture:** Does the three-stage filing discipline scale if pinpoint discovery accelerates (e.g. 10+ new gaps per cycle)?
- **For product:** Should the SCHEMAS.md version be bumped to 2.1 after Phase 0 lands to signal the new guarantees?
---
## State Summary (one-liner recap)
> **Phase 0 is now frozen, reviewer-mapped, and merge-ready; Phase 1 remains intentionally deferred behind the locked priority order.**
---
**Branch ready for review. Awaiting approval + merge signal.**
**Purpose:** Build a unified error handler for orchestration code using claw-code as a library or subprocess.
After cycles #178–#179 (parser-front-door hole closure), claw-code's error interface is deterministic, machine-readable, and clawable: **one error handler for all 14 clawable commands.**
---
## Quick Reference: Exit Codes and Envelopes
Every clawable command returns JSON on stdout when `--output-format json` is requested.
**IMPORTANT:** The exit code contract below applies **only when `--output-format json` is explicitly set**. Text mode follows argparse conventions and may return different exit codes (e.g., `2` for argparse parse errors). Claws consuming claw-code as a subprocess MUST always pass `--output-format json` to get the documented contract.
| Exit Code | Meaning | Response Format | Example |
**Practical rule for claws:** always pass `--output-format json`. This eliminates text-mode surprises and gives you the documented exit-code contract for every error path.
---
## One-Handler Pattern
Build a single error-recovery function that works for all 14 clawable commands:
| **parse** | 1 | Argparse error (unknown command, missing arg, invalid flag) | No | Real error message included (#179); valid choices list for discoverability |
| **session_not_found** | 1 | load-session target doesn't exist | No | session_id and directory included in envelope |
| **filesystem** | 1 | Directory missing, permission denied, disk full | Yes | Transient issues (disk space, NFS flake) can be retried |
| **runtime** | 1 | Engine error (unexpected exception, malformed input) | Depends | `error.retryable` field in envelope specifies |
| **timeout** | 2 | Engine timeout with cooperative cancellation | Yes* | `cancel_observed` field signals session safety (#164) |
*Retry safety depends on `cancel_observed`:
-`cancel_observed=true` → session is safe to reuse
-`cancel_observed=false` → session may be wedged; allocate fresh one
---
## What We Did to Make This Work
### Cycle #178: Parse-Error Envelope
**Problem:**`claw nonexistent --output-format json` returned argparse help text on stderr instead of an envelope.
**Solution:** Catch argparse `SystemExit` in JSON mode and emit a structured error envelope.
**Benefit:** Claws no longer need to parse human help text to understand parse errors.
### Cycle #179: Stderr Hygiene + Real Error Message
**Problem:** Even after #178, argparse usage was leaking to stderr AND the envelope message was generic ("invalid command or argument").
**Solution:** Monkey-patch `parser.error()` in JSON mode to raise an internal exception, preserving argparse's real message verbatim. Suppress stderr entirely in JSON mode.
**Benefit:** Claws see one stream (stdout), one envelope, and real error context (e.g., "invalid choice: typo (choose from ...)") for discoverability.
### Contract: #164 Stage B (`cancel_observed` field)
**Problem:** Timeout results didn't signal whether the engine actually observed the cancellation request.
**Solution:** Add `cancel_observed: bool` field to timeout TurnResult; signal true iff the engine had a fair chance to observe the cancel event.
**Benefit:** Claws can decide "retry with fresh session" vs "reuse this session with larger timeout" based on a single boolean.
---
## Common Mistakes to Avoid
❌ **Don't parse exit code alone**
```python
# BAD: Exit code 1 could mean parse error, not-found, filesystem, or runtime
ifresult.returncode==1:
# What should I do? Unclear.
pass
```
✅ **Do parse error.kind**
```python
# GOOD: error.kind tells you exactly how to recover
matchenvelope['error']['kind']:
case'parse':...
case'session_not_found':...
case'filesystem':...
```
---
❌ **Don't capture both stdout and stderr and assume they're separate concerns**
```python
# BAD (pre-#179): Capture stdout + stderr, then parse stdout as JSON
# But stderr might contain argparse noise that you have to string-match
For reference, the target JSON error envelope shape (SCHEMAS.md, v2.0):
```json
{
"timestamp":"2026-04-22T11:40:00Z",
"command":"load-session",
"exit_code":1,
"output_format":"json",
"schema_version":"2.0",
"error":{
"kind":"session_not_found",
"operation":"session_store.load_session",
"target":"nonexistent",
"retryable":false,
"message":"session 'nonexistent' not found in .port_sessions",
"hint":"use 'list-sessions' to see available sessions"
}
}
```
**This is the target schema after [`FIX_LOCUS_164`](./FIX_LOCUS_164.md) is implemented.** The migration plan includes a dual-mode `--envelope-version=2.0` flag in Phase 1, default version bump in Phase 2, and deprecation in Phase 3. For now, code against v1.0 (Appendix A).
---
## Summary
After cycles #178–#179, **one error handler works for all 14 clawable commands.** No more string-matching, no more stderr parsing, no more exit-code ambiguity. Just parse the JSON, check `error.kind`, and decide: retry, escalate, or reuse session (if safe).
The handler itself is ~80 lines of Python; the patterns are reusable across any language that can speak JSON.
## 0. CRITICAL UPDATE (Cycle #85 via #168 Evidence)
**Premise revision:** This locus document originally framed the problem as **"v1.0 (incoherent) → v2.0 (target schema)"** migration. **Fresh-dogfood validation in cycle #84 proved this framing was underspecified.**
**Actual problem (evidence from #168):**
- There is **no coherent v1.0 envelope contract**. Each verb has a bespoke JSON shape.
-`claw list-sessions --output-format json` emits `{command, sessions}` — has `command` field
-`claw doctor --output-format json` emits `{checks, kind, message, ...}` — no `command` field
- Each verb renderer was written independently with no coordinating contract
**Revised migration plan — three phases instead of two:**
1.**Phase 0 (Emergency):** Fix silent failures (#168 bootstrap JSON). Every `--output-format json` command must emit valid JSON.
2.**Phase 1 (v1.5 Baseline):** Establish minimal JSON invariants across all 14 verbs without breaking existing consumers:
- Every command emits valid JSON when `--output-format json` is passed
- Every command has a top-level `kind` field identifying the verb
- Every error envelope follows the confirmed `{error, hint, kind, type}` shape
- Every success envelope has the verb name in a predictable location
- **Effort:** ~3 dev-days (no new design, just fill gaps and normalize bugs)
3.**Phase 2 (v2.0 Wrapped Envelope):** Execute the original Phase 1 plan documented below — common metadata wrapper, nested data/error objects, opt-in via `--envelope-version=2.0`.
4.**Phase 3 (v2.0 Default):** Original Phase 2 plan below.
5.**Phase 4 (v1.0/v1.5 Deprecation):** Original Phase 3 plan below.
**Why add Phase 0 + Phase 1 (v1.5)?**
- You can't migrate from "incoherent" to "coherent v2.0" in one jump. Intermediate coherence (v1.5 baseline) is required.
- Consumer code built against "whatever v1 emits today" needs a stable target to transition from.
- **Silent failures (bootstrap JSON) must be fixed BEFORE any migration** — otherwise consumers have no way to detect breakage.
**Blocker resolved:** The original blocker "v1.0 design vs v2.0 design" is actually "no v1 design exists; let's make one (v1.5) then migrate." This is a **clearer, lower-risk migration path**.
**Revised effort estimate:** ~9 dev-days total (Phase 0: 1 day + Phase 1/v1.5: 3 days + Phase 2/v2.0: 5 days) instead of ~6 dev-days for a direct v1.0→v2.0 migration (which would have failed given the incoherent baseline).
**Doctrine implication:** Cycles #76–#82 diagnosed "aspirational vs current" correctly but missed that "current" was never a single thing. Cycle #84 fresh-dogfood caught this. **Fresh-dogfood discipline (principle #9) prevented a 6-day migration effort from hitting an unsolvable baseline problem.**
---
## 1. Scope — What This Migration Affects
**Every JSON-emitting verb.** Audit across the 14 documented verbs:
| Verb | Current top-level keys | Schema-conformant? |
1.**Schema parity:** Every `--output-format json` command emits v2.0 envelope shape exactly per SCHEMAS.md
2.**Success/error symmetry:** Success envelopes have `data` field; error envelopes have `error` object; never both
3.**kind semantic unification:**`data.kind` = verb identity (when present); `error.kind` = enum from §4.44. No overloading.
4.**Common metadata:**`timestamp`, `command`, `exit_code`, `output_format`, `schema_version` present in ALL envelopes
5.**Dual-mode support:**`--envelope-version=1|2` flag allows opt-in/opt-out during migration
6.**Tests:** Per-verb golden test fixtures for both v1.0 and v2.0 envelopes
7.**Documentation:** SCHEMAS.md documents both versions with deprecation timeline
---
## 6. Risks
### 6a. Breaking Change Risk
Phase 2 (default version bump) WILL break consumers that depend on flat-shape envelope. Mitigations:
- Dual-mode flag allows opt-in testing before default change
- Long grace period (Phase 3 deprecation ~6 months post-Phase 2)
- Clear migration guide + example consumer code
### 6b. Implementation Risk
14 verbs to migrate. Each verb has its own success shape (`checks`, `agents`, `phases`, etc.). Payload structure stays the same; only the wrapper changes. Mechanical but high-volume.
**Mitigation:** Start with doctor, status, version as pilot. If pattern works, batch remaining 11.
### 6c. Error Classification Remapping Risk
Changing `kind: "cli_parse"` to `error.kind: "parse"` is a breaking change even within the error envelope. Consumers doing `response["kind"] == "cli_parse"` will break.
**Mitigation:** Document explicitly in migration guide. Provide sed script if needed.
**Purpose:** Streamline merging of the 17 review-ready branches by grouping them into safe clusters and providing per-cluster merge order + validation steps.
**Batch strategy:** Merge by cluster, not individual branches. Each cluster shares the same fix pattern, so reviewers can validate one cluster and merge all members together.
**Estimated throughput:** 2-3 clusters per merge session. At current cycle velocity (~1 cluster per 15 min), full queue → merged main in ~2 hours.
- Run: `cargo test -p rusty-claude-cli` (should pass 181 tests)
**Commit strategy:** Rebase all three, squash into single "typed-error: thread kind+hint through 3 families" commit, OR merge individually preserving commit history for bisect clarity.
- [ ]#122 and #122b are binary-level changes, #161 is build-system change
- [ ] All three pass `cargo build`
- [ ] No cross-crate merge conflicts
**Why these three together:** All share the diagnostic-strictness principle. #122 and #122b extend `doctor`, #161 fixes `version`. Merging as a cluster signals the principle to future reviewers.
**Post-merge validation:**
- Rebuild binary
- Run: `claw doctor` (should now check stale-base + broad-cwd)
- Run: `claw version` (should report correct SHA even in worktrees)
- Run: `cargo test` (full suite)
**Commit strategy:** Merge individually preserving history, then add ROADMAP commit explaining the cluster principle. This makes the doctrine visible in git log.
- [ ] All four branches edit help-topic routing in the same regions
- [ ] Verify no merge conflicts (should be sequential, non-overlapping edits)
- [ ]`cargo build` passes
**Why these four together:** All address help-parity (verbs in `--help` → correct help topics). This cluster is the most "batch-like" — identical fix pattern repeated.
**Post-merge validation:**
- Rebuild binary
- Run: `claw diff --help` (should route to help topic, not crash)
- Run: `claw config --help` (ditto)
- Run: `claw --help` (should list all verbs)
**Merge strategy:** Can be fast-forwarded or squashed as a unit since they're all the same pattern.
**Branch-last protocol validation:** All 17 branches here represent work that was:
1. Pinpoint filed (with repro + fix shape)
2. Implemented in scratch/worktree (not directly on main)
3. Verified to build + pass tests
4. Only then branched for review
This artifact provides the final step: **validated merge order + per-cluster risks.**
**Integration-support artifact:** This checklist reduces reviewer cognitive load by pre-answering "which merge order is safest?" and "what could go wrong?" questions.
This document governs the audit and potential promotion of 12 OPT_OUT surfaces (commands that currently do **not** support `--output-format json`).
## OPT_OUT Classification Rationale
A surface is classified as OPT_OUT when:
1.**Human-first by nature:** Rich Markdown prose / diagrams / structured text where JSON would be information loss
2.**Query-filtered alternative exists:** Commands with internal `--query` / `--limit` don't need JSON (users already have escape hatch)
3.**Simulation/debug only:** Not meant for production orchestration (e.g., mode simulators)
4.**Future JSON work is planned:** Documented in ROADMAP with clear upgrade path
---
## OPT_OUT Surfaces (12 Total)
### Group A: Rich-Markdown Reports (4 commands)
**Rationale:** These emit structured narrative prose. JSON would require lossy serialization.
| Command | Output | Current use | JSON case |
|---|---|---|---|
| `summary` | Multi-section workspace summary (Markdown) | Human readability | Not applicable; Markdown is the output |
| `manifest` | Workspace manifest with project tree (Markdown) | Human readability | Not applicable; Markdown is the output |
| `parity-audit` | TypeScript/Python port comparison report (Markdown) | Human readability | Not applicable; Markdown is the output |
| `setup-report` | Preflight + startup diagnostics (Markdown) | Human readability | Not applicable; Markdown is the output |
**Audit decision:** These likely remain OPT_OUT long-term (Markdown-as-output is intentional). If JSON version needed in future, would be a separate `--output-format json` path generating structured data (project summary object, manifest array, audit deltas, setup checklist) — but that's a **new contract**, not an addition to existing Markdown surfaces.
**Pinpoint:**#175 (deferred) — audit whether `summary`/`manifest` should emit JSON structured versions *in parallel* with Markdown, or if Markdown-only is the right UX.
---
### Group B: List Commands with Query Filters (3 commands)
**Rationale:** These already support `--query` and `--limit` for filtering. JSON output would be redundant; users can pipe to `jq`.
| Command | Filtering | Current output | JSON case |
|---|---|---|---|
| `subsystems` | `--limit` | Human-readable list | Use `--query` to filter, users can parse if needed |
| `commands` | `--query`, `--limit`, `--no-plugin-commands`, `--no-skill-commands` | Human-readable list | Use `--query` to filter, users can parse if needed |
| `tools` | `--query`, `--limit`, `--simple-mode` | Human-readable list | Use `--query` to filter, users can parse if needed |
**Audit decision:**`--query` / `--limit` are already the machine-friendly escape hatch. These commands are **intentionally** list-filter-based (not orchestration-primary). Promoting to CLAWABLE would require:
1. Formalizing what the structured output *is* (command array? tool array?)
2. Versioning the schema per command
3. Updating tests to validate per-command schemas
**Cost-benefit:** Low. Users who need structured data can already use `--query` to narrow results, then parse. Effort to promote > value.
**Pinpoint:**#176 (backlog) — audit `--query` UX; consider if a `--query-json` escape hatch (output JSON of matching items) is worth the schema tax.
---
### Group C: Simulation / Debug Surfaces (5 commands)
**Rationale:** These are intentionally **not production-orchestrated**. They simulate behavior, test modes, or debug scenarios. JSON output doesn't add value.
| Command | Purpose | Output | Use case |
|---|---|---|---|
| `remote-mode` | Simulate remote execution | Text (mock session) | Testing harness behavior under remote constraints |
| `ssh-mode` | Simulate SSH execution | Text (mock SSH session) | Testing harness behavior over SSH-like transport |
| `teleport-mode` | Simulate teleport hop | Text (mock hop session) | Testing harness behavior with teleport bouncing |
| `direct-connect-mode` | Simulate direct network | Text (mock session) | Testing harness behavior with direct connectivity |
| `deep-link-mode` | Simulate deep-link invocation | Text (mock deep-link) | Testing harness behavior from URL/deeplink |
**Audit decision:** These are **intentionally simulation-only**. Promoting to CLAWABLE means:
1. "This simulated mode is now a valid orchestration surface"
2. Need to define what JSON output *means* (mock session state? simulation log?)
3. Need versioning + test coverage
**Cost-benefit:** Very low. These are debugging tools, not orchestration endpoints. Effort to promote >> value.
**Pinpoint:**#177 (backlog) — decide if mode simulators should ever be CLAWABLE (probably no).
---
## Audit Workflow (Future Cycles)
### For each surface:
1.**Survey:** Check if any external claw actually uses --output-format with this surface
2.**Cost estimate:** How much schema work + testing?
3.**Value estimate:** How much demand for JSON version?
4.**Decision:** CLAWABLE, remain OPT_OUT, or new pinpoint?
### Promotion criteria (if promoting to CLAWABLE):
A surface moves from OPT_OUT → CLAWABLE **only if**:
- ✅ Clear use case for JSON (not just "hypothetically could be JSON")
- ✅ Schema is simple and stable (not 20+ fields)
- ✅ At least one external claw has requested it
- ✅ Tests can be added without major refactor
- ✅ Maintainability burden is worth the value
### Demote criteria (if staying OPT_OUT):
A surface stays OPT_OUT **if**:
- ✅ JSON would be information loss (Markdown reports)
**Purpose:** Record real demand signals for promoting OPT_OUT surfaces to CLAWABLE. Without this log, the audit criteria in `OPT_OUT_AUDIT.md` have no evidentiary base.
**Status:** Active survey window (post-#178/#179, cycles #21+)
## How to file a demand signal
When any external claw, operator, or downstream consumer actually needs JSON output from one of the 12 OPT_OUT surfaces, add an entry below. **Speculation, "could be useful someday," and internal hypotheticals do NOT count.**
The threshold is intentionally high. Single-use hacks can be served via one-off Markdown parsing; schema promotion is expensive (docs, tests, maintenance).
---
## Demand Signals Received
### Group A: Rich-Markdown Reports
#### `summary`
**Signals received: 0**
Notes: No demand recorded. Markdown output is intentional and useful for human review.
#### `manifest`
**Signals received: 0**
Notes: No demand recorded.
#### `parity-audit`
**Signals received: 0**
Notes: No demand recorded. Report consumers are humans reviewing porting progress, not automation.
#### `setup-report`
**Signals received: 0**
Notes: No demand recorded.
---
### Group B: List Commands with Query Filters
#### `subsystems`
**Signals received: 0**
Notes: `--limit` already provides filtering. No claws requesting JSON.
**Current assessment:** Zero demand for any OPT_OUT surface promotion. This is consistent with `OPT_OUT_AUDIT.md` prediction that all 12 likely stay OPT_OUT long-term.
- Move all 12 surfaces to `PERMANENTLY_OPT_OUT` or similar
- Remove `OPT_OUT_SURFACES` from `test_cli_parity_audit.py` (everything is explicitly non-goal)
- Update `CLAUDE.md` to reflect maintainership mode
- Close `OPT_OUT_AUDIT.md` with "audit complete, no promotions"
### If 1–2 signals on isolated surfaces:
- File individual promotion pinpoints per surface with demand evidence
- Each goes through standard #171/#172/#173 loop (parity audit, SCHEMAS.md, consistency test)
### If high demand (3+ signals):
- Reopen audit: is the OPT_OUT classification actually correct?
- Review whether protocol expansion is warranted
---
## Related Files
- **`OPT_OUT_AUDIT.md`** — Audit criteria, decision table, rationale by group
- **`SCHEMAS.md`** — JSON contract for the 14 CLAWABLE surfaces
- **`tests/test_cli_parity_audit.py`** — Machine enforcement of CLAWABLE/OPT_OUT classification
- **`CLAUDE.md`** — Development posture (maintainership mode)
---
## Philosophy
**Prevent speculative expansion.** The discipline of requiring real signals before promotion protects the protocol from schema bloat. Every new CLAWABLE surface adds:
- A SCHEMAS.md section (maintenance burden)
- Test coverage (test suite tax)
- Documentation (cognitive load for new developers)
- Version compatibility (schema_version bump risk)
If a claw can't articulate *why* it needs JSON for `summary` beyond "it would be nice," then JSON for `summary` is not needed. The Markdown output is a feature, not a gap.
The audit log closes the loop on "governed non-goals": OPT_OUT surfaces are intentionally not clawable until proven otherwise by evidence.
- Canonical document: this top-level `PARITY.md` is the file consumed by `rust/scripts/run_mock_parity_diff.py`.
- Requested 9-lane checkpoint: **All 9 lanes merged on `main`.**
- Current `main` HEAD: `ee31e00` (stub implementations replaced with real AskUserQuestion + RemoteTrigger).
- Repository stats at this checkpoint: **292 commits on `main` / 293 across all branches**, **9 crates**, **48,599 tracked Rust LOC**, **2,568 test LOC**, **3 authors**, date range **2026-03-31 → 2026-04-03**.
- Current `main` HEAD: `ad1cf92` (doctrine loop canonical example).
- Repository stats at this checkpoint: **979 commits on `main`**, **9 crates**, **80,789 tracked Rust LOC**, **4,533 test LOC**, **3 authors**, date **2026-04-23**.
- **Growth since last PARITY update (2026-04-03):** Rust LOC +66% (48,599 → 80,789), Test LOC +76% (2,568 → 4,533), Commits +235% (292 → 979). Current phase: 13 branches awaiting review/integration.
**Why it's Priority 2:** Extensions. Guard clauses on existing envelope shape. Uses envelope from Priority 1.
**Implementation:** Add trailing-args rejection to `init` and unknown-flag rejection to `bootstrap-plan`. Pattern: match existing guard in #171 (extra-args classifier).
**Risk profile:** MEDIUM (adds guards, no shape changes)
**Git Bash / WSL** are optional alternatives, not requirements. If you prefer bash-style paths (`/c/Users/you/...` instead of `C:\Users\you\...`), Git Bash (ships with Git for Windows) works well. In Git Bash, the `MINGW64` prompt is expected and normal — not a broken install.
## Post-build: locate the binary and verify
After running `cargo build --workspace`, the `claw` binary is built but **not** automatically installed to your system. Here's where to find it and how to verify the build succeeded.
### Binary location
After `cargo build --workspace` in `claw-code/rust/`:
**Debug build (default, faster compile):**
- **macOS/Linux:** `rust/target/debug/claw`
- **Windows:** `rust/target/debug/claw.exe`
**Release build (optimized, slower compile):**
- **macOS/Linux:** `rust/target/release/claw`
- **Windows:** `rust/target/release/claw.exe`
If you ran `cargo build` without `--release`, the binary is in the `debug/` folder.
### Verify the build succeeded
Test the binary directly using its path:
```bash
# macOS/Linux (debug build)
./rust/target/debug/claw --help
./rust/target/debug/claw doctor
# Windows PowerShell (debug build)
.\rust\target\debug\claw.exe --help
.\rust\target\debug\claw.exe doctor
```
If these commands succeed, the build is working. `claw doctor` is your first health check — it validates your API key, model access, and tool configuration.
### Optional: Add to PATH
If you want to run `claw` from any directory without the full path, choose one of these approaches:
Build and install to Cargo's default location (`~/.cargo/bin/`, which is usually on PATH):
```bash
# From the claw-code/rust/ directory
cargo install --path . --force
# Then from anywhere
claw --help
```
**Option 3: Update shell profile (bash/zsh)**
Add this line to `~/.bashrc` or `~/.zshrc`:
```bash
export PATH="$(pwd)/rust/target/debug:$PATH"
```
Reload your shell:
```bash
source ~/.bashrc # or source ~/.zshrc
claw --help
```
### Troubleshooting
- **"command not found: claw"** — The binary is in `rust/target/debug/claw`, but it's not on your PATH. Use the full path `./rust/target/debug/claw` or symlink/install as above.
- **"permission denied"** — On macOS/Linux, you may need `chmod +x rust/target/debug/claw` if the executable bit isn't set (rare).
- **Debug vs. release** — If the build is slow, you're in debug mode (default). Add `--release` to `cargo build` for faster runtime, but the build itself will take 5–10 minutes.
> [!NOTE]
> **Auth:** claw requires an **API key** (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, etc.) — Claude subscription login is not a supported auth path.
Run the workspace test suite:
Run the workspace test suite after verifying the binary works:
This is an integration support artifact (per cycle #64 doctrine). Its purpose: let reviewers see all queued branches, cluster membership, and merge priorities without re-deriving from git log.
> **⚠️ CRITICAL: This document describes the TARGET v2.0 envelope schema, not the current v1.0 binary behavior.** The Rust binary currently emits a **flat v1.0 envelope** that does NOT include `timestamp`, `command`, `exit_code`, `output_format`, or `schema_version` fields. See [`FIX_LOCUS_164.md`](./FIX_LOCUS_164.md) for the full migration plan and timeline. **Do not build automation against the field shapes below without first testing against the actual binary output.** Use `claw <command> --output-format json` to inspect what your binary version actually emits.
This document locks the **target** field-level contract for all clawable-surface commands. After the v1.0→v2.0 migration (FIX_LOCUS_164 Phase 2), every command accepting `--output-format json` will conform to the envelope shapes documented here.
**Current v1.0 reality:** See [`ERROR_HANDLING.md`](./ERROR_HANDLING.md) Appendix A for the flat envelope shape the binary actually emits today.
---
## Common Fields (All Envelopes) — TARGET v2.0 SCHEMA
**This section describes the v2.0 target schema. The current v1.0 binary does NOT emit these fields.** See FIX_LOCUS_164.md for the migration timeline.
After v2.0 migration, every command response, success or error, will carry:
```json
{
"timestamp":"2026-04-22T10:10:00Z",
"command":"list-sessions",
"exit_code":0,
"output_format":"json",
"schema_version":"2.0"
}
```
| Field | Type | Required | Notes |
|---|---|---|---|
| `timestamp` | ISO 8601 UTC | Yes | Time command completed |
**Gap**: Current impl lacks `timestamp`, `exit_code`, `output_format`, `schema_version`, `directory`, `sessions_count` (derivable), and the session object uses `id`/`updated_at_ms`/`message_count` instead of `session_id`/`last_modified`/`prompt_count`. Follow-up #250 Option B to align field names and add common-envelope fields.
### `delete-session`
**Status**: ⚠️ Stub only (closed #251 dispatch-order fix; full impl deferred).
**Actual binary envelope** (as of #251 fix):
```json
{
"type":"error",
"command":"delete-session",
"error":"not_yet_implemented",
"kind":"not_yet_implemented"
}
```
Exit code: 1. No credentials required. The stub ensures the verb does NOT fall through to Prompt/auth (the #251 fix), but the actual delete operation is not yet wired.
For nonexistent sessions, emits a local `session_not_found` error (NOT `missing_credentials`):
```json
{
"error":"session not found: nonexistent",
"kind":"session_not_found",
"type":"error",
"hint":"Hint: managed sessions live in .claw/sessions/<hash>/ ..."
}
```
**Aspirational (future) shape**:
```json
{
"timestamp":"2026-04-22T10:10:00Z",
"command":"load-session",
"exit_code":0,
"session_id":"sess_abc123",
"loaded":true,
"directory":".claw/sessions",
"path":".claw/sessions/sess_abc123.jsonl"
}
```
**Gap**: Current impl uses nested `session: {...}` instead of flat fields, and omits common-envelope fields. Follow-up #250 Option B to align.
### `flush-transcript`
**Status**: ⚠️ Stub only (closed #251 dispatch-order fix; full impl deferred).
**Actual binary envelope** (as of #251 fix):
```json
{
"type":"error",
"command":"flush-transcript",
"error":"not_yet_implemented",
"kind":"not_yet_implemented"
}
```
Exit code: 1. No credentials required. Like `delete-session`, this stub resolves the #251 dispatch-order bug but the actual flush operation is not yet wired.
**Aspirational (future) shape**:
```json
{
"timestamp":"2026-04-22T10:10:00Z",
"command":"flush-transcript",
"exit_code":0,
"session_id":"sess_abc123",
"path":".claw/sessions/sess_abc123.jsonl",
"flushed":true,
"messages_count":12,
"input_tokens":4500,
"output_tokens":1200
}
```
### `show-command`
```json
{
"timestamp":"2026-04-22T10:10:00Z",
"command":"show-command",
"exit_code":0,
"name":"add-dir",
"found":true,
"source_hint":"commands/add-dir/add-dir.tsx",
"responsibility":"creates a new directory in the worktree"
error_msg=envelope.get("error","unknown error")# error is a STRING
error_kind=envelope.get("kind")# kind is at TOP LEVEL
print(f"Error: {error_kind} — {error_msg}")
else:
# Success path: verb-specific fields at top level
sessions=envelope.get("sessions",[])
forsessioninsessions:
print(f"Session: {session['id']}")
```
**After v2.0 migration, this code will break.** Claws building for v2.0 compatibility should:
1. Check `schema_version` field
2. Parse differently based on version
3. Or wait until Phase 2 default bump is announced, then migrate
### Why This Mismatch Exists
SCHEMAS.md was written as the **target design** for v2.0. The Rust binary is still on v1.0. The migration (FIX_LOCUS_164) will bring the binary in line with this schema, but it hasn't happened yet.
**This mismatch is the root cause of doc-truthfulness issues #78, #79, #165.** All three docs were documenting the v2.0 target as if it were current reality.
### Questions?
- **"Is v2.0 implemented?"** No. The binary is v1.0. See FIX_LOCUS_164.md for the implementation roadmap.
- **"Should I build against v2.0 schema?"** No. Build against v1.0 (current). Test your code with `claw` to verify.
- **"When does v2.0 ship?"** See FIX_LOCUS_164.md Phase 1 estimate: ~6 dev-days. Not scheduled yet.
- **"Can I use v2.0 now?"** Only if you explicitly pass `--envelope-version=2.0` (which doesn't exist yet in v1.0 binary).
**Status:** 📸 Snapshot of actual binary behavior as of cycle #91 (2026-04-23). Anchored by controlled matrix `/tmp/cycle87-audit/matrix.json` + Phase 0 tests in `output_format_contract.rs`.
### Purpose
This section documents **what each verb actually emits under `--output-format json`** as of the v1.5 emission baseline (post-cycle #89 emission routing fix, pre-Phase 1 shape normalization).
This is a **reference artifact**, not a target schema. It describes the reality that:
1.`--output-format json` exists and emits JSON (enforced by Phase 0 Task 2)
2. All output goes to stdout (enforced by #168c fix, cycle #89)
3. Each verb has a bespoke top-level shape (documented below; to be normalized in Phase 1)
This guide covers the current Rust workspace under `rust/` and the `claw` CLI binary. If you are brand new, make the doctor health check your first run: start `claw`, then run `/doctor`.
> [!TIP]
> **Building orchestration code that calls `claw` as a subprocess?** See [`ERROR_HANDLING.md`](./ERROR_HANDLING.md) for the unified error-handling pattern (one handler for all 14 clawable commands, exit codes, JSON envelope contract, and recovery strategies).
## Quick-start health check
Run this before prompts, sessions, or automation:
@@ -33,6 +36,60 @@ cargo build --workspace
The CLI binary is available at `rust/target/debug/claw` after a debug build. Make the doctor check above your first post-build step.
### Add binary to PATH
To run `claw` from anywhere without typing the full path:
**Option 1: Symlink to a directory already in your PATH**
```bash
# Find a PATH directory (usually ~/.local/bin or /usr/local/bin)
echo$PATH
# Create symlink (adjust path and PATH-dir as needed)
# Should run health check (shows which components are initialized)
claw doctor
# Should show available commands
claw --help
```
If `claw: command not found`, the PATH addition didn't take. Re-check:
```bash
echo$PATH# verify your PATH directory is listed
which claw # should show full path to binary
ls -la ~/.local/bin/claw # if using symlink, verify it exists and points to target/debug/claw
```
## Quick start
### First-run doctor check
@@ -52,6 +109,26 @@ cd rust
**Note:** Diagnostic verbs (`doctor`, `status`, `sandbox`, `version`) support `--output-format json` for machine-readable output. Invalid suffix arguments (e.g., `--json`) are now rejected at parse time rather than falling through to prompt dispatch.
### Initialize a repository
Set up a new repository with `.claw` config, `.claw.json`, `.gitignore` entries, and a `CLAUDE.md` guidance file:
```bash
cd /path/to/your/repo
./target/debug/claw init
```
Text mode (human-readable) shows artifact creation summary with project path and next steps. Idempotent — running multiple times in the same repo marks already-created files as "skipped".
JSON mode for scripting:
```bash
./target/debug/claw init --output-format json
```
Returns structured output with `project_path`, `created[]`, `updated[]`, `skipped[]` arrays (one per artifact), and `artifacts[]` carrying each file's `name` and machine-stable `status` tag. The legacy `message` field preserves backward compatibility.
**Why structured fields matter:** Claws can detect per-artifact state (`created` vs `updated` vs `skipped`) without substring-matching human prose. Use the `created[]`, `updated[]`, and `skipped[]` arrays for conditional follow-up logic (e.g., only commit if files were actually created, not just updated).
### Interactive REPL
```bash
@@ -75,11 +152,148 @@ cd rust
### JSON output for scripting
All clawable commands support `--output-format json` for machine-readable output.
**IMPORTANT SCHEMA VERSION NOTICE:**
The JSON envelope is currently in **v1.0 (flat shape)** and is scheduled to migrate to **v2.0 (nested schema)** in a future release. See [`FIX_LOCUS_164.md`](./FIX_LOCUS_164.md) for the full migration plan.
**Building a dispatcher or orchestration script?** See [`ERROR_HANDLING.md`](./ERROR_HANDLING.md) for the unified error-handling pattern. One code example works for all 14 clawable commands: parse the exit code, classify by `error.kind`, apply recovery strategies (retry, timeout recovery, validation, logging). Use that pattern instead of reimplementing error handling per command.
**Migrating to v2.0?** Check back after [`FIX_LOCUS_164`](./FIX_LOCUS_164.md) is implemented. Phase 1 will add a `--envelope-version=2.0` flag for opt-in access to the structured envelope schema. Phase 2 will make v2.0 the default. Phase 3 will deprecate v1.0.
### Inspect worker state
The `claw state` command reads `.claw/worker-state.json`, which is written by the interactive REPL or a one-shot prompt when a worker executes a task. This file contains the worker ID, session reference, model, and permission mode.
Prerequisite: You must run `claw` (interactive REPL) or `claw prompt <text>` at least once in the repository to produce the worker state file.
```bash
cd rust
./target/debug/claw state
```
JSON mode:
```bash
./target/debug/claw state --output-format json
```
If you run `claw state` before any worker has executed, you will see a helpful error:
```
error: no worker state file found at .claw/worker-state.json
Hint: worker state is written by the interactive REPL or a non-interactive prompt.
Run: claw # start the REPL (writes state on first turn)
Or: claw prompt <text> # run one non-interactive turn
These commands are available inside the interactive REPL (`claw` with no args). They extend the assistant with workspace analysis, planning, and navigation features.
### `/ultraplan` — Deep planning with multi-step reasoning
**Purpose:** Break down a complex task into steps using extended reasoning.
```bash
# Start the REPL
claw
# Inside the REPL
/ultraplan refactor the auth module to use async/await
/ultraplan design a caching layer for database queries
/ultraplan analyze this module for performance bottlenecks
```
Output: A structured plan with numbered steps, reasoning for each step, and expected outcomes. Use this when you want the assistant to think through a problem in detail before coding.
### `/teleport` — Jump to a file or symbol
**Purpose:** Quickly navigate to a file, function, class, or struct by name.
```bash
# Jump to a symbol
/teleport UserService
/teleport authenticate_user
/teleport RequestHandler
# Jump to a file
/teleport src/auth.rs
/teleport crates/runtime/lib.rs
/teleport ./ARCHITECTURE.md
```
Output: The file content, with the requested symbol highlighted or the file fully loaded. Useful for exploring the codebase without manually navigating directories. If multiple matches exist, the assistant shows the top candidates.
### `/bughunter` — Scan for likely bugs and issues
**Purpose:** Analyze code for common pitfalls, anti-patterns, and potential bugs.
```bash
# Scan the entire workspace
/bughunter
# Scan a specific directory or file
/bughunter src/handlers
/bughunter rust/crates/runtime
/bughunter src/auth.rs
```
Output: A list of suspicious patterns with explanations (e.g., "unchecked unwrap()", "potential race condition", "missing error handling"). Each finding includes the file, line number, and suggested fix. Use this as a first pass before a full code review.
**Purpose:** Dump built-in tool and plugin manifests to stdout as JSON, for parity comparison against the upstream Claude Code TypeScript implementation.
**Prerequisite:** This command requires access to upstream source files (`src/commands.ts`, `src/tools.ts`, `src/entrypoints/cli.tsx`). Set `CLAUDE_CODE_UPSTREAM` env var or pass `--manifests-dir`.
**When to use:** Parity work (comparing the Rust port's tool/plugin surface against the canonical TypeScript implementation). Not needed for normal operation.
**Error mode:** If upstream sources are missing, exits with `error-kind: missing_manifests` and a hint about how to provide them.
### `bootstrap-plan` — Show startup component graph
**Purpose:** Print the ordered list of startup components that are initialized when `claw` begins a session. Useful for debugging startup issues or verifying that fast-path optimizations are in place.
```bash
claw bootstrap-plan
```
**Sample output:**
```
- CliEntry
- FastPathVersion
- StartupProfiler
- SystemPromptFastPath
- ChromeMcpFastPath
```
**When to use:**
- Debugging why startup is slow (compare your plan to the expected one)
- Verifying that fast-path components are registered
- Understanding the load order before customizing hooks or plugins
**Related:** See `claw doctor` for health checks against these startup components.
**Purpose:** Report the current state of the ACP (Agent Context Protocol) / Zed editor integration. Currently **discoverability only** — no editor daemon is available yet.
```bash
claw acp
claw acp serve # same output; `serve` is accepted but not yet launchable
claw --acp # alias
claw -acp # alias
```
**Sample output:**
```
ACP / Zed
Status discoverability only
Launch `claw acp serve` / `claw --acp` / `claw -acp` report status only; no editor daemon is available yet
Today use `claw prompt`, the REPL, or `claw doctor` for local verification
Tracking ROADMAP #76
```
**When to use:** Check whether ACP/Zed integration is ready in your current build. Plan around its availability (track ROADMAP #76 for status).
**Today's alternatives:** Use `claw prompt` for one-shot runs, the interactive REPL for iterative work, or `claw doctor` for local verification.
### `export` — Export session transcript
**Purpose:** Export a managed session's transcript to a file or stdout. Operates on the currently-resumed session (requires `--resume`).
```bash
# Export latest session
claw --resume latest export
# Export specific session
claw --resume <session-id> export
```
**Prerequisite:** A managed session must exist under `.claw/sessions/<workspace-fingerprint>/`. If no sessions exist, the command exits with `error-kind: no_managed_sessions` and a hint to start a session first.
**When to use:**
- Archive session transcripts for review
- Share session context with teammates
- Feed session history into downstream tooling
**Related:** Inside the REPL, `/export` is also available as a slash command for the active session.
## Session management
REPL turns are persisted under `.claw/sessions/` in the current workspace.
@@ -324,7 +625,27 @@ cd rust
./target/debug/claw --resume latest /status /diff
```
Useful interactive commands include `/help`, `/status`, `/cost`, `/config`, `/session`, `/model`, `/permissions`, and `/export`.
### Interactive slash commands (inside the REPL)
Useful interactive commands include:
-`/help` — Show help for all available commands
-`/status` — Display current session and workspace status
-`/cost` — Show token usage and cost estimates for the session
-`/config` — Display current configuration and environment state
-`/session` — Show session ID, creation time, and persisted metadata
-`/model` — Display or switch the active model
-`/permissions` — Check sandbox permissions and capability grants
-`/export [file]` — Export the current conversation to a file (or resume from backup)
-`/ultraplan [task]` — Run a deep planning prompt with multi-step reasoning (good for complex refactoring tasks)
-`/teleport <symbol-or-path>` — Jump to a file or symbol by searching the workspace (IDE-like navigation)
-`/bughunter [scope]` — Inspect the codebase for likely bugs in an optional scope (e.g., `src/runtime`)
-`/commit` — Generate a commit message and create a git commit from the conversation
-`/pr [context]` — Draft or create a pull request from the conversation
-`/issue [context]` — Draft or create a GitHub issue from the conversation
-`/diff` — Show unified diff of changes made in the current session
// #144: degrade gracefully on config parse failure (same contract
// as #143 for `status`). Text mode prepends a "Config load error"
// block before the MCP list; the list falls back to empty.
matchloader.load(){
Ok(runtime_config)=>Ok(render_mcp_summary_report(
cwd,
runtime_config.mcp().servers(),
)),
Err(err)=>{
letempty=std::collections::BTreeMap::new();
Ok(format!(
"Config load error\n Status fail\n Summary runtime config failed to load; reporting partial MCP view\n Details {err}\n Hint `claw doctor` classifies config parse errors; fix the listed field and rerun\n\n{}",
// #144: same degradation for `mcp show`; if config won't parse,
// the specific server lookup can't succeed, so report the parse
// error with context.
matchloader.load(){
Ok(runtime_config)=>Ok(render_mcp_server_report(
cwd,
server_name,
runtime_config.mcp().get(server_name),
)),
Err(err)=>Ok(format!(
"Config load error\n Status fail\n Summary runtime config failed to load; cannot resolve `{server_name}`\n Details {err}\n Hint `claw doctor` classifies config parse errors; fix the listed field and rerun"
// #80: show the actual workspace-fingerprint directory instead of lying about .claw/sessions/
letfingerprint_dir=sessions_root
.file_name()
.and_then(|f|f.to_str())
.unwrap_or("<unknown>");
format!(
"session not found: {reference}\nHint: managed sessions live in .claw/sessions/. Try `{LATEST_SESSION_REFERENCE}` for the most recent session or `/session list` in the REPL."
"session not found: {reference}\nHint: managed sessions live in .claw/sessions/{fingerprint_dir}/ (workspace-specific partition).\nTry `{LATEST_SESSION_REFERENCE}` for the most recent session or `/session list` in the REPL."
// #80: show the actual workspace-fingerprint directory instead of lying about .claw/sessions/
letfingerprint_dir=sessions_root
.file_name()
.and_then(|f|f.to_str())
.unwrap_or("<unknown>");
format!(
"no managed sessions found in .claw/sessions/\nStart `claw` to create a session, then rerun with `--resume {LATEST_SESSION_REFERENCE}`."
"no managed sessions found in .claw/sessions/{fingerprint_dir}/\nStart `claw` to create a session, then rerun with `--resume {LATEST_SESSION_REFERENCE}`.\nNote: claw partitions sessions per workspace fingerprint; sessions from other CWDs are invisible."
)
}
@@ -744,6 +763,40 @@ mod tests {
assert_eq!(fp_a1.len(),16,"fingerprint must be a 16-char hex string");
}
/// #151 regression: equivalent paths (e.g. `/tmp/foo` vs `/private/tmp/foo`
/// on macOS where `/tmp` is a symlink to `/private/tmp`) must resolve to
/// the same session store. Previously they diverged because
/// `workspace_fingerprint()` hashed the raw path string. Now
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.