mirror of https://github.com/instructkr/claw-code.git synced 2026-04-09 01:24:49 +08:00

YeonGyu-Kim a9904fe693 docs(roadmap): file CLI provider dispatch bug as #29, mark #28 as partial

#28 error-copy improvements landed on ff1df4c but real users (nicma,
Jengro) hit `error: missing Anthropic credentials` within hours when
using `--model openai/gpt-4` with OPENAI_API_KEY set and all
ANTHROPIC_* env vars unset on main.

Traced root cause in build_runtime_with_plugin_state at line ~6244:
AnthropicRuntimeClient::new() is hardcoded. BuiltRuntime is
statically typed as ConversationRuntime<AnthropicRuntimeClient, ...>.
providers::detect_provider_kind() computes the right routing at the
metadata layer but the runtime client is never dispatched.

Files #29 with the detailed trace + a focused action plan:
DynamicApiClient enum wrapping Anthropic + OpenAiCompat variants,
retype BuiltRuntime, dispatch in build_runtime based on
detect_provider_kind, integration test with mock OpenAI-compat
server.

#28 is marked partial — the error-copy improvements are real and
stayed in, but the routing gap they were meant to cover is the
actual bug and needs #29 to land.

2026-04-08 17:01:14 +09:00

ROADMAP.md

Clawable Coding Harness Roadmap

Goal

Turn claw-code into the most clawable coding harness:

no human-first terminal assumptions
no fragile prompt injection timing
no opaque session state
no hidden plugin or MCP failures
no manual babysitting for routine recovery

This roadmap assumes the primary users are claws wired through hooks, plugins, sessions, and channel events.

Definition of "clawable"

A clawable harness is:

deterministic to start
machine-readable in state and failure modes
recoverable without a human watching the terminal
branch/test/worktree aware
plugin/MCP lifecycle aware
event-first, not log-first
capable of autonomous next-step execution

Current Pain Points

1. Session boot is fragile

trust prompts can block TUI startup
prompts can land in the shell instead of the coding agent
"session exists" does not mean "session is ready"

2. Truth is split across layers

tmux state
clawhip event stream
git/worktree state
test state
gateway/plugin/MCP runtime state

3. Events are too log-shaped

claws currently infer too much from noisy text
important states are not normalized into machine-readable events

4. Recovery loops are too manual

restart worker
accept trust prompt
re-inject prompt
detect stale branch
retry failed startup
classify infra vs code failures manually

5. Branch freshness is not enforced enough

side branches can miss already-landed main fixes
broad test failures can be stale-branch noise instead of real regressions

6. Plugin/MCP failures are under-classified

startup failures, handshake failures, config errors, partial startup, and degraded mode are not exposed cleanly enough

7. Human UX still leaks into claw workflows

too much depends on terminal/TUI behavior instead of explicit agent state transitions and control APIs

Product Principles

State machine first — every worker has explicit lifecycle states.
Events over scraped prose — channel output should be derived from typed events.
Recovery before escalation — known failure modes should auto-heal once before asking for help.
Branch freshness before blame — detect stale branches before treating red tests as new regressions.
Partial success is first-class — e.g. MCP startup can succeed for some servers and fail for others, with structured degraded-mode reporting.
Terminal is transport, not truth — tmux/TUI may remain implementation details, but orchestration state must live above them.
Policy is executable — merge, retry, rebase, stale cleanup, and escalation rules should be machine-enforced.

Roadmap

Phase 1 — Reliable Worker Boot

1. Ready-handshake lifecycle for coding workers

Add explicit states:

spawning
trust_required
ready_for_prompt
prompt_accepted
running
blocked
finished
failed

Acceptance:

prompts are never sent before ready_for_prompt
trust prompt state is detectable and emitted
shell misdelivery becomes detectable as a first-class failure state
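
The lifecycle above can be sketched as a small state machine. This is a minimal illustration, not the harness's actual implementation: the state names come from the list above, while the transition table and the `accepts_prompt` helper are assumptions made for the example.

```rust
/// Lifecycle states copied from the roadmap list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerState {
    Spawning,
    TrustRequired,
    ReadyForPrompt,
    PromptAccepted,
    Running,
    Blocked,
    Finished,
    Failed,
}

impl WorkerState {
    /// Whether `self -> next` is allowed. The table is an assumption of this
    /// sketch, not the harness's real transition rules.
    pub fn can_transition_to(self, next: WorkerState) -> bool {
        match (self, next) {
            // Terminal states never move again.
            (Self::Finished, _) | (Self::Failed, _) => false,
            // Any live state may fail.
            (_, Self::Failed) => true,
            (Self::Spawning, Self::TrustRequired) => true,
            (Self::Spawning, Self::ReadyForPrompt) => true,
            (Self::TrustRequired, Self::ReadyForPrompt) => true,
            (Self::ReadyForPrompt, Self::PromptAccepted) => true,
            (Self::PromptAccepted, Self::Running) => true,
            (Self::Running, Self::Blocked) => true,
            (Self::Running, Self::Finished) => true,
            (Self::Blocked, Self::Running) => true,
            _ => false,
        }
    }

    /// Prompts may only be delivered once the worker has announced readiness.
    pub fn accepts_prompt(self) -> bool {
        self == WorkerState::ReadyForPrompt
    }
}
```

A controller built on this would check `accepts_prompt()` before every send, which is one way to make the first acceptance criterion mechanically enforceable.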

2. Trust prompt resolver

Add allowlisted auto-trust behavior for known repos/worktrees.

Acceptance:

trusted repos auto-clear trust prompts
events emitted for trust_required and trust_resolved
non-allowlisted repos remain gated
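
One possible shape for the allowlist check, as a sketch only. `TrustResolver` and `TrustDecision` are hypothetical names for this example; the comments note which events each decision would be expected to emit under the acceptance criteria above.

```rust
use std::path::{Path, PathBuf};

/// Outcome of a trust check (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum TrustDecision {
    /// Would emit trust_required followed by trust_resolved.
    AutoResolved,
    /// Would emit trust_required only and stay gated.
    Gated,
}

pub struct TrustResolver {
    allowlist: Vec<PathBuf>,
}

impl TrustResolver {
    pub fn new(allowlist: Vec<PathBuf>) -> Self {
        Self { allowlist }
    }

    /// A repo or worktree is trusted when it sits at or below an allowlisted root.
    /// `Path::starts_with` compares whole path components, so
    /// `/srv/repos/claw-code-evil` does not match `/srv/repos/claw-code`.
    pub fn resolve(&self, repo_root: &Path) -> TrustDecision {
        if self.allowlist.iter().any(|root| repo_root.starts_with(root)) {
            TrustDecision::AutoResolved
        } else {
            TrustDecision::Gated
        }
    }
}
```

Callers should canonicalize `repo_root` before the check (for example with `std::fs::canonicalize`) so that symlinks and `..` segments cannot slip past the allowlist.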

3. Structured session control API

Provide machine control above tmux:

create worker
await ready
send task
fetch state
fetch last error
restart worker
terminate worker

Acceptance:

a claw can operate a coding worker without raw send-keys as the primary control plane

Phase 2 — Event-Native Clawhip Integration

4. Canonical lane event schema

Define typed events such as:

lane.started
lane.ready
lane.prompt_misdelivery
lane.blocked
lane.red
lane.green
lane.commit.created
lane.pr.opened
lane.merge.ready
lane.finished
lane.failed
branch.stale_against_main

Acceptance:

clawhip consumes typed lane events
Discord summaries are rendered from structured events instead of pane scraping alone

5. Failure taxonomy

Normalize failure classes:

prompt_delivery
trust_gate
branch_divergence
compile
test
plugin_startup
mcp_startup
mcp_handshake
gateway_routing
tool_runtime
infra

Acceptance:

blockers are machine-classified
dashboards and retry policies can branch on failure type
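
A sketch of how the taxonomy could be typed so that policies branch on an enum rather than on strings. The class names mirror the list above; `auto_retryable` is a hypothetical policy hook, and the choice of which classes are retryable is an assumption.

```rust
use std::str::FromStr;

/// Failure classes named in the roadmap list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureClass {
    PromptDelivery,
    TrustGate,
    BranchDivergence,
    Compile,
    Test,
    PluginStartup,
    McpStartup,
    McpHandshake,
    GatewayRouting,
    ToolRuntime,
    Infra,
}

impl FailureClass {
    pub const ALL: [FailureClass; 11] = [
        Self::PromptDelivery, Self::TrustGate, Self::BranchDivergence,
        Self::Compile, Self::Test, Self::PluginStartup, Self::McpStartup,
        Self::McpHandshake, Self::GatewayRouting, Self::ToolRuntime, Self::Infra,
    ];

    /// Wire name used in events and dashboards.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::PromptDelivery => "prompt_delivery",
            Self::TrustGate => "trust_gate",
            Self::BranchDivergence => "branch_divergence",
            Self::Compile => "compile",
            Self::Test => "test",
            Self::PluginStartup => "plugin_startup",
            Self::McpStartup => "mcp_startup",
            Self::McpHandshake => "mcp_handshake",
            Self::GatewayRouting => "gateway_routing",
            Self::ToolRuntime => "tool_runtime",
            Self::Infra => "infra",
        }
    }

    /// Assumed policy: code-level failures are not retried automatically,
    /// environment-level failures are.
    pub fn auto_retryable(self) -> bool {
        !matches!(self, Self::Compile | Self::Test)
    }
}

impl FromStr for FailureClass {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Self::ALL
            .iter()
            .copied()
            .find(|class| class.as_str() == s)
            .ok_or_else(|| format!("unknown failure class: {}", s))
    }
}
```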

6. Actionable summary compression

Collapse noisy event streams into:

current phase
last successful checkpoint
current blocker
recommended next recovery action

Acceptance:

channel status updates stay short and machine-grounded
claws stop inferring state from raw build spam

Phase 3 — Branch/Test Awareness and Auto-Recovery

7. Stale-branch detection before broad verification

Before broad test runs, compare current branch to main and detect if known fixes are missing.

Acceptance:

emit branch.stale_against_main
suggest or auto-run rebase/merge-forward according to policy
avoid misclassifying stale-branch failures as new regressions
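
A minimal sketch of a freshness probe, assuming `git` is on `PATH` and that "stale" simply means the branch is behind the main ref. It relies on `git rev-list --left-right --count <main>...HEAD`, which prints the number of commits only on the left side and only on the right side. `Freshness` and both function names are hypothetical, and this does not check whether specific known fixes are missing.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Freshness {
    /// Commits on main that the branch lacks.
    pub behind: u32,
    /// Commits on the branch that main lacks.
    pub ahead: u32,
}

impl Freshness {
    pub fn is_stale(&self) -> bool {
        self.behind > 0
    }
}

/// Parses "<left>\t<right>" as printed by `git rev-list --left-right --count A...B`.
pub fn parse_left_right_count(output: &str) -> Option<Freshness> {
    let mut parts = output.split_whitespace();
    let behind = parts.next()?.parse().ok()?;
    let ahead = parts.next()?.parse().ok()?;
    Some(Freshness { behind, ahead })
}

/// Runs git in `repo` and compares `main_ref` against HEAD.
/// Returns Ok(None) when git exits non-zero or prints something unexpected.
pub fn branch_freshness(repo: &Path, main_ref: &str) -> io::Result<Option<Freshness>> {
    let out = Command::new("git")
        .arg("-C")
        .arg(repo)
        .args(["rev-list", "--left-right", "--count"])
        .arg(format!("{}...HEAD", main_ref))
        .output()?;
    if !out.status.success() {
        return Ok(None);
    }
    Ok(parse_left_right_count(&String::from_utf8_lossy(&out.stdout)))
}
```

A worker would emit branch.stale_against_main when `is_stale()` returns true and only then decide, per policy, whether to merge forward before the broad test run.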

8. Recovery recipes for common failures

Encode known automatic recoveries for:

trust prompt unresolved
prompt delivered to shell
stale branch
compile red after cross-crate refactor
MCP startup handshake failure
partial plugin startup

Acceptance:

one automatic recovery attempt occurs before escalation
the attempted recovery is itself emitted as structured event data
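
The "recover once, then escalate" rule can be sketched as a pure function. The scenarios mirror the list above; the `Step` names are hypothetical placeholders for whatever each recipe actually does.

```rust
/// Failure scenarios from the roadmap list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scenario {
    TrustPromptUnresolved,
    PromptDeliveredToShell,
    StaleBranch,
    CompileRedAfterRefactor,
    McpHandshakeFailure,
    PartialPluginStartup,
}

/// Hypothetical recovery steps; the real recipes may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    ResolveTrustPrompt,
    ReplayPrompt,
    MergeForwardMain,
    CleanRebuild,
    RestartMcpServer,
    RestartFailedPlugins,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    /// Try the recipe; the attempt should itself be emitted as event data.
    Attempt(Step),
    /// The single automatic attempt is already spent.
    Escalate,
}

pub fn recipe(scenario: Scenario) -> Step {
    match scenario {
        Scenario::TrustPromptUnresolved => Step::ResolveTrustPrompt,
        Scenario::PromptDeliveredToShell => Step::ReplayPrompt,
        Scenario::StaleBranch => Step::MergeForwardMain,
        Scenario::CompileRedAfterRefactor => Step::CleanRebuild,
        Scenario::McpHandshakeFailure => Step::RestartMcpServer,
        Scenario::PartialPluginStartup => Step::RestartFailedPlugins,
    }
}

/// Exactly one automatic attempt per failure, then escalation.
pub fn next_action(scenario: Scenario, prior_attempts: u8) -> Decision {
    if prior_attempts == 0 {
        Decision::Attempt(recipe(scenario))
    } else {
        Decision::Escalate
    }
}
```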

9. Green-ness contract

Workers should distinguish:

targeted tests green
package green
workspace green
merge-ready green

Acceptance:

no more ambiguous "tests passed" messaging
merge policy can require the correct green level for the lane type
a single hung test must not mask other failures: enforce per-test timeouts in CI (cargo test --workspace) so a 6-minute hang in one crate cannot prevent downstream crates from running their suites
when a CI job fails because of a hang, the worker must report it as test.hung rather than a generic failure, so triage doesn't conflate it with a normal assertion failure
recorded pinpoint (2026-04-08): be561bf swapped the local byte-estimate preflight for a count_tokens round-trip and silently returned Ok(()) on any error, so send_message_blocks_oversized_* hung for ~6 minutes per attempt; the resulting workspace job crash hid 6 separate pre-existing CLI regressions (compact flag discarded, piped stdin vs permission prompter, legacy session layout, help/prompt assertions, mock harness count) that only became diagnosable after 8c6dfe5 + 5851f2d restored the fast-fail path
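
A sketch of the two ideas above: ordered green levels, and a timeout wrapper that reports a hang as its own outcome. All names are hypothetical. The timeout uses a helper thread and `recv_timeout` from the standard library; a real CI setup would more likely enforce the limit in the test runner.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Green levels in increasing strength; deriving `Ord` follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum GreenLevel {
    TargetedTests,
    Package,
    Workspace,
    MergeReady,
}

/// True when the lane has reached at least the level its policy requires.
pub fn satisfies(actual: Option<GreenLevel>, required: GreenLevel) -> bool {
    actual.map_or(false, |level| level >= required)
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TestOutcome {
    Passed,
    AssertionFailed,
    /// Would be reported as test.hung rather than a generic failure.
    Hung,
}

/// Runs `test` on a helper thread and classifies a timeout separately.
/// The hung thread is left detached; this sketch makes no attempt to kill it.
pub fn run_with_timeout<F>(timeout: Duration, test: F) -> TestOutcome
where
    F: FnOnce() -> bool + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(test());
    });
    match rx.recv_timeout(timeout) {
        Ok(true) => TestOutcome::Passed,
        Ok(false) => TestOutcome::AssertionFailed,
        Err(RecvTimeoutError::Timeout) => TestOutcome::Hung,
        // The sender was dropped without sending: the test body panicked.
        Err(RecvTimeoutError::Disconnected) => TestOutcome::AssertionFailed,
    }
}
```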

Phase 4 — Claws-First Task Execution

10. Typed task packet format

Define a structured task packet with fields like:

objective
scope
repo/worktree
branch policy
acceptance tests
commit policy
reporting contract
escalation policy

Acceptance:

claws can dispatch work without relying on long natural-language prompt blobs alone
task packets can be logged, retried, and transformed safely
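
One possible shape for such a packet, with a validation pass that reports every missing field at once. The field names follow the list above; the plain `String` field types and the `validate` method are assumptions of this sketch, and serialization is left out to keep it dependency-free.

```rust
/// Task packet with the fields listed above (types are assumptions).
#[derive(Debug, Clone, Default)]
pub struct TaskPacket {
    pub objective: String,
    pub scope: String,
    pub worktree: String,
    pub branch_policy: String,
    pub acceptance_tests: Vec<String>,
    pub commit_policy: String,
    pub reporting_contract: String,
    pub escalation_policy: String,
}

impl TaskPacket {
    /// Returns the names of all empty fields so a dispatcher can reject
    /// an incomplete packet with a machine-readable reason.
    pub fn validate(&self) -> Result<(), Vec<&'static str>> {
        let mut missing = Vec::new();
        let text_fields = [
            ("objective", &self.objective),
            ("scope", &self.scope),
            ("worktree", &self.worktree),
            ("branch_policy", &self.branch_policy),
            ("commit_policy", &self.commit_policy),
            ("reporting_contract", &self.reporting_contract),
            ("escalation_policy", &self.escalation_policy),
        ];
        for (name, value) in text_fields {
            if value.trim().is_empty() {
                missing.push(name);
            }
        }
        if self.acceptance_tests.is_empty() {
            missing.push("acceptance_tests");
        }
        if missing.is_empty() {
            Ok(())
        } else {
            Err(missing)
        }
    }
}
```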

11. Policy engine for autonomous coding

Encode automation rules such as:

if green + scoped diff + review passed -> merge to dev
if stale branch -> merge-forward before broad tests
if startup blocked -> recover once, then escalate
if lane completed -> emit closeout and cleanup session

Acceptance:

doctrine moves from chat instructions into executable rules
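
The four example rules could be encoded roughly as follows. `LaneFacts`, `Action`, and the evaluation order are assumptions of this sketch; the roadmap lists the rules but does not say how conflicts between them are resolved.

```rust
/// Facts a policy engine might see for one lane (hypothetical shape).
#[derive(Debug, Clone, Copy, Default)]
pub struct LaneFacts {
    pub green: bool,
    pub scoped_diff: bool,
    pub review_passed: bool,
    pub stale_branch: bool,
    pub startup_blocked: bool,
    pub recovery_attempts: u8,
    pub completed: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    MergeToDev,
    MergeForward,
    RecoverOnce,
    Escalate,
    CloseoutAndCleanup,
    Wait,
}

/// Evaluates the rules in a fixed order (assumed): completion first,
/// then startup recovery, then freshness, then merge.
pub fn decide(facts: &LaneFacts) -> Action {
    if facts.completed {
        return Action::CloseoutAndCleanup;
    }
    if facts.startup_blocked {
        return if facts.recovery_attempts == 0 {
            Action::RecoverOnce
        } else {
            Action::Escalate
        };
    }
    if facts.stale_branch {
        return Action::MergeForward;
    }
    if facts.green && facts.scoped_diff && facts.review_passed {
        return Action::MergeToDev;
    }
    Action::Wait
}
```

Keeping `decide` free of side effects means the same facts always yield the same action, which makes the policy easy to log and replay.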

12. Claw-native dashboards / lane board

Expose a machine-readable board of:

repos
active claws
worktrees
branch freshness
red/green state
current blocker
merge readiness
last meaningful event

Acceptance:

claws can query status directly
human-facing views become a rendering layer, not the source of truth

Phase 5 — Plugin and MCP Lifecycle Maturity

13. First-class plugin/MCP lifecycle contract

Each plugin/MCP integration should expose:

config validation contract
startup healthcheck
discovery result
degraded-mode behavior
shutdown/cleanup contract

Acceptance:

partial-startup and per-server failures are reported structurally
successful servers remain usable even when one server fails

14. MCP end-to-end lifecycle parity

Close gaps from:

config load
server registration
spawn/connect
initialize handshake
tool/resource discovery
invocation path
error surfacing
shutdown/cleanup

Acceptance:

parity harness and runtime tests cover healthy and degraded startup cases
broken servers are surfaced as structured failures, not opaque warnings

Immediate Backlog (from current real pain)

Priority order: P0 = blocks CI/green state, P1 = blocks integration wiring, P2 = clawability hardening, P3 = swarm-efficiency improvements.

P0 — Fix first (CI reliability)

Isolate render_diff_report tests into tmpdir — done: render_diff_report_for() tests run in temp git repos instead of the live working tree, and targeted cargo test -p rusty-claude-cli render_diff_report -- --nocapture now stays green during branch/worktree activity
Expand GitHub CI from single-crate coverage to workspace-grade verification — done: .github/workflows/rust-ci.yml now runs cargo test --workspace plus fmt/clippy at the workspace level
Add release-grade binary workflow — done: .github/workflows/release.yml now builds tagged Rust release artifacts for the CLI
Add container-first test/run docs — done: Containerfile + docs/container.md document the canonical Docker/Podman workflow for build, bind-mount, and cargo test --workspace usage
Surface doctor / preflight diagnostics in onboarding docs and help — done: README + USAGE now put claw doctor / /doctor in the first-run path and point at the built-in preflight report
Automate branding/source-of-truth residue checks in CI — done: .github/scripts/check_doc_source_of_truth.py and the doc-source-of-truth CI job now block stale repo/org/invite residue in tracked docs and metadata
Eliminate warning spam from first-run help/build path — done: current cargo run -q -p rusty-claude-cli -- --help renders clean help output without a warning wall before the product surface
Promote doctor from slash-only to top-level CLI entrypoint — done: claw doctor is now a local shell entrypoint with regression coverage for direct help and health-report output
Make machine-readable status commands actually machine-readable — done: claw --output-format json status and claw --output-format json sandbox now emit structured JSON snapshots instead of prose tables
Unify legacy config/skill namespaces in user-facing output — done: skills/help JSON/text output now present .claw as the canonical namespace and collapse legacy roots behind .claw-shaped source ids/labels
Honor JSON output on inventory commands like skills and mcp — done: direct CLI inventory commands now honor --output-format json with structured payloads for both skills and MCP inventory
Audit --output-format contract across the whole CLI surface — done: direct CLI commands now honor deterministic JSON/text handling across help/version/status/sandbox/agents/mcp/skills/bootstrap-plan/system-prompt/init/doctor, with regression coverage in output_format_contract.rs and resumed /status JSON coverage

P1 — Next (integration wiring, unblocks verification)

2. Add cross-module integration tests — done: 12 integration tests covering worker→recovery→policy, stale_branch→policy, green_contract→policy, reconciliation flows
3. Wire lane-completion emitter — done: lane_completion module with detect_lane_completion() auto-sets LaneContext::completed from session-finished + tests-green + push-complete → policy closeout
4. Wire SummaryCompressor into the lane event pipeline — done: compress_summary_text() feeds into LaneEvent::Finished detail field in tools/src/lib.rs

P2 — Clawability hardening (original backlog)

5. Worker readiness handshake + trust resolution — done: WorkerStatus state machine with Spawning → TrustRequired → ReadyForPrompt → PromptAccepted → Running lifecycle, trust_auto_resolve + trust_gate_cleared gating
6. Prompt misdelivery detection and recovery — done: prompt_delivery_attempts counter, PromptMisdelivery event detection, auto_recover_prompt_misdelivery + replay_prompt recovery arm
7. Canonical lane event schema in clawhip — done: LaneEvent enum with Started/Blocked/Failed/Finished variants, LaneEvent::new() typed constructor, tools/src/lib.rs integration
8. Failure taxonomy + blocker normalization — done: WorkerFailureKind enum (TrustGate/PromptDelivery/Protocol/Provider), FailureScenario::from_worker_failure_kind() bridge to recovery recipes
9. Stale-branch detection before workspace tests — done: stale_branch.rs module with freshness detection, behind/ahead metrics, policy integration
10. MCP structured degraded-startup reporting — done: McpManager degraded-startup reporting (+183 lines in mcp_stdio.rs), failed server classification (startup/handshake/config/partial), structured failed_servers + recovery_recommendations in tool output
11. Structured task packet format — done: task_packet.rs module with TaskPacket struct, validation, serialization, TaskScope resolution (workspace/module/single-file/custom), integrated into tools/src/lib.rs
12. Lane board / machine-readable status API — done: Lane completion hardening + LaneContext::completed auto-detection + MCP degraded reporting surface machine-readable state
13. Session completion failure classification — done: WorkerFailureKind::Provider + observe_completion() + recovery recipe bridge landed
14. Config merge validation gap — done: config.rs hook validation before deep-merge (+56 lines), malformed entries fail with source-path context instead of merged parse errors
15. MCP manager discovery flaky test — done: manager_discovery_report_keeps_healthy_servers_when_one_server_fails now runs as a normal workspace test again after repeated stable passes, so degraded-startup coverage is no longer hidden behind #[ignore]

Commit provenance / worktree-aware push events — done: LaneCommitProvenance now carries branch/worktree/canonical-commit/supersession metadata in lane events, and dedupe_superseded_commit_events() is applied before agent manifests are written so superseded commit events collapse to the latest canonical lineage
Orphaned module integration audit — done: runtime now keeps session_control and trust_resolver behind #[cfg(test)] until they are wired into a real non-test execution path, so normal builds no longer advertise dead clawability surface area.
Context-window preflight gap — done: provider request sizing now emits context_window_blocked before oversized requests leave the process, using a model-context registry instead of the old naive max-token heuristic.
Subcommand help falls through into runtime/API path — done: claw doctor --help, claw status --help, claw sandbox --help, and nested mcp/skills help are now intercepted locally without runtime/provider startup, with regression tests covering the direct CLI paths.
Session state classification gap (working vs blocked vs finished vs truly stale) — done: agent manifests now derive machine states such as working, blocked_background_job, blocked_merge_conflict, degraded_mcp, interrupted_transport, finished_pending_report, and finished_cleanable, and terminal-state persistence records commit provenance plus derived state so downstream monitoring can distinguish quiet progress from truly idle sessions.
Resumed /status JSON parity gap — dogfooding shows fresh claw status --output-format json now emits structured JSON, but resumed slash-command status still leaks through a text-shaped path in at least one dispatch path. Local CI-equivalent repro fails rust/crates/rusty-claude-cli/tests/resume_slash_commands.rs::resumed_status_command_emits_structured_json_when_requested with expected value at line 1 column 1, so resumed automation can receive text where JSON was explicitly requested. Action: unify fresh vs resumed /status rendering through one output-format contract and add regression coverage so resumed JSON output is guaranteed valid.
Opaque failure surface for session/runtime crashes — done: safe_failure_class() in error.rs classifies all API errors into 8 user-safe classes (provider_auth, provider_internal, provider_retry_exhausted, provider_rate_limit, provider_transport, provider_error, context_window, runtime_io). format_user_visible_api_error in main.rs attaches session ID + request trace ID to every user-visible error. Coverage in opaque_provider_wrapper_surfaces_failure_class_session_and_trace and 3 related tests.
doctor --output-format json check-level structure gap — done: claw doctor --output-format json now keeps the human-readable message/report while also emitting structured per-check diagnostics (name, status, summary, details, plus typed fields like workspace paths and sandbox fallback data), with regression coverage in output_format_contract.rs.
Plugin lifecycle init/shutdown test flakes under workspace-parallel execution — dogfooding surfaced that build_runtime_runs_plugin_lifecycle_init_and_shutdown can fail under cargo test --workspace while passing in isolation because sibling tests race on tempdir-backed shell init script paths. This is test brittleness rather than a code-path regression, but it still destabilizes CI confidence and wastes diagnosis cycles. Action: isolate temp resources per test robustly (unique dirs + no shared cwd assumptions), audit cleanup timing, and add a regression guard so the plugin lifecycle test remains stable under parallel workspace execution.
plugins::hooks::collects_and_runs_hooks_from_enabled_plugins flaked on Linux CI, root cause was a stdin-write race not missing exec bit — done at 172a2ad on 2026-04-08. Dogfooding reproduced this four times on main (CI runs 24120271422, 24120538408, 24121392171, 24121776826), escalating from first-attempt-flake to deterministic-red on the third push. Failure mode was PostToolUse hook .../hooks/post.sh failed to start for "Read": Broken pipe (os error 32) surfacing from HookRunResult.
- Initial diagnosis was wrong. The first theory (documented in earlier revisions of this entry and in the root-cause note on commit 79da4b8) was that write_hook_plugin in rust/crates/plugins/src/hooks.rs was writing the generated .sh files without the execute bit and Command::new(path).spawn() was racing on fork/exec. An initial chmod-only fix at 4f7b674 was shipped against that theory and still failed CI on run 24121776826 with the same Broken pipe symptom, falsifying the chmod-only hypothesis.
- Actual root cause. CommandWithStdin::output_with_stdin in rust/crates/plugins/src/hooks.rs was unconditionally propagating write_all errors on the child's stdin pipe, including std::io::ErrorKind::BrokenPipe. The test hook scripts run in microseconds (#!/bin/sh + a single printf), so the child exits and closes its stdin before the parent finishes writing the ~200-byte JSON hook payload. On Linux the pipe raises EPIPE immediately; on macOS the pipe happens to buffer the small payload before the child exits, which is why the race only surfaced on ubuntu CI runners. The parent's write_all returned Err(BrokenPipe), output_with_stdin returned that as a hook failure, and run_command classified the hook as "failed to start" even though the child had already run to completion and printed the expected message to stdout.
- Fix (commit 172a2ad, force-pushed over 4f7b674). Three parts: (1) actual fix: output_with_stdin now matches the write_all result and swallows BrokenPipe specifically, while propagating all other write errors unchanged; after a BrokenPipe swallow the code still calls wait_with_output() so stdout/stderr/exit code are still captured from the cleanly-exited child. (2) hygiene hardening: a new make_executable helper sets mode 0o755 on each generated .sh via std::os::unix::fs::PermissionsExt under #[cfg(unix)]. This is defense-in-depth for future non-sh hook runners, not the bug that was biting CI. (3) regression guard: new generated_hook_scripts_are_executable test under #[cfg(unix)] asserts each generated .sh file has at least one execute bit set (mode & 0o111 != 0) so future tweaks cannot silently regress the hygiene change.
- Verification. cargo test --release -p plugins 35 passing, fmt clean, clippy -D warnings clean; CI run 24121999385 went green on first attempt on main for the hotfix commit.
- Meta-lesson. Broken pipe (os error 32) from a child-process spawn path is ambiguous between "could not exec" and "exec'd and exited before the parent finished writing stdin." The first theory cargo-culted the "could not exec" reading because the ROADMAP scaffolding anchored on the exec-bit guess; falsification came from empirical CI, not from code inspection. Record the pattern: when a pipe error surfaces on fork/exec, instrument what wait_with_output() actually reports on the child before attributing the failure to a permissions or path issue.
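
The BrokenPipe handling described in this entry can be illustrated with a standalone sketch. This is not the code in rust/crates/plugins/src/hooks.rs: the free-function signature is an assumption, and only the pattern (swallow BrokenPipe, propagate other write errors, always collect the child's output) is taken from the description above.

```rust
use std::io::{self, Write};
use std::process::{Command, Output, Stdio};

/// Writes `payload` to the child's stdin, tolerating a child that exits
/// before the parent finishes writing.
pub fn output_with_stdin(mut cmd: Command, payload: &[u8]) -> io::Result<Output> {
    let mut child = cmd
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    if let Some(mut stdin) = child.stdin.take() {
        match stdin.write_all(payload) {
            Ok(()) => {}
            // The child already exited and closed its stdin. That is not a
            // spawn failure, so fall through and collect its output.
            Err(e) if e.kind() == io::ErrorKind::BrokenPipe => {}
            Err(e) => {
                // Any other write error is real: reap the child and report it.
                let _ = child.kill();
                let _ = child.wait();
                return Err(e);
            }
        }
        // `stdin` is dropped here, closing the pipe so the child sees EOF.
    }

    // Captures stdout, stderr, and the exit status even after a swallowed EPIPE.
    child.wait_with_output()
}
```

For example, running `sh -c 'printf ok'` with a 200-byte payload through this function returns a successful status and `ok` on stdout whether or not the write hit a closed pipe. The sketch assumes small payloads; a child that fills its stdout pipe before reading a large stdin payload could still deadlock.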
Resumed local-command JSON parity gap — done: direct claw --output-format json already had structured renderers for sandbox, mcp, skills, version, and init, but resumed claw --output-format json --resume <session> /… paths still fell back to prose because resumed slash dispatch only emitted JSON for /status. Resumed /sandbox, /mcp, /skills, /version, and /init now reuse the same JSON envelopes as their direct CLI counterparts, with regression coverage in rust/crates/rusty-claude-cli/tests/resume_slash_commands.rs and rust/crates/rusty-claude-cli/tests/output_format_contract.rs.
dev/rust cargo test -p rusty-claude-cli reads host ~/.claude/plugins/installed/ from real $HOME and fails parse-time on any half-installed user plugin — dogfooding on 2026-04-08 (filed from gaebal-gajae's clawhip bullet at message 1491322807026454579 after the provider-matrix branch QA surfaced it) reproduced 11 deterministic failures on clean dev/rust HEAD of the form panicked at crates/rusty-claude-cli/src/main.rs:3953:31: args should parse: "hook path `/Users/yeongyu/.claude/plugins/installed/sample-hooks-bundled/./hooks/pre.sh` does not exist; hook path `...\post.sh` does not exist" covering parses_prompt_subcommand, parses_permission_mode_flag, defaults_to_repl_when_no_args, parses_resume_flag_with_slash_command, parses_system_prompt_options, parses_bare_prompt_and_json_output_flag, rejects_unknown_allowed_tools, parses_resume_flag_with_multiple_slash_commands, resolves_model_aliases_in_args, parses_allowed_tools_flags_with_aliases_and_lists, parses_login_and_logout_subcommands.
- Same failures do NOT reproduce on main (re-verified with cargo test --release -p rusty-claude-cli against main HEAD 79da4b8, all 156 tests pass).
- Root cause is two-layered. First, on dev/rust parse_args eagerly walks user-installed plugin manifests under ~/.claude/plugins/installed/ and validates that every declared hook script exists on disk before returning a CliAction, so any half-installed plugin in the developer's real $HOME (in this case ~/.claude/plugins/installed/sample-hooks-bundled/ whose .claude-plugin manifest references ./hooks/pre.sh and ./hooks/post.sh but whose hooks/ subdirectory was deleted) makes argv parsing itself fail. Second, the test harness on dev/rust does not redirect $HOME or XDG_CONFIG_HOME to a fixture for the duration of the test: there is no env_lock-style guard equivalent to the one main already uses (grep -n env_lock rust/crates/rusty-claude-cli/src/main.rs returns 0 hits on dev/rust and 30+ hits on main). Together those two gaps mean dev/rust cargo test -p rusty-claude-cli is non-deterministic on every clean clone whose owner happens to have any non-pristine plugin in ~/.claude/.
- Action (two parts). (a) Backport the env_lock-based test isolation pattern from main into dev/rust's rusty-claude-cli test module so each test runs against a temp $HOME/XDG_CONFIG_HOME and cannot read host plugin state. (b) Decouple parse_args from filesystem hook validation on dev/rust (the same decoupling already on main, where hook validation happens later in the lifecycle than argv parsing) so even outside tests a partially installed user plugin cannot break basic CLI invocation.
- Branch scope. This is a dev/rust catchup against main, not a main regression. Tracking it here so the dev/rust merge train picks it up before the next dev/rust release rather than rediscovering it in CI.
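
A sketch of what an env_lock-style isolation guard could look like. Only the name env_lock comes from the entry above; the implementation, the `HomeOverride` type, and the restore-on-drop behavior are assumptions and may not match the guard on main. It requires Rust 1.70 or later for `OnceLock`.

```rust
use std::env;
use std::ffi::OsString;
use std::path::Path;
use std::sync::{Mutex, MutexGuard, OnceLock};

/// Process-wide lock so tests that mutate the environment never overlap.
fn env_lock() -> MutexGuard<'static, ()> {
    static LOCK: OnceLock<Mutex<()>> = OnceLock::new();
    LOCK.get_or_init(|| Mutex::new(()))
        .lock()
        // A panicking test poisons the mutex; the lock itself is still usable.
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Points HOME and XDG_CONFIG_HOME at a fixture directory until dropped.
pub struct HomeOverride {
    _guard: MutexGuard<'static, ()>,
    saved: Vec<(&'static str, Option<OsString>)>,
}

impl HomeOverride {
    pub fn new(fixture: &Path) -> Self {
        let guard = env_lock();
        let mut saved = Vec::new();
        for key in ["HOME", "XDG_CONFIG_HOME"] {
            saved.push((key, env::var_os(key)));
            // `set_var` is an unsafe fn in edition 2024; the block is
            // harmless (an unused-unsafe warning) on earlier editions.
            #[allow(unused_unsafe)]
            unsafe { env::set_var(key, fixture) };
        }
        Self { _guard: guard, saved }
    }
}

impl Drop for HomeOverride {
    fn drop(&mut self) {
        // Runs while `_guard` is still held, so restoration is also serialized.
        for (key, value) in self.saved.drain(..) {
            #[allow(unused_unsafe)]
            unsafe {
                match value {
                    Some(v) => env::set_var(key, v),
                    None => env::remove_var(key),
                }
            }
        }
    }
}
```

A test would create the guard as its first line, with `fixture` pointing at a fresh temp directory, so that argument parsing never sees the developer's real plugin directory.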
Auth-provider truth: error copy fails real users at the env-var-vs-header layer — dogfooded live on 2026-04-08 in #claw-code (Sisyphus Labs guild), two separate new users hit adjacent failure modes within minutes of each other that both trace back to the same root: the MissingApiKey / 401 error surface does not teach users how the auth inputs map to HTTP semantics, so a user who sets a "reasonable-looking" env var still hits a hard error with no signpost.
- Case 1 (varleg, Norway). Wanted to use OpenRouter via the OpenAI-compat path. Found a comparison table claiming "provider-agnostic (Claude, OpenAI, local models)" and assumed it Just Worked. Set OPENAI_API_KEY to an OpenRouter sk-or-v1-... key and a model name without an openai/ prefix; claw's provider detection fell through to Anthropic first because ANTHROPIC_API_KEY was still in the environment. Unsetting ANTHROPIC_API_KEY got them ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY is not set instead of a useful hint that the OpenAI path was right there. Fix delivered live as a channel reply: use main branch (not dev/rust), export OPENAI_BASE_URL=https://openrouter.ai/api/v1 alongside OPENAI_API_KEY, and prefix the model name with openai/ so the prefix router wins over env-var presence.
- Case 2 (stanley078852). Had set ANTHROPIC_AUTH_TOKEN="sk-ant-..." and was getting 401 Invalid bearer token from Anthropic. Root cause: sk-ant- keys are x-api-key-header keys, not bearer tokens. ANTHROPIC_API_KEY path in anthropic.rs sends the value as x-api-key; ANTHROPIC_AUTH_TOKEN path sends it as Authorization: Bearer (for OAuth access tokens from claw login). Setting an sk-ant- key in the wrong env var makes claw send it as Bearer sk-ant-... which Anthropic rejects at the edge with 401 before it ever reaches the completions endpoint. The error text propagated all the way to the user (api returned 401 Unauthorized (authentication_error) ... Invalid bearer token) with zero signal that the problem was env-var choice, not key validity. Fix delivered live as a channel reply: move the sk-ant-... key to ANTHROPIC_API_KEY and unset ANTHROPIC_AUTH_TOKEN.
- Pattern. Both cases are failures at the auth-intent translation layer: the user chose an env var that made syntactic sense to them (OPENAI_API_KEY for OpenAI, ANTHROPIC_AUTH_TOKEN for Anthropic auth) but the actual wire-format routing requires a more specific choice. The error messages surface the HTTP-layer symptom (401, missing-key) without bridging back to "which env var should you have used and why."
- Action. Three concrete improvements, scoped for a single main-side PR: (a) In ApiError::MissingCredentials Display, when the Anthropic path is the one being reported but OPENAI_API_KEY, XAI_API_KEY, or DASHSCOPE_API_KEY are present in the environment, extend the message with "— but I see $OTHER_KEY set; if you meant to use that provider, prefix your model name with openai/, grok, or qwen/ respectively so prefix routing selects it." (b) In the 401-from-Anthropic error path in anthropic.rs, when the failing auth source is BearerToken AND the bearer token starts with sk-ant-, append "— looks like you put an sk-ant-* API key in ANTHROPIC_AUTH_TOKEN, which is the Bearer-header path. Move it to ANTHROPIC_API_KEY instead (that env var maps to x-api-key, which is the correct header for sk-ant-* keys)." Same treatment for OAuth access tokens landing in ANTHROPIC_API_KEY (symmetric mis-assignment). (c) In rust/README.md on main and the matrix section on dev/rust, add a short "Which env var goes where" paragraph mapping sk-ant-* → ANTHROPIC_API_KEY and OAuth access token → ANTHROPIC_AUTH_TOKEN, with the one-line explanation of x-api-key vs Authorization: Bearer.
- Verification path. Both improvements can be tested with unit tests against ApiError::fmt output (the prefix-routing hint) and with a targeted integration test that feeds an sk-ant-*-shaped token into BearerToken and asserts the fmt output surfaces the correction hint (no HTTP call needed).
- Source. Live users in #claw-code at 1491328554598924389 (varleg) and 1491329840706486376 (stanley078852) on 2026-04-08.
- Partial landing (ff1df4c). Action parts (a), (b), (c) shipped on main: MissingCredentials now carries an optional hint field and renders adjacent-provider signals, Anthropic 401 + sk-ant-* bearer gets a correction hint, USAGE.md has a "Which env var goes where" section. BUT the copy fix only helps users who fell through to the Anthropic auth path by accident: it does NOT fix the underlying routing bug where the CLI instantiates AnthropicRuntimeClient unconditionally and ignores prefix routing at the runtime-client layer. That deeper routing gap is tracked separately as #29 below and was filed within hours of #28 landing when live users still hit missing Anthropic credentials with --model openai/gpt-4 and all ANTHROPIC_* env vars unset.
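
The env-var-to-header mapping and the two hints from this entry can be sketched as pure functions. This is an illustration under assumptions, not the code in anthropic.rs or the ApiError type: `AuthSource`, the function names, and the hint wording are hypothetical. Only the sk-ant-* direction is shown, because the roadmap does not say how an OAuth access token would be recognized.

```rust
/// Which env var supplied the credential.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthSource {
    /// ANTHROPIC_API_KEY, sent as the x-api-key header.
    ApiKey,
    /// ANTHROPIC_AUTH_TOKEN, sent as Authorization: Bearer.
    BearerToken,
}

/// Header name and value that would go on the wire for a credential.
pub fn header_for(source: AuthSource, value: &str) -> (&'static str, String) {
    match source {
        AuthSource::ApiKey => ("x-api-key", value.to_string()),
        AuthSource::BearerToken => ("authorization", format!("Bearer {}", value)),
    }
}

/// Hint for an sk-ant-* key that landed in the bearer-token variable.
pub fn misassignment_hint(source: AuthSource, value: &str) -> Option<&'static str> {
    if source == AuthSource::BearerToken && value.starts_with("sk-ant-") {
        Some("sk-ant-* keys belong in ANTHROPIC_API_KEY (x-api-key), not ANTHROPIC_AUTH_TOKEN (Bearer)")
    } else {
        None
    }
}

/// When Anthropic credentials are missing, reports the first adjacent provider
/// key that is set, with the model prefix that would route to it.
/// `is_set` abstracts the environment lookup so the function is testable.
pub fn adjacent_provider_prefix<F>(is_set: F) -> Option<(&'static str, &'static str)>
where
    F: Fn(&str) -> bool,
{
    const ADJACENT: [(&str, &str); 3] = [
        ("OPENAI_API_KEY", "openai/"),
        ("XAI_API_KEY", "grok"),
        ("DASHSCOPE_API_KEY", "qwen/"),
    ];
    ADJACENT.iter().copied().find(|(var, _)| is_set(*var))
}
```

In production code the lookup would be `adjacent_provider_prefix(|k| std::env::var_os(k).is_some())`; tests can pass a closure over a fixed set of names, which matches the "no HTTP call needed" verification path.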
CLI provider dispatch is hardcoded to Anthropic, ignoring prefix routing. Dogfooded live on 2026-04-08 within hours of ROADMAP #28 landing. Users in #claw-code (nicma at 1491342350960562277, Jengro at 1491345009021030533) followed the exact "use main, set OPENAI_API_KEY and OPENAI_BASE_URL, unset ANTHROPIC_*, prefix the model with openai/" checklist from the #28 error-copy improvements AND STILL hit error: missing Anthropic credentials; export ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY before calling the Anthropic API.

Reproduction on main HEAD ff1df4c: unset ANTHROPIC_API_KEY ANTHROPIC_AUTH_TOKEN; export OPENAI_API_KEY=sk-...; export OPENAI_BASE_URL=https://api.openai.com/v1; claw --model openai/gpt-4 prompt 'test' → reproduces the error deterministically.

Root cause (traced). rust/crates/rusty-claude-cli/src/main.rs at build_runtime_with_plugin_state (line ~6221) unconditionally builds AnthropicRuntimeClient::new(session_id, model, ...) without consulting providers::detect_provider_kind(&model). BuiltRuntime at line ~2855 is statically typed as ConversationRuntime<AnthropicRuntimeClient, CliToolExecutor>, so even if the dispatch logic existed there would be nowhere to slot an alternative client. providers/mod.rs::metadata_for_model correctly identifies openai/gpt-4 as ProviderKind::OpenAi at the metadata layer: the routing decision is computed correctly, it's just never used to pick a runtime client. The result is that the CLI is structurally single-provider (Anthropic only) even though the api crate's openai_compat.rs, XAI_ENV_VARS, DASHSCOPE_ENV_VARS, and send_message_streaming all exist and are exercised by unit tests inside the api crate. The provider matrix in rust/README.md is misleading because it describes the api-crate capabilities, not the CLI's actual dispatch behaviour.

Why #28 didn't catch this. ROADMAP #28 focused on the MissingCredentials error message (adding hints when adjacent provider env vars are set, or when a bearer token starts with sk-ant-*). None of its tests exercised the build_runtime code path; they were all unit tests against ApiError::fmt output. The routing bug survives #28 because the Display improvements fire AFTER the hardcoded Anthropic client has already been constructed and failed. You need the CLI to dispatch to a different client in the first place for the new hints to even surface at the right moment.

Action (single focused commit).
(1) New OpenAiCompatRuntimeClient struct in rust/crates/rusty-claude-cli/src/main.rs mirroring AnthropicRuntimeClient but delegating to openai_compat::send_message_streaming. One client type handles OpenAI, xAI, DashScope, and any OpenAI-compat endpoint. They differ only in base URL and auth env var, both of which come from the ProviderMetadata returned by metadata_for_model.
(2) New enum DynamicApiClient { Anthropic(AnthropicRuntimeClient), OpenAiCompat(OpenAiCompatRuntimeClient) } that implements runtime::ApiClient by matching on the variant and delegating.
(3) Retype BuiltRuntime from ConversationRuntime<AnthropicRuntimeClient, CliToolExecutor> to ConversationRuntime<DynamicApiClient, CliToolExecutor>, update the Deref/DerefMut/new spots.
(4) In build_runtime_with_plugin_state, call detect_provider_kind(&model) and construct either variant of DynamicApiClient. Prefix routing wins over env-var presence (that's the whole point).
(5) Integration test using a mock OpenAI-compat server (reuse mock_parity_harness pattern from crates/api/tests/) that feeds claw --model openai/gpt-4 prompt 'test' with OPENAI_BASE_URL pointed at the mock and no ANTHROPIC_* env vars, asserts the request reaches the mock, and asserts the response round-trips as an AssistantEvent.
(6) Unit test that build_runtime_with_plugin_state with model="openai/gpt-4" returns a BuiltRuntime whose inner client is the DynamicApiClient::OpenAiCompat variant.

Verification. cargo test --workspace, cargo fmt --all, cargo clippy --workspace.

Source. Live users nicma (1491342350960562277) and Jengro (1491345009021030533) in #claw-code on 2026-04-08, within hours of #28 landing.
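
The enum-dispatch shape in action steps (2) and (4) could look roughly like the sketch below. It is illustrative only: the ApiClient trait here is a one-method stand-in for runtime::ApiClient (whose real signature is not shown in this roadmap), the struct fields are assumptions, ProviderKind::Xai is an assumed variant name, and build_client is a hypothetical helper rather than the actual build_runtime_with_plugin_state.

```rust
/// Hypothetical, simplified stand-in for `runtime::ApiClient`.
/// The real trait streams events; one method is enough to show delegation.
pub trait ApiClient {
    fn provider_label(&self) -> &'static str;
}

/// `Anthropic` and `OpenAi` are named in the roadmap; `Xai` is an assumed name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderKind {
    Anthropic,
    OpenAi,
    Xai,
}

/// Assumed fields; the real client carries more state.
pub struct AnthropicRuntimeClient {
    pub model: String,
}

/// One client type for every OpenAI-compatible endpoint (assumed fields).
pub struct OpenAiCompatRuntimeClient {
    pub model: String,
    pub base_url: String,
}

impl ApiClient for AnthropicRuntimeClient {
    fn provider_label(&self) -> &'static str {
        "anthropic"
    }
}

impl ApiClient for OpenAiCompatRuntimeClient {
    fn provider_label(&self) -> &'static str {
        "openai-compat"
    }
}

/// Wrapper enum so the runtime can stay generic over a single client type.
pub enum DynamicApiClient {
    Anthropic(AnthropicRuntimeClient),
    OpenAiCompat(OpenAiCompatRuntimeClient),
}

impl ApiClient for DynamicApiClient {
    // Match on the variant and delegate to the wrapped client.
    fn provider_label(&self) -> &'static str {
        match self {
            DynamicApiClient::Anthropic(c) => c.provider_label(),
            DynamicApiClient::OpenAiCompat(c) => c.provider_label(),
        }
    }
}

/// Hypothetical helper: pick the variant from an already-detected provider kind.
pub fn build_client(kind: ProviderKind, model: &str, base_url: &str) -> DynamicApiClient {
    match kind {
        ProviderKind::Anthropic => DynamicApiClient::Anthropic(AnthropicRuntimeClient {
            model: model.to_string(),
        }),
        ProviderKind::OpenAi | ProviderKind::Xai => {
            DynamicApiClient::OpenAiCompat(OpenAiCompatRuntimeClient {
                model: model.to_string(),
                base_url: base_url.to_string(),
            })
        }
    }
}
```

Because the enum itself implements the trait, ConversationRuntime only needs to be retyped once; adding another OpenAI-compatible provider would not require a new variant under this sketch.
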
Phantom completions root cause: global session store has no per-worktree isolation

Root cause. The session store under ~/.local/share/opencode is global to the host. Every opencode serve instance — including the parallel lane workers spawned per worktree — reads and writes the same on-disk session directory. Sessions are keyed only by id and timestamp, not by the workspace they were created in, so there is no structural barrier between a session created in worktree /tmp/b4-phantom-diag and one created in /tmp/b4-omc-flat. Whichever serve instance picks up a given session id can drive it from whatever CWD that serve happens to be running in.

Impact. Parallel lanes silently cross wires. A lane reports a clean run — file edits, builds, tests — and the orchestrator marks the lane green, but the writes were applied against another worktree's CWD because a sibling opencode serve won the session race. The originating worktree shows no diff, the other worktree gains unexplained edits, and downstream consumers (clawhip lane events, PR pushes, merge gates) treat the empty originator as a successful no-op. These are the "phantom completions" we keep chasing: success messaging without any landed changes in the lane that claimed them, plus stray edits in unrelated lanes whose own runs never touched those files. Because the report path is happy, retries and recovery recipes never fire, so the lane silently wedges until a human notices the diff is empty.

Proposed fix. Bind every session to its workspace root + branch at creation time and refuse to drive it from any other CWD.
- At session creation, capture the canonical workspace root (resolved git worktree path) and the active branch and persist them on the session record.
- On every load (opencode serve, slash-command resume, lane recovery), validate that the current process CWD matches the persisted workspace root before any tool with side effects (file_ops, bash, git) is allowed to run. Mismatches surface as a typed WorkspaceMismatch failure class instead of silently writing to the wrong tree.
- Namespace the on-disk session path under the workspace fingerprint (e.g. <session_store>/<workspace_hash>/<session_id>) so two parallel opencode serve instances physically cannot collide on the same session id.
- Forks inherit the parent's workspace root by default; an explicit re-bind is required to move a session to a new worktree, and that re-bind is itself recorded as a structured event so the orchestrator can audit cross-worktree handoffs.
- Surface a branch.workspace_mismatch lane event so clawhip stops counting wrong-CWD writes as lane completions.
Status. A workspace_root field has been added to Session in rust/crates/runtime/src/session.rs (with builder, accessor, JSON + JSONL round-trip, fork inheritance, and given/when/then test coverage in persists_workspace_root_round_trip_and_forks_inherit_it). The CWD validation, the namespaced on-disk path, and the branch.workspace_mismatch lane event are still outstanding and tracked under this item.
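
The two outstanding pieces (CWD validation and the namespaced on-disk path) might be sketched as follows. This is a sketch under assumptions, not the code in session.rs: the function names, the WorkspaceMismatch fields, and the choice to accept any CWD inside the persisted root are all hypothetical, and DefaultHasher is used only to keep the example dependency-free (its output is not guaranteed stable across Rust releases, so a real fingerprint would need a stable hash).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical typed failure for a session driven from the wrong tree.
#[derive(Debug, PartialEq, Eq)]
pub struct WorkspaceMismatch {
    pub expected: PathBuf,
    pub actual: PathBuf,
}

/// Allow side-effecting tools only when `cwd` is the persisted root or a
/// directory inside it. `Path::starts_with` compares whole path components,
/// so `/tmp/b4` does not match `/tmp/b4-omc-flat`.
pub fn validate_workspace(persisted_root: &Path, cwd: &Path) -> Result<(), WorkspaceMismatch> {
    if cwd.starts_with(persisted_root) {
        Ok(())
    } else {
        Err(WorkspaceMismatch {
            expected: persisted_root.to_path_buf(),
            actual: cwd.to_path_buf(),
        })
    }
}

/// Illustrative fingerprint of a workspace root (not stable across toolchains).
pub fn workspace_fingerprint(root: &Path) -> String {
    let mut hasher = DefaultHasher::new();
    root.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// <session_store>/<workspace_hash>/<session_id>
pub fn session_path(store: &Path, root: &Path, session_id: &str) -> PathBuf {
    store.join(workspace_fingerprint(root)).join(session_id)
}
```

With this layout, two serve instances in different worktrees resolve the same session id to different directories, so a collision would need a hash collision rather than just a shared id.
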

P3 — Swarm efficiency

13. Swarm branch-lock protocol (done): branch_lock::detect_branch_lock_collisions() now detects same-branch/same-scope and nested-module collisions before parallel lanes drift into duplicate implementation.
14. Commit provenance / worktree-aware push events (done): lane event provenance now includes branch/worktree/superseded/canonical lineage metadata, and manifest persistence de-dupes superseded commit events before downstream consumers render them.

Suggested Session Split

Session A — worker boot protocol

Focus:

trust prompt detection
ready-for-prompt handshake
prompt misdelivery detection

Session B — clawhip lane events

Focus:

canonical lane event schema
failure taxonomy
summary compression

Session C — branch/test intelligence

Focus:

stale-branch detection
green-level contract
recovery recipes

Session D — MCP lifecycle hardening

Focus:

startup/handshake reliability
structured failed server reporting
degraded-mode runtime behavior
lifecycle tests/harness coverage

Session E — typed task packets + policy engine

Focus:

structured task format
retry/merge/escalation rules
autonomous lane closure behavior

MVP Success Criteria

We should consider claw-code materially more clawable when:

a claw can start a worker and know with certainty when it is ready
claws no longer accidentally type tasks into the shell
stale-branch failures are identified before they waste debugging time
clawhip reports machine states, not just tmux prose
MCP/plugin startup failures are classified and surfaced cleanly
a coding lane can self-recover from common startup and branch issues without human babysitting

Short Version

claw-code should evolve from:

a CLI a human can also drive

to:

a claw-native execution runtime
an event-native orchestration substrate
a plugin/hook-first autonomous coding harness

Deployment Architecture Gap (filed from dogfood 2026-04-08)

WorkerState is in the runtime; /state is NOT in opencode serve

Root cause discovered during batch 8 dogfood.

worker_boot.rs has a solid WorkerStatus state machine (Spawning → TrustRequired → ReadyForPrompt → Running → Finished/Failed). It is exported from runtime/src/lib.rs as a public API. But claw-code is a plugin loaded inside the opencode binary — it cannot add HTTP routes to opencode serve. The HTTP server is 100% owned by the upstream opencode process (v1.3.15).

Impact: There is no way to curl localhost:4710/state and get back a JSON WorkerStatus. Any such endpoint would require either:

1. Upstreaming a /state route into opencode's HTTP server (requires a PR to sst/opencode), or
2. Writing a sidecar HTTP process that queries the WorkerRegistry in-process (possible but fragile), or
3. Writing WorkerStatus to a well-known file path (.claw/worker-state.json) that an external observer can poll.

Recommended path: Option 3 — emit WorkerStatus transitions to .claw/worker-state.json on every state change. This is purely within claw-code's plugin scope, requires no upstream changes, and gives clawhip a file it can poll to distinguish a truly stalled worker from a quiet-but-progressing one.

Action item: Wire WorkerRegistry::transition() to atomically write .claw/worker-state.json on every state transition. Add a claw state CLI subcommand that reads and prints this file. Add regression test.
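
One common way to make the state-file write atomic is to write a temporary file in the same directory and then rename it over the target. The sketch below assumes that approach; write_state_file, the snake_case status strings other than trust_required, and the two-field JSON body are illustrative and are not taken from worker_boot.rs.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Variants follow the state machine named in this roadmap.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerStatus {
    Spawning,
    TrustRequired,
    ReadyForPrompt,
    Running,
    Finished,
    Failed,
}

impl WorkerStatus {
    /// Assumed wire names; only `trust_required` appears in the roadmap text.
    pub fn as_str(self) -> &'static str {
        match self {
            WorkerStatus::Spawning => "spawning",
            WorkerStatus::TrustRequired => "trust_required",
            WorkerStatus::ReadyForPrompt => "ready_for_prompt",
            WorkerStatus::Running => "running",
            WorkerStatus::Finished => "finished",
            WorkerStatus::Failed => "failed",
        }
    }
}

/// Hypothetical helper: write `<dir>/worker-state.json` via temp file + rename,
/// so a poller sees either the old file or the new one, never a partial write.
pub fn write_state_file(dir: &Path, status: WorkerStatus, updated_at: u64) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    let tmp = dir.join("worker-state.json.tmp");
    let dst = dir.join("worker-state.json");
    let body = format!(
        "{{\"status\":\"{}\",\"updated_at\":{}}}\n",
        status.as_str(),
        updated_at
    );
    let mut file = fs::File::create(&tmp)?;
    file.write_all(body.as_bytes())?;
    file.sync_all()?; // flush to disk before the rename publishes the file
    fs::rename(&tmp, &dst)
}
```

The temporary file lives next to the target so the rename stays on one filesystem, which is what lets it replace the destination in a single step on typical Unix filesystems.
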

Prior session note: A previous session summary claimed commit 0984cca landed a /state HTTP endpoint via axum. This was incorrect — no such commit exists on main, axum is not a dependency, and the HTTP server is not ours. The actual work that exists: worker_boot.rs with WorkerStatus enum + WorkerRegistry, fully wired into runtime/src/lib.rs as public exports.

Startup Friction Gap: No Default trusted_roots in Settings (filed 2026-04-08)

Every lane starts with manual trust babysitting unless caller explicitly passes roots

Root cause discovered during direct dogfood of WorkerCreate tool.

WorkerCreate accepts a trusted_roots: Vec<String> parameter. If the caller omits it (or passes []), every new worker immediately enters TrustRequired and stalls — requiring manual intervention to advance to ReadyForPrompt. There is no mechanism to configure a default allowlist in settings.json or .claw/settings.json.

Impact: Batch tooling (clawhip, lane orchestrators) must pass trusted_roots explicitly on every WorkerCreate call. If a batch script forgets the field, all workers in that batch stall silently at trust_required. This was the root cause of several "batch 8 lanes not advancing" incidents.

Recommended fix:

Add a trusted_roots field to RuntimeConfig (or a nested [trust] table), loaded via ConfigLoader.
In WorkerRegistry::spawn_worker(), merge config-level trusted_roots with any per-call overrides.
Default: empty list (safest). Users opt in by adding their repo paths to settings.
Update config_validate schema with the new field.

Action item: Wire RuntimeConfig::trusted_roots() → WorkerRegistry::spawn_worker() default. Cover with test: config with trusted_roots = ["/tmp"] → spawning worker in /tmp/x auto-resolves trust without caller passing the field.
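
The merge-then-check behaviour described above might look like this sketch. It assumes "merge" means a de-duplicated union of config-level and per-call roots and that a worker is trusted when its CWD sits at or under any root; both function names are hypothetical, not the WorkerRegistry API.

```rust
use std::path::Path;

/// Hypothetical merge: config-level roots first, then any per-call roots
/// that are not already present.
pub fn effective_trusted_roots(config_roots: &[String], per_call: &[String]) -> Vec<String> {
    let mut merged = config_roots.to_vec();
    for root in per_call {
        if !merged.contains(root) {
            merged.push(root.clone());
        }
    }
    merged
}

/// True when `cwd` is one of the roots or nested under one of them.
/// Comparison is by path component, so "/tmp" does not cover "/tmpfoo".
pub fn is_trusted(cwd: &Path, roots: &[String]) -> bool {
    roots.iter().any(|root| cwd.starts_with(Path::new(root)))
}
```

With config roots of ["/tmp"] and no per-call roots, a worker spawned in /tmp/x would clear the trust gate without the caller passing the field, which is the case the action item's test describes.
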

Observability Transport Decision (filed 2026-04-08)

Canonical state surface: CLI/file-based. HTTP endpoint deferred.

Decision: claw state reading .claw/worker-state.json is the blessed observability contract for clawhip and downstream tooling. This is not a stepping-stone — it is the supported surface. Build against it.

Rationale:

claw-code is a plugin running inside the opencode binary. It cannot add HTTP routes to opencode serve — that server belongs to upstream sst/opencode.
The file-based surface is fully within plugin scope: emit_state_file() in worker_boot.rs writes atomically on every WorkerStatus transition.
claw state --output-format json gives clawhip everything it needs: status, is_ready, seconds_since_update, trust_gate_cleared, last_event, updated_at.
Polling a local file has lower latency and fewer failure modes than an HTTP round-trip to a sidecar.
An HTTP state endpoint would require either (a) upstreaming a route to sst/opencode — a multi-week PR cycle with no guarantee of acceptance — or (b) a sidecar process that queries WorkerRegistry in-process, which is fragile and adds an extra failure domain.

What downstream tooling (clawhip) should do:

After WorkerCreate, poll .claw/worker-state.json (or run claw state --output-format json) in the worker's CWD at whatever interval makes sense (e.g. 5s).
Treat seconds_since_update > 60 while in trust_required status as the stall signal.
Call WorkerResolveTrust tool to unblock, or WorkerRestart to reset.
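
The polling decision above reduces to a small pure function. The sketch assumes the poller has already parsed status and seconds_since_update out of the state file; StateSnapshot, PollAction, and next_action are hypothetical names, not part of clawhip.

```rust
/// Subset of the fields `claw state --output-format json` is described as returning.
pub struct StateSnapshot {
    pub status: String,
    pub seconds_since_update: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PollAction {
    /// Nothing to do yet; poll again after the chosen interval.
    KeepPolling,
    /// Worker has sat in trust_required past the threshold; call WorkerResolveTrust.
    ResolveTrust,
}

/// Decide what the orchestrator should do with one polled snapshot.
pub fn next_action(snapshot: &StateSnapshot, stall_after_secs: u64) -> PollAction {
    if snapshot.status == "trust_required" && snapshot.seconds_since_update > stall_after_secs {
        PollAction::ResolveTrust
    } else {
        PollAction::KeepPolling
    }
}
```

Keeping the decision separate from file I/O makes the stall threshold easy to unit test without a live worker.
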

HTTP endpoint tracking: Not scheduled. If a concrete use case emerges that file polling cannot serve (e.g. remote workers over a network boundary), open a new issue to upstream a /worker/state route to sst/opencode at that time. Until then: file/CLI is canonical.

Provider Routing: Model-Name Prefix Must Win Over Env-Var Presence (fixed 2026-04-08, `0530c50`)

`openai/gpt-4.1-mini` was silently misrouted to Anthropic when ANTHROPIC_API_KEY was set

Root cause: metadata_for_model returned None for any model not matching claude or grok prefix. detect_provider_kind then fell through to auth-sniffer order: first has_auth_from_env_or_saved() (Anthropic), then OPENAI_API_KEY, then XAI_API_KEY.

If ANTHROPIC_API_KEY was present in the environment (e.g. a user with both Anthropic and OpenRouter configured), any unknown model, including explicitly namespaced ones like openai/gpt-4.1-mini, was silently routed to the Anthropic client, which then failed with missing Anthropic credentials or a confusing 402/auth error instead of being routed to the OpenAI-compatible client.

Fix: Added explicit prefix checks in metadata_for_model:

openai/ prefix → ProviderKind::OpenAi
gpt- prefix → ProviderKind::OpenAi

Model name prefix now wins unconditionally over env-var presence. Regression test locked in: providers::tests::openai_namespaced_model_routes_to_openai_not_anthropic.

Lesson: Auth-sniffer fallback order is fragile. Any new provider added in the future should be registered in metadata_for_model via a model-name prefix, not left to env-var order. This is the canonical extension point.
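
A prefix-first lookup of the kind described here could be sketched as below. It is a simplified illustration, not the body of metadata_for_model: the function name provider_for_model and the ProviderKind::Xai variant name are assumptions, and the real function returns provider metadata rather than a bare kind.

```rust
/// `Anthropic` and `OpenAi` are named in this roadmap; `Xai` is an assumed name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderKind {
    Anthropic,
    OpenAi,
    Xai,
}

/// Resolve a provider from the model name alone. Returning `None` is the only
/// case where a caller would fall back to env-var sniffing.
pub fn provider_for_model(model: &str) -> Option<ProviderKind> {
    if model.starts_with("claude") {
        Some(ProviderKind::Anthropic)
    } else if model.starts_with("grok") {
        Some(ProviderKind::Xai)
    } else if model.starts_with("openai/") || model.starts_with("gpt-") {
        Some(ProviderKind::OpenAi)
    } else {
        None
    }
}
```

Since the function never reads the environment, a set ANTHROPIC_API_KEY cannot change the result for a model name that carries a known prefix.
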
