diff --git a/.gitignore b/.gitignore index cfa4cf3..8cb4333 100644 --- a/.gitignore +++ b/.gitignore @@ -5,3 +5,7 @@ archive/ # Claude Code local artifacts .claude/settings.local.json .claude/sessions/ +# Claw Code local artifacts +.claw/settings.local.json +.claw/sessions/ +status-help.txt diff --git a/ROADMAP.md b/ROADMAP.md index 400c292..8187843 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -1,5 +1,3 @@ -# ROADMAP.md - # Clawable Coding Harness Roadmap ## Goal @@ -299,6 +297,7 @@ Priority order: P0 = blocks CI/green state, P1 = blocks integration wiring, P2 = 12. Audit `--output-format` contract across the whole CLI surface — **done**: direct CLI commands now honor deterministic JSON/text handling across help/version/status/sandbox/agents/mcp/skills/bootstrap-plan/system-prompt/init/doctor, with regression coverage in `output_format_contract.rs` and resumed `/status` JSON coverage **P1 — Next (integration wiring, unblocks verification)** +1. Worker readiness handshake + trust resolution — **done**: `WorkerStatus` state machine with `Spawning` → `TrustRequired` → `ReadyForPrompt` → `PromptAccepted` → `Running` lifecycle, `trust_auto_resolve` + `trust_gate_cleared` gating 2. Add cross-module integration tests — **done**: 12 integration tests covering worker→recovery→policy, stale_branch→policy, green_contract→policy, reconciliation flows 3. Wire lane-completion emitter — **done**: `lane_completion` module with `detect_lane_completion()` auto-sets `LaneContext::completed` from session-finished + tests-green + push-complete → policy closeout 4. Wire `SummaryCompressor` into the lane event pipeline — **done**: `compress_summary_text()` feeds into `LaneEvent::Finished` detail field in `tools/src/lib.rs` @@ -325,12 +324,11 @@ Priority order: P0 = blocks CI/green state, P1 = blocks integration wiring, P2 = 22. **Opaque failure surface for session/runtime crashes** — **done**: `safe_failure_class()` in `error.rs` classifies all API errors into 8 user-safe classes (`provider_auth`, `provider_internal`, `provider_retry_exhausted`, `provider_rate_limit`, `provider_transport`, `provider_error`, `context_window`, `runtime_io`). `format_user_visible_api_error` in `main.rs` attaches session ID + request trace ID to every user-visible error. Coverage in `opaque_provider_wrapper_surfaces_failure_class_session_and_trace` and 3 related tests. 23. **`doctor --output-format json` check-level structure gap** — **done**: `claw doctor --output-format json` now keeps the human-readable `message`/`report` while also emitting structured per-check diagnostics (`name`, `status`, `summary`, `details`, plus typed fields like workspace paths and sandbox fallback data), with regression coverage in `output_format_contract.rs`. 24. **Plugin lifecycle init/shutdown test flakes under workspace-parallel execution** — dogfooding surfaced that `build_runtime_runs_plugin_lifecycle_init_and_shutdown` can fail under `cargo test --workspace` while passing in isolation because sibling tests race on tempdir-backed shell init script paths. This is test brittleness rather than a code-path regression, but it still destabilizes CI confidence and wastes diagnosis cycles. **Action:** isolate temp resources per test robustly (unique dirs + no shared cwd assumptions), audit cleanup timing, and add a regression guard so the plugin lifecycle test remains stable under parallel workspace execution. -25. **`plugins::hooks::collects_and_runs_hooks_from_enabled_plugins` flaked on Linux CI, root cause was a stdin-write race not missing exec bit** — **done at `172a2ad` on 2026-04-08**. Dogfooding reproduced this four times on `main` (CI runs [24120271422](https://github.com/ultraworkers/claw-code/actions/runs/24120271422), [24120538408](https://github.com/ultraworkers/claw-code/actions/runs/24120538408), [24121392171](https://github.com/ultraworkers/claw-code/actions/runs/24121392171), [24121776826](https://github.com/ultraworkers/claw-code/actions/runs/24121776826)), escalating from first-attempt-flake to deterministic-red on the third push. Failure mode was `PostToolUse hook .../hooks/post.sh failed to start for "Read": Broken pipe (os error 32)` surfacing from `HookRunResult`. **Initial diagnosis was wrong.** The first theory (documented in earlier revisions of this entry and in the root-cause note on commit `79da4b8`) was that `write_hook_plugin` in `rust/crates/plugins/src/hooks.rs` was writing the generated `.sh` files without the execute bit and `Command::new(path).spawn()` was racing on fork/exec. An initial chmod-only fix at `4f7b674` was shipped against that theory and **still failed CI on run `24121776826`** with the same `Broken pipe` symptom, falsifying the chmod-only hypothesis. **Actual root cause.** `CommandWithStdin::output_with_stdin` in `rust/crates/plugins/src/hooks.rs` was unconditionally propagating `write_all` errors on the child's stdin pipe, including `std::io::ErrorKind::BrokenPipe`. The test hook scripts run in microseconds (`#!/bin/sh` + a single `printf`), so the child exits and closes its stdin before the parent finishes writing the ~200-byte JSON hook payload. On Linux the pipe raises `EPIPE` immediately; on macOS the pipe happens to buffer the small payload before the child exits, which is why the race only surfaced on ubuntu CI runners. The parent's `write_all` returned `Err(BrokenPipe)`, `output_with_stdin` returned that as a hook failure, and `run_command` classified the hook as "failed to start" even though the child had already run to completion and printed the expected message to stdout. **Fix (commit `172a2ad`, force-pushed over `4f7b674`).** Three parts: (1) **actual fix** — `output_with_stdin` now matches the `write_all` result and swallows `BrokenPipe` specifically, while propagating all other write errors unchanged; after a `BrokenPipe` swallow the code still calls `wait_with_output()` so stdout/stderr/exit code are still captured from the cleanly-exited child. (2) **hygiene hardening** — a new `make_executable` helper sets mode `0o755` on each generated `.sh` via `std::os::unix::fs::PermissionsExt` under `#[cfg(unix)]`. This is defense-in-depth for future non-sh hook runners, not the bug that was biting CI. (3) **regression guard** — new `generated_hook_scripts_are_executable` test under `#[cfg(unix)]` asserts each generated `.sh` file has at least one execute bit set (`mode & 0o111 != 0`) so future tweaks cannot silently regress the hygiene change. **Verification.** `cargo test --release -p plugins` 35 passing, fmt clean, clippy `-D warnings` clean; CI run [24121999385](https://github.com/ultraworkers/claw-code/actions/runs/24121999385) went green on first attempt on `main` for the hotfix commit. **Meta-lesson.** `Broken pipe (os error 32)` from a child-process spawn path is ambiguous between "could not exec" and "exec'd and exited before the parent finished writing stdin." The first theory cargo-culted the "could not exec" reading because the ROADMAP scaffolding anchored on the exec-bit guess; falsification came from empirical CI, not from code inspection. Record the pattern: when a pipe error surfaces on fork/exec, instrument what `wait_with_output()` actually reports on the child before attributing the failure to a permissions or path issue. +25. **`plugins::hooks::collects_and_runs_hooks_from_enabled_plugins` flaked on Linux CI, root cause was a stdin-write race not missing exec bit** — **done at `172a2ad` on 2026-04-08**. Dogfooding reproduced this four times on `main` (CI runs [24120271422](https://github.com/ultraworkers/claw-code/actions/runs/24120271422), [24120538408](https://github.com/ultraworkers/claw-code/actions/runs/24120538408), [24121392171](https://github.com/ultraworkers/claw-code/actions/runs/24121392171), [24121776826](https://github.com/ultraworkers/claw-code/actions/runs/24121776826)), escalating from first-attempt-flake to deterministic-red on the third push. Failure mode was `PostToolUse hook .../hooks/post.sh failed to start for "Read": Broken pipe (os error 32)` surfacing from `HookRunResult`. **Initial diagnosis was wrong.** The first theory (documented in earlier revisions of this entry and in the root-cause note on commit `79da4b8`) was that `write_hook_plugin` in `rust/crates/plugins/src/hooks.rs` was writing the generated `.sh` files without the execute bit and `Command::new(path).spawn()` was racing on fork/exec. An initial chmod-only fix at `4f7b674` was shipped against that theory and **still failed CI on run `24121776826`** with the same `Broken pipe` symptom, falsifying the chmod-only hypothesis. **Actual root cause.** `CommandWithStdin::output_with_stdin` in `rust/crates/plugins/src/hooks.rs` was unconditionally propagating `write_all` errors on the child's stdin pipe, including `std::io::ErrorKind::BrokenPipe`. The test hook scripts run in microseconds (`#!/bin/sh` + a single `printf`), so the child exits and closes its stdin before the parent finishes writing the ~200-byte JSON hook payload. On Linux the pipe raises `EPIPE` immediately; on macOS the pipe happens to buffer the small payload before the child exits, which is why the race only surfaced on ubuntu CI runners. The parent's `write_all` returned `Err(BrokenPipe)`, `output_with_stdin` returned that as a hook failure, and `run_command` classified the hook as "failed to start" even though the child had already run to completion and printed the expected message to stdout. **Fix (commit `172a2ad`, force-pushed over `4f7b674`).** Three parts: (1) **actual fix** — `output_with_stdin` now matches the `write_all` result and swallows `BrokenPipe` specifically, while propagating all other write errors unchanged; after a `BrokenPipe` swallow the code still calls `wait_with_output()` so stdout/stderr/exit code are still captured from the cleanly-exited child. (2) **hygiene hardening** — a new `make_executable` helper sets mode `0o755` on each generated `.sh` via `std::os::unix::fs::PermissionsExt` under `#[cfg(unix)]`. This is defense-in-depth for future non-sh hook runners, not the bug that was biting CI. (3) **regression guard** — new `generated_hook_scripts_are_executable` test under `#[cfg(unix)]` asserts each generated `.sh` file has at least one execute bit set (`mode & 0o111 != 0`) so future tweaks cannot silently regress the hygiene change. **Verification.** `cargo test --release -p plugins` 35 passing, fmt clean, clippy `-D warnings` clean; CI run [24121999385](https://github.com/ultraworkers/claw-code/actions/runs/24121999385) went green on first attempt on `main` for the hotfix commit. **Meta-lesson.** `Broken pipe (os error 32)` from a child-process spawn path is ambiguous between "could not exec" and "exec'd and exited before the parent finished writing stdin." The first theory cargo-culted the "could not exec" reading because the ROADMAP scaffolding anchored on the exec-bit guess; falsification came from empirical CI, not from code inspection. Record the pattern: when a pipe error surfaces on fork/exec, instrument what `wait_with_output()` actually reports on the child before attributing the failure to a permissions or issue. 26. **Resumed local-command JSON parity gap** — **done**: direct `claw --output-format json` already had structured renderers for `sandbox`, `mcp`, `skills`, `version`, and `init`, but resumed `claw --output-format json --resume /…` paths still fell back to prose because resumed slash dispatch only emitted JSON for `/status`. Resumed `/sandbox`, `/mcp`, `/skills`, `/version`, and `/init` now reuse the same JSON envelopes as their direct CLI counterparts, with regression coverage in `rust/crates/rusty-claude-cli/tests/resume_slash_commands.rs` and `rust/crates/rusty-claude-cli/tests/output_format_contract.rs`. 27. **`dev/rust` `cargo test -p rusty-claude-cli` reads host `~/.claude/plugins/installed/` from real `$HOME` and fails parse-time on any half-installed user plugin** — dogfooding on 2026-04-08 (filed from gaebal-gajae's clawhip bullet at message `1491322807026454579` after the provider-matrix branch QA surfaced it) reproduced 11 deterministic failures on clean `dev/rust` HEAD of the form `panicked at crates/rusty-claude-cli/src/main.rs:3953:31: args should parse: "hook path \`/Users/yeongyu/.claude/plugins/installed/sample-hooks-bundled/./hooks/pre.sh\` does not exist; hook path \`...\post.sh\` does not exist"` covering `parses_prompt_subcommand`, `parses_permission_mode_flag`, `defaults_to_repl_when_no_args`, `parses_resume_flag_with_slash_command`, `parses_system_prompt_options`, `parses_bare_prompt_and_json_output_flag`, `rejects_unknown_allowed_tools`, `parses_resume_flag_with_multiple_slash_commands`, `resolves_model_aliases_in_args`, `parses_allowed_tools_flags_with_aliases_and_lists`, `parses_login_and_logout_subcommands`. **Same failures do NOT reproduce on `main`** (re-verified with `cargo test --release -p rusty-claude-cli` against `main` HEAD `79da4b8`, all 156 tests pass). **Root cause is two-layered.** First, on `dev/rust` `parse_args` eagerly walks user-installed plugin manifests under `~/.claude/plugins/installed/` and validates that every declared hook script exists on disk before returning a `CliAction`, so any half-installed plugin in the developer's real `$HOME` (in this case `~/.claude/plugins/installed/sample-hooks-bundled/` whose `.claude-plugin` manifest references `./hooks/pre.sh` and `./hooks/post.sh` but whose `hooks/` subdirectory was deleted) makes argv parsing itself fail. Second, the test harness on `dev/rust` does not redirect `$HOME` or `XDG_CONFIG_HOME` to a fixture for the duration of the test — there is no `env_lock`-style guard equivalent to the one `main` already uses (`grep -n env_lock rust/crates/rusty-claude-cli/src/main.rs` returns 0 hits on `dev/rust` and 30+ hits on `main`). Together those two gaps mean `dev/rust` `cargo test -p rusty-claude-cli` is non-deterministic on every clean clone whose owner happens to have any non-pristine plugin in `~/.claude/`. **Action (two parts).** (a) Backport the `env_lock`-based test isolation pattern from `main` into `dev/rust`'s `rusty-claude-cli` test module so each test runs against a temp `$HOME`/`XDG_CONFIG_HOME` and cannot read host plugin state. (b) Decouple `parse_args` from filesystem hook validation on `dev/rust` (the same decoupling already on `main`, where hook validation happens later in the lifecycle than argv parsing) so even outside tests a partially installed user plugin cannot break basic CLI invocation. **Branch scope.** This is a `dev/rust` catchup against `main`, not a `main` regression. Tracking it here so the dev/rust merge train picks it up before the next dev/rust release rather than rediscovering it in CI. - 28. **Auth-provider truth: error copy fails real users at the env-var-vs-header layer** — dogfooded live on 2026-04-08 in #claw-code (Sisyphus Labs guild), two separate new users hit adjacent failure modes within minutes of each other that both trace back to the same root: the `MissingApiKey` / 401 error surface does not teach users how the auth inputs map to HTTP semantics, so a user who sets a "reasonable-looking" env var still hits a hard error with no signpost. **Case 1 (varleg, Norway).** Wanted to use OpenRouter via the OpenAI-compat path. Found a comparison table claiming "provider-agnostic (Claude, OpenAI, local models)" and assumed it Just Worked. Set `OPENAI_API_KEY` to an OpenRouter `sk-or-v1-...` key and a model name without an `openai/` prefix; claw's provider detection fell through to Anthropic first because `ANTHROPIC_API_KEY` was still in the environment. Unsetting `ANTHROPIC_API_KEY` got them `ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY is not set` instead of a useful hint that the OpenAI path was right there. Fix delivered live as a channel reply: use `main` branch (not `dev/rust`), export `OPENAI_BASE_URL=https://openrouter.ai/api/v1` alongside `OPENAI_API_KEY`, and prefix the model name with `openai/` so the prefix router wins over env-var presence. **Case 2 (stanley078852).** Had set `ANTHROPIC_AUTH_TOKEN="sk-ant-..."` and was getting 401 `Invalid bearer token` from Anthropic. Root cause: `sk-ant-` keys are `x-api-key`-header keys, not bearer tokens. `ANTHROPIC_API_KEY` path in `anthropic.rs` sends the value as `x-api-key`; `ANTHROPIC_AUTH_TOKEN` path sends it as `Authorization: Bearer` (for OAuth access tokens from `claw login`). Setting an `sk-ant-` key in the wrong env var makes claw send it as `Bearer sk-ant-...` which Anthropic rejects at the edge with 401 before it ever reaches the completions endpoint. The error text propagated all the way to the user (`api returned 401 Unauthorized (authentication_error) ... Invalid bearer token`) with zero signal that the problem was env-var choice, not key validity. Fix delivered live as a channel reply: move the `sk-ant-...` key to `ANTHROPIC_API_KEY` and unset `ANTHROPIC_AUTH_TOKEN`. **Pattern.** Both cases are failures at the *auth-intent translation* layer: the user chose an env var that made syntactic sense to them (`OPENAI_API_KEY` for OpenAI, `ANTHROPIC_AUTH_TOKEN` for Anthropic auth) but the actual wire-format routing requires a more specific choice. The error messages surface the HTTP-layer symptom (401, missing-key) without bridging back to "which env var should you have used and why." **Action.** Three concrete improvements, scoped for a single `main`-side PR: (a) In `ApiError::MissingCredentials` Display, when the Anthropic path is the one being reported but `OPENAI_API_KEY`, `XAI_API_KEY`, or `DASHSCOPE_API_KEY` are present in the environment, extend the message with "— but I see `$OTHER_KEY` set; if you meant to use that provider, prefix your model name with `openai/`, `grok`, or `qwen/` respectively so prefix routing selects it." (b) In the 401-from-Anthropic error path in `anthropic.rs`, when the failing auth source is `BearerToken` AND the bearer token starts with `sk-ant-`, append "— looks like you put an `sk-ant-*` API key in `ANTHROPIC_AUTH_TOKEN`, which is the Bearer-header path. Move it to `ANTHROPIC_API_KEY` instead (that env var maps to `x-api-key`, which is the correct header for `sk-ant-*` keys)." Same treatment for OAuth access tokens landing in `ANTHROPIC_API_KEY` (symmetric mis-assignment). (c) In `rust/README.md` on `main` and the matrix section on `dev/rust`, add a short "Which env var goes where" paragraph mapping `sk-ant-*` → `ANTHROPIC_API_KEY` and OAuth access token → `ANTHROPIC_AUTH_TOKEN`, with the one-line explanation of `x-api-key` vs `Authorization: Bearer`. **Verification path.** Both improvements can be tested with unit tests against `ApiError::fmt` output (the prefix-routing hint) and with a targeted integration test that feeds an `sk-ant-*`-shaped token into `BearerToken` and asserts the fmt output surfaces the correction hint (no HTTP call needed). **Source.** Live users in #claw-code at `1491328554598924389` (varleg) and `1491329840706486376` (stanley078852) on 2026-04-08. **Partial landing (`ff1df4c`).** Action parts (a), (b), (c) shipped on `main`: `MissingCredentials` now carries an optional hint field and renders adjacent-provider signals, Anthropic 401 + `sk-ant-*` bearer gets a correction hint, USAGE.md has a "Which env var goes where" section. BUT the copy fix only helps users who fell through to the Anthropic auth path by accident — it does NOT fix the underlying routing bug where the CLI instantiates `AnthropicRuntimeClient` unconditionally and ignores prefix routing at the runtime-client layer. That deeper routing gap is tracked separately as #29 below and was filed within hours of #28 landing when live users still hit `missing Anthropic credentials` with `--model openai/gpt-4` and all `ANTHROPIC_*` env vars unset. -29. **CLI provider dispatch is hardcoded to Anthropic, ignoring prefix routing** — **done at `8dc6580` on 2026-04-08**. Changed `AnthropicRuntimeClient.client` from concrete `AnthropicClient` to `ApiProviderClient` (the api crate’s `ProviderClient` enum), which dispatches to Anthropic / xAI / OpenAi at construction time based on `detect_provider_kind(&resolved_model)`. 1 file, +59 −7, all 182 rusty-claude-cli tests pass, CI green at run `24125825431`. Users can now run `claw --model openai/gpt-4.1-mini prompt "hello"` with only `OPENAI_API_KEY` set and it routes correctly. **Original filing below for the trace record.** Dogfooded live on 2026-04-08 within hours of ROADMAP #28 landing. Users in #claw-code (nicma at `1491342350960562277`, Jengro at `1491345009021030533`) followed the exact "use main, set OPENAI_API_KEY and OPENAI_BASE_URL, unset ANTHROPIC_*, prefix the model with `openai/`" checklist from the #28 error-copy improvements AND STILL hit `error: missing Anthropic credentials; export ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY before calling the Anthropic API`. **Reproduction on `main` HEAD `ff1df4c`:** `unset ANTHROPIC_API_KEY ANTHROPIC_AUTH_TOKEN; export OPENAI_API_KEY=sk-...; export OPENAI_BASE_URL=https://api.openai.com/v1; claw --model openai/gpt-4 prompt 'test'` → reproduces the error deterministically. **Root cause (traced).** `rust/crates/rusty-claude-cli/src/main.rs` at `build_runtime_with_plugin_state` (line ~6221) unconditionally builds `AnthropicRuntimeClient::new(session_id, model, ...)` without consulting `providers::detect_provider_kind(&model)`. `BuiltRuntime` at line ~2855 is statically typed as `ConversationRuntime`, so even if the dispatch logic existed there would be nowhere to slot an alternative client. `providers/mod.rs::metadata_for_model` correctly identifies `openai/gpt-4` as `ProviderKind::OpenAi` at the metadata layer — the routing decision is *computed* correctly, it's just *never used* to pick a runtime client. The result is that the CLI is structurally single-provider (Anthropic only) even though the `api` crate's `openai_compat.rs`, `XAI_ENV_VARS`, `DASHSCOPE_ENV_VARS`, and `send_message_streaming` all exist and are exercised by unit tests inside the `api` crate. The provider matrix in `rust/README.md` is misleading because it describes the api-crate capabilities, not the CLI's actual dispatch behaviour. **Why #28 didn't catch this.** ROADMAP #28 focused on the `MissingCredentials` error *message* (adding hints when adjacent provider env vars are set, or when a bearer token starts with `sk-ant-*`). None of its tests exercised the `build_runtime` code path — they were all unit tests against `ApiError::fmt` output. The routing bug survives #28 because the `Display` improvements fire AFTER the hardcoded Anthropic client has already been constructed and failed. You need the CLI to dispatch to a different client in the first place for the new hints to even surface at the right moment. **Action (single focused commit).** (1) New `OpenAiCompatRuntimeClient` struct in `rust/crates/rusty-claude-cli/src/main.rs` mirroring `AnthropicRuntimeClient` but delegating to `openai_compat::send_message_streaming`. One client type handles OpenAI, xAI, DashScope, and any OpenAI-compat endpoint — they differ only in base URL and auth env var, both of which come from the `ProviderMetadata` returned by `metadata_for_model`. (2) New enum `DynamicApiClient { Anthropic(AnthropicRuntimeClient), OpenAiCompat(OpenAiCompatRuntimeClient) }` that implements `runtime::ApiClient` by matching on the variant and delegating. (3) Retype `BuiltRuntime` from `ConversationRuntime` to `ConversationRuntime`, update the Deref/DerefMut/new spots. (4) In `build_runtime_with_plugin_state`, call `detect_provider_kind(&model)` and construct either variant of `DynamicApiClient`. Prefix routing wins over env-var presence (that's the whole point). (5) Integration test using a mock OpenAI-compat server (reuse `mock_parity_harness` pattern from `crates/api/tests/`) that feeds `claw --model openai/gpt-4 prompt 'test'` with `OPENAI_BASE_URL` pointed at the mock and no `ANTHROPIC_*` env vars, asserts the request reaches the mock, and asserts the response round-trips as an `AssistantEvent`. (6) Unit test that `build_runtime_with_plugin_state` with `model="openai/gpt-4"` returns a `BuiltRuntime` whose inner client is the `DynamicApiClient::OpenAiCompat` variant. **Verification.** `cargo test --workspace`, `cargo fmt --all`, `cargo clippy --workspace`. **Source.** Live users nicma (`1491342350960562277`) and Jengro (`1491345009021030533`) in #claw-code on 2026-04-08, within hours of #28 landing. +29. **CLI provider dispatch is hardcoded to Anthropic, ignoring prefix routing** — **done at `8dc6580` on 2026-04-08**. Changed `AnthropicRuntimeClient.client` from concrete `AnthropicClient` to `ApiProviderClient` (the api crate's `ProviderClient` enum), which dispatches to Anthropic / xAI / OpenAi at construction time based on `detect_provider_kind(&resolved_model)`. 1 file, +59 −7, all 182 rusty-claude-cli tests pass, CI green at run `24125825431`. Users can now run `claw --model openai/gpt-4.1-mini prompt "hello"` with only `OPENAI_API_KEY` set and it routes correctly. **Original filing below for the trace record.** Dogfooded live on 2026-04-08 within hours of ROADMAP #28 landing. Users in #claw-code (nicma at `1491342350960562277`, Jengro at `1491345009021030533`) followed the exact "use main, set OPENAI_API_KEY and OPENAI_BASE_URL, unset ANTHROPIC_*, prefix the model with `openai/`" checklist from the #28 error-copy improvements AND STILL hit `error: missing Anthropic credentials; export ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY before calling the Anthropic API`. **Reproduction on `main` HEAD `ff1df4c`:** `unset ANTHROPIC_API_KEY ANTHROPIC_AUTH_TOKEN; export OPENAI_API_KEY=sk-...; export OPENAI_BASE_URL=https://api.openai.com/v1; claw --model openai/gpt-4 prompt 'test'` → reproduces the error deterministically. **Root cause (traced).** `rust/crates/rusty-claude-cli/src/main.rs` at `build_runtime_with_plugin_state` (line ~6221) unconditionally builds `AnthropicRuntimeClient::new(session_id, model, ...)` without consulting `providers::detect_provider_kind(&model)`. `BuiltRuntime` at line ~2855 is statically typed as `ConversationRuntime`, so even if the dispatch logic existed there would be nowhere to slot an alternative client. `providers/mod.rs::metadata_for_model` correctly identifies `openai/gpt-4` as `ProviderKind::OpenAi` at the metadata layer — the routing decision is *computed* correctly, it's just *never used* to pick a runtime client. The result is that the CLI is structurally single-provider (Anthropic only) even though the `api` crate's `openai_compat.rs`, `XAI_ENV_VARS`, `DASHSCOPE_ENV_VARS`, and `send_message_streaming` all exist and are exercised by unit tests inside the `api` crate. The provider matrix in `rust/README.md` is misleading because it describes the api-crate capabilities, not the CLI's actual dispatch behaviour. **Why #28 didn't catch this.** ROADMAP #28 focused on the `MissingCredentials` error *message* (adding hints when adjacent provider env vars are set, or when a bearer token starts with `sk-ant-*`). None of its tests exercised the `build_runtime` code path — they were all unit tests against `ApiError::fmt` output. The routing bug survives #28 because the `Display` improvements fire AFTER the hardcoded Anthropic client has already been constructed and failed. You need the CLI to dispatch to a different client in the first place for the new hints to even surface at the right moment. **Action (single focused commit).** (1) New `OpenAiCompatRuntimeClient` struct in `rust/crates/rusty-claude-cli/src/main.rs` mirroring `AnthropicRuntimeClient` but delegating to `openai_compat::send_message_streaming`. One client type handles OpenAI, xAI, DashScope, and any OpenAI-compat endpoint — they differ only in base URL and auth env var, both of which come from the `ProviderMetadata` returned by `metadata_for_model`. (2) New enum `DynamicApiClient { Anthropic(AnthropicRuntimeClient), OpenAiCompat(OpenAiCompatRuntimeClient) }` that implements `runtime::ApiClient` by matching on the variant and delegating. (3) Retype `BuiltRuntime` from `ConversationRuntime` to `ConversationRuntime`, update the Deref/DerefMut/new spots. (4) In `build_runtime_with_plugin_state`, call `detect_provider_kind(&model)` and construct either variant of `DynamicApiClient`. Prefix routing wins over env-var presence (that's the whole point). (5) Integration test using a mock OpenAI-compat server (reuse `mock_parity_harness` pattern from `crates/api/tests/`) that feeds `claw --model openai/gpt-4 prompt 'test'` with `OPENAI_BASE_URL` pointed at the mock and no `ANTHROPIC_*` env vars, asserts the request reaches the mock, and asserts the response round-trips as an `AssistantEvent`. (6) Unit test that `build_runtime_with_plugin_state` with `model="openai/gpt-4"` returns a `BuiltRuntime` whose inner client is the `DynamicApiClient::OpenAiCompat` variant. **Verification.** `cargo test --workspace`, `cargo fmt --all`, `cargo clippy --workspace`. **Source.** Live users nicma (`1491342350960562277`) and Jengro (`1491345009021030533`) in #claw-code on 2026-04-08, within hours of #28 landing. 41. **Phantom completions root cause: global session store has no per-worktree isolation** — @@ -348,63 +346,6 @@ Priority order: P0 = blocks CI/green state, P1 = blocks integration wiring, P2 = **Status.** A `workspace_root` field has been added to `Session` in `rust/crates/runtime/src/session.rs` (with builder, accessor, JSON + JSONL round-trip, fork inheritance, and given/when/then test coverage in `persists_workspace_root_round_trip_and_forks_inherit_it`). The CWD validation, the namespaced on-disk path, and the `branch.workspace_mismatch` lane event are still outstanding and tracked under this item. -**P3 — Swarm efficiency** -13. Swarm branch-lock protocol — **done**: `branch_lock::detect_branch_lock_collisions()` now detects same-branch/same-scope and nested-module collisions before parallel lanes drift into duplicate implementation -14. Commit provenance / worktree-aware push events — **done**: lane event provenance now includes branch/worktree/superseded/canonical lineage metadata, and manifest persistence de-dupes superseded commit events before downstream consumers render them - -## Suggested Session Split - -### Session A — worker boot protocol -Focus: -- trust prompt detection -- ready-for-prompt handshake -- prompt misdelivery detection - -### Session B — clawhip lane events -Focus: -- canonical lane event schema -- failure taxonomy -- summary compression - -### Session C — branch/test intelligence -Focus: -- stale-branch detection -- green-level contract -- recovery recipes - -### Session D — MCP lifecycle hardening -Focus: -- startup/handshake reliability -- structured failed server reporting -- degraded-mode runtime behavior -- lifecycle tests/harness coverage - -### Session E — typed task packets + policy engine -Focus: -- structured task format -- retry/merge/escalation rules -- autonomous lane closure behavior - -## MVP Success Criteria - -We should consider claw-code materially more clawable when: -- a claw can start a worker and know with certainty when it is ready -- claws no longer accidentally type tasks into the shell -- stale-branch failures are identified before they waste debugging time -- clawhip reports machine states, not just tmux prose -- MCP/plugin startup failures are classified and surfaced cleanly -- a coding lane can self-recover from common startup and branch issues without human babysitting - -## Short Version - -claw-code should evolve from: -- a CLI a human can also drive - -to: -- a **claw-native execution runtime** -- an **event-native orchestration substrate** -- a **plugin/hook-first autonomous coding harness** - ## Deployment Architecture Gap (filed from dogfood 2026-04-08) ### WorkerState is in the runtime; /state is NOT in opencode serve @@ -505,7 +446,7 @@ Model name prefix now wins unconditionally over env-var presence. Regression tes 40. **Surface broken installed plugins before they become support ghosts** — community-support lane. Clawhip commit `ff6d3b7` on worktree `claw-code-community-support-plugin-list-load-failures` / branch `community-support/plugin-list-load-failures`. When an installed plugin has a broken manifest (missing hook scripts, parse errors, bad json), the plugin silently fails to load and the user sees nothing — no warning, no list entry, no hint. Related to ROADMAP #27 (host plugin path leaking into tests) but at the user-facing surface: the test gap and the UX gap are siblings of the same root. Landing on `main` will close the silent-ghost class of support issues where users report "my plugin does nothing" with no error to share. Track until merged to `main`. -40. **Surface broken installed plugins before they become support ghosts** — community-support lane. Clawhip commit `ff6d3b7` on worktree `claw-code-community-support-plugin-list-load-failures` / branch `community-support/plugin-list-load-failures`. When an installed plugin has a broken manifest (missing hook scripts, parse errors, bad json), the plugin silently fails to load and the user sees nothing — no warning, no list entry, no hint. Related to ROADMAP #27 (host plugin path leaking into tests) but at the user-facing surface: the test gap and the UX gap are siblings of the same root. Landing on `main` will close the silent-ghost class of support issues where users report "my plugin does nothing" with no error to share. Track until merged to `main`. +40. **Surface broken installed plugins before they become support ghosts** — community-support lane. Clawhip commit `ff6d3b7` on worktree `claw-code-community-support-plugin-list-load-failures` / branch `community-support/plugin-list-load-failures`. When an installed plugin has a broken manifest (missing hook scripts, parse errors, bad json), the plugin silently fails to load and the user sees nothing — no warning, no list entry, no hint. Related to ROADMAP #27 (host plugin path leaking into tests) but at the user-facing surface: the test gap and the UX gap are siblings of the same same root. Landing on `main` will close the silent-ghost class of support issues where users report "my plugin does nothing" with no error to share. Track until merged to `main`. 41. **Stop ambient plugin state from skewing CLI regression checks** — community-support lane. Clawhip commit `7d493a7` on worktree `claw-code-community-support-plugin-test-sealing` / branch `community-support/plugin-test-sealing`. Companion to #40: the test sealing gap is the CI/developer side of the same root — host `~/.claude/plugins/installed/` bleeds into CLI test runs, making regression checks non-deterministic on any machine with a non-pristine plugin install. Closely related to ROADMAP #27 (dev/rust `cargo test` reads host plugin state). Track until merged to `main`. @@ -517,11 +458,11 @@ Model name prefix now wins unconditionally over env-var presence. Regression tes 45. **`claw dump-manifests` fails with opaque "No such file or directory"** — dogfooded 2026-04-09. `claw dump-manifests` emits `error: failed to extract manifests: No such file or directory (os error 2)` with no indication of which file or directory is missing. **Partial fix at `47aa1a5`+1**: error message now includes `looked in: ` so the build-tree path is visible, what manifests are, or how to fix it. Fix shape: (a) surface the missing path in the error message; (b) add a pre-check that explains what manifests are and where they should be (e.g. `.claw/manifests/` or the plugins directory); (c) if the command is only valid after `claw init` or after installing plugins, say so explicitly. Source: Jobdori dogfood 2026-04-09. -46. **`/tokens`, `/cache`, `/stats` were dead spec — parse arms missing** — dogfooded 2026-04-09. All three had spec entries with `resume_supported: true` but no parse arms, producing the circular error "Unknown slash command: /tokens — Did you mean /tokens". Also `SlashCommand::Stats` existed but was unimplemented in both REPL and resume dispatch. **Done at `60ec2ae` 2026-04-09**: `"tokens" | "cache"` now alias to `SlashCommand::Stats`; `Stats` is wired in both REPL and resume path with full JSON output. Source: Jobdori dogfood 2026-04-09. +46. **`/tokens`, `/cache`, `/stats` were dead spec — parse arms missing** — dogfooded 2026-04-09. All three had spec entries with `resume_supported: true` but no parse arms, producing the circular error "Unknown slash command: /tokens — Did you mean /tokens". Also `SlashCommand::Stats` existed but was unimplemented in both REPL and resume dispatch. **Done at `60ec2ae` 2026-04-09**: `"tokens" | "cache"` now alias to `SlashCommand::Stats`; `Stats` is wired in both REPL and resume path with full JSON output. Source: Jobdori dogfood. -47. **`/diff` fails with cryptic "unknown option 'cached'" outside a git repo; resume /diff used wrong CWD** — dogfooded 2026-04-09. `claw --resume /diff` in a non-git directory produced `git diff --cached failed: error: unknown option 'cached'` because git falls back to `--no-index` mode outside a git tree. Also resume `/diff` used `session_path.parent()` (the `.claw/sessions//` dir) as CWD for the diff — never a git repo. **Done at `aef85f8` 2026-04-09**: `render_diff_report_for()` now checks `git rev-parse --is-inside-work-tree` first and returns a clear "no git repository" message; resume `/diff` uses `std::env::current_dir()`. Source: Jobdori dogfood 2026-04-09. +47. **`/diff` fails with cryptic "unknown option 'cached'" outside a git repo; resume /diff used wrong CWD** — dogfooded 2026-04-09. `claw --resume /diff` in a non-git directory produced `git diff --cached failed: error: unknown option 'cached'` because git falls back to `--no-index` mode outside a git tree. Also resume `/diff` used `session_path.parent()` (the `.claw/sessions//` dir) as CWD for the diff — never a git repo. **Done at `aef85f8` 2026-04-09**: `render_diff_report_for()` now checks `git rev-parse --is-inside-work-tree` first and returns a clear "no git repository" message; resume `/diff` uses `std::env::current_dir()`. Source: Jobdori dogfood. -48. **Piped stdin triggers REPL startup and banner instead of one-shot prompt** — dogfooded 2026-04-09. `echo "hello" | claw` started the interactive REPL, printed the ASCII banner, consumed the pipe without sending anything to the API, then exited. `parse_args` always returned `CliAction::Repl` when no args were given, never checking whether stdin was a pipe. **Done at `84b77ec` 2026-04-09**: when `rest.is_empty()` and stdin is not a terminal, read the pipe and dispatch as `CliAction::Prompt`. Empty pipe still falls through to REPL. Source: Jobdori dogfood 2026-04-09. +48. **Piped stdin triggers REPL startup and banner instead of one-shot prompt** — dogfooded 2026-04-09. `echo "hello" | claw` started the interactive REPL, printed the ASCII banner, consumed the pipe without sending anything to the API, then exited. `parse_args` always returned `CliAction::Repl` when no args were given, never checking whether stdin was a pipe. **Done at `84b77ec` 2026-04-09**: when `rest.is_empty()` and stdin is not a terminal, read the pipe and dispatch as `CliAction::Prompt`. Empty pipe still falls through to REPL. Source: Jobdori dogfood. 49. **Resumed slash command errors emitted as prose in `--output-format json` mode** — dogfooded 2026-04-09. `claw --output-format json --resume /commit` called `eprintln!()` and `exit(2)` directly, bypassing the JSON formatter. Both the slash-command parse-error path and the `run_resume_command` Err path now check `output_format` and emit `{"type":"error","error":"...","command":"..."}`. **Done at `da42421` 2026-04-09**. Source: gaebal-gajae ROADMAP #26 track; Jobdori dogfood. @@ -548,3 +489,5 @@ Model name prefix now wins unconditionally over env-var presence. Regression tes 60. **`glob_search` silently returns 0 results for brace expansion patterns** — user report 2026-04-10 in #claw-code (zero, Windows/Unity). Patterns like `Assets/**/*.{cs,uxml,uss}` returned 0 files because the `glob` crate (v0.3) does not support shell-style brace groups. The agent fell back to shell tools as a workaround. **Done at `3a6c9a5` 2026-04-10**: added `expand_braces()` pre-processor that expands brace groups before passing to `glob::glob()`. Handles nested braces. Results deduplicated via `HashSet`. 5 regression tests. Source: zero in #claw-code; traced by gaebal-gajae. 61. **`OPENAI_BASE_URL` ignored when model name has no recognized prefix** — user report 2026-04-10 in #claw-code (MaxDerVerpeilte, Ollama). User set `OPENAI_BASE_URL=http://127.0.0.1:11434/v1` with model `qwen2.5-coder:7b` but claw asked for Anthropic credentials. `detect_provider_kind()` checks model prefix first, then falls through to env-var presence — but `OPENAI_BASE_URL` was not in the cascade, so unrecognized model names always hit the Anthropic default. **Done at `1ecdb10` 2026-04-10**: `OPENAI_BASE_URL` + `OPENAI_API_KEY` now beats Anthropic env-check. `OPENAI_BASE_URL` alone (no key, e.g. Ollama) is last-resort before Anthropic default. Source: MaxDerVerpeilte in #claw-code; traced by gaebal-gajae. + +62. **Worker state file surface not implemented** — dogfooded 2026-04-10. ROADMAP "Deployment Architecture Gap" section documents `.claw/worker-state.json` as the "canonical observability contract" for clawhip, but `emit_state_file()` is not wired in `WorkerRegistry::transition()`. The file is never written. **Status:** documentation exists, code does not. Fix shape: wire `emit_state_file()` to write `.claw/worker-state.json` atomically on every `WorkerStatus` transition; add `claw state` CLI subcommand to read/display it. Blocker: none. Source: Jobdori dogfood. diff --git a/rust/crates/commands/src/lib.rs b/rust/crates/commands/src/lib.rs index 4904f9f..dd109b1 100644 --- a/rust/crates/commands/src/lib.rs +++ b/rust/crates/commands/src/lib.rs @@ -4,7 +4,7 @@ use std::fmt; use std::fs; use std::path::{Path, PathBuf}; -use plugins::{PluginError, PluginManager, PluginSummary}; +use plugins::{PluginError, PluginLoadFailure, PluginManager, PluginSummary}; use runtime::{ compact_session, CompactionConfig, ConfigLoader, ConfigSource, McpOAuthConfig, McpServerConfig, ScopedMcpServerConfig, Session, @@ -2210,10 +2210,15 @@ pub fn handle_plugins_slash_command( manager: &mut PluginManager, ) -> Result { match action { - None | Some("list") => Ok(PluginsCommandResult { - message: render_plugins_report(&manager.list_installed_plugins()?), - reload_runtime: false, - }), + None | Some("list") => { + let report = manager.installed_plugin_registry_report()?; + let plugins = report.summaries(); + let failures = report.failures(); + Ok(PluginsCommandResult { + message: render_plugins_report_with_failures(&plugins, failures), + reload_runtime: false, + }) + } Some("install") => { let Some(target) = target else { return Ok(PluginsCommandResult { @@ -2667,6 +2672,48 @@ pub fn render_plugins_report(plugins: &[PluginSummary]) -> String { lines.join("\n") } +#[must_use] +pub fn render_plugins_report_with_failures( + plugins: &[PluginSummary], + failures: &[PluginLoadFailure], +) -> String { + let mut lines = vec!["Plugins".to_string()]; + + // Show successfully loaded plugins + if plugins.is_empty() { + lines.push(" No plugins installed.".to_string()); + } else { + for plugin in plugins { + let enabled = if plugin.enabled { + "enabled" + } else { + "disabled" + }; + lines.push(format!( + " {name:<20} v{version:<10} {enabled}", + name = plugin.metadata.name, + version = plugin.metadata.version, + )); + } + } + + // Show warnings for broken plugins + if !failures.is_empty() { + lines.push("".to_string()); + lines.push("Warnings:".to_string()); + for failure in failures { + lines.push(format!( + " ⚠️ Failed to load {} plugin from `{}`", + failure.kind, + failure.plugin_root.display() + )); + lines.push(format!(" Error: {}", failure.error())); + } + } + + lines.join("\n") +} + fn render_plugin_install_report(plugin_id: &str, plugin: Option<&PluginSummary>) -> String { let name = plugin.map_or(plugin_id, |plugin| plugin.metadata.name.as_str()); let version = plugin.map_or("unknown", |plugin| plugin.metadata.version.as_str());