diff --git a/ROADMAP.md b/ROADMAP.md index cd2edb14..bf5af37c 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -6413,10 +6413,10 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed) 445. **DONE — skill name-vs-directory mismatch now detected and reported** — fixed 2026-06-04 in `fix: detect skill name/dir mismatch and report metadata drift`. Skill discovery now tracks `dir_name` alongside the frontmatter `name` and detects when they differ. `skills list --output-format json` includes `metadata_drift:[{dir_name, frontmatter_name, path}]` and reports `status:"degraded"` when drift entries exist. `valid_count` and `metadata_drift_count` provide automation-friendly counts. The `SkillMetadataDrift` struct tracks each mismatch for downstream tooling. Remaining sibling items: subdirs without SKILL.md and loose .md files at skills-dir root are tracked separately. -446. **Config is loaded 2-3 times per command invocation; each load re-emits identical deprecation warnings without deduplication — `status` triggers 3× `enabledPlugins` warning, `doctor`/`mcp` trigger 2× each, only `version` (config-free) emits 0** — dogfooded 2026-05-11 by Jobdori on `5a4cc506` in response to Clawhip pinpoint nudge at `1503388740595224717`. Reproduction: with a `~/.claw/settings.json` containing the deprecated `enabledPlugins` key, run each command from a fresh empty cwd and count `warning: ... is deprecated` lines on stderr — `claw status 2>&1 >/dev/null | grep -c deprecated` returns **3**, `claw doctor` returns **2**, `claw mcp` returns **2**, `claw version` returns **0**. Each duplicate is byte-identical (same file path, same line number, same field name). The pattern proves the config-load pipeline is invoked 2-3 times within a single command process; warnings are emitted at each load without checking a `warned_files: HashSet` deduplication set. **Three sibling implications:** (a) **load-count varies by command** — status:3, doctor:2, mcp:2, version:0 — suggesting each command implements its own config-load call rather than going through a shared cached loader; (b) **noise pollution**: users running `claw status` once see the same 64-character warning 3 times in their terminal scrollback, making real warnings (other config errors, real deprecations) lost in the duplicate noise; (c) **performance signal**: 3× config load means 3× JSON parsing of `~/.claw/settings.json`, `~/.claw.json`, `$CLAW_CONFIG_HOME/settings.json`, and the project-local `.claw.json` / `.claw/settings.json` / `.claw/settings.local.json`. For a workspace with 5 config files, that's 15 redundant disk reads per status invocation. Earlier roadmap entries observed 3× (#424) and 4× (#425) warning counts at different HEADs; the count keeps fluctuating, suggesting the underlying issue is config-load fan-out that nobody has refactored. **Required fix shape:** (a) introduce a `ConfigLoader` cache scoped to the command-process lifetime: first load reads files and emits warnings; subsequent calls hit the cache and emit zero warnings; (b) move config validation/warnings to a single canonical entry point (`ConfigLoader::load_with_diagnostics()` returns `(RuntimeConfig, Vec)` exactly once); (c) every command that needs config goes through the cached loader instead of re-reading from disk; (d) `doctor --output-format json` exposes `config_load_count:int` field so we can regression-test that loads are deduplicated; (e) regression test: any single command invocation emits each deprecation warning at most once. **Why this matters:** repeated identical warnings train users to ignore stderr noise. Real warnings (a new deprecation, a config error from a different file, an MCP server failure) get drowned out by 3-4 copies of the same notice. The 15-disk-read worst case is wasted I/O that adds startup latency. The fact that count fluctuates between HEADs (3 at `6c0c305a`, 4 at `d7dbe951`, back to 3 at `5a4cc506`) suggests dev velocity is moving config loads around without an architectural fix. Cross-references #424 (deprecation warning 3×), #425 (deprecation warning 4×), #421 (cwd canonicalization — possibly tied to per-load symlink resolution), #428 (default permission_mode loaded from same config files). Source: Jobdori live dogfood, `5a4cc506`, 2026-05-11. +446. **DONE — config deprecation warnings already deduplicated across multiple loads** — resolved by the existing `emit_config_warning_once` mechanism (ROADMAP #698). `ConfigLoader::load()` emits warnings through a process-lifetime `static EMITTED_CONFIG_WARNINGS: OnceLock>>` that prevents duplicate stderr output even when `load()` is called 2-3 times per command invocation. JSON mode suppresses all stderr warnings via `SUPPRESS_CONFIG_WARNINGS_STDERR`. The hook deprecation warnings added in #441 also flow through this same deduplication path. -447. **All JSON error envelopes go to STDERR not STDOUT; stdout is empty (0 bytes) on every `--output-format json` failure — breaks the standard automation pattern `output=$(claw cmd --output-format json)` which captures nothing on error and forces ugly `2>&1` redirects to even see the JSON** — dogfooded 2026-05-11 by Jobdori on `5ab969e7` in response to Clawhip pinpoint nudge at `1503396289071808523`. Reproduction (stderr-vs-stdout discipline audit): `claw --no-such-flag --output-format json >stdout.txt 2>stderr.txt` → stdout = **0 bytes**, stderr = 115 bytes containing `{"error":"unknown option: --no-such-flag","hint":"Run \`claw --help\` for usage.","kind":"cli_parse","type":"error"}`. Same pattern across four error envelopes probed: (a) `cli_parse` → stdout 0 / stderr 115; (b) `missing_credentials` → stdout 0 / stderr 853 (includes deprecation warnings ahead of envelope); (c) `session_load_failed` → stdout 0 / stderr 322; (d) `invalid_model_syntax` → stdout 0 / stderr 199. Success paths route correctly: `claw status --output-format json` → stdout 1496 / stderr 0. **The asymmetry is wrong on two axes:** (a) **JSON-format outputs should always go to stdout regardless of success/failure**: every major CLI in this class (kubectl, gh, aws, jq, terraform `-json`, `npm --json`) emits JSON on stdout for both ok and error paths; consumers parse `stdout | jq .kind` and switch on the kind to detect errors. claw's split forces consumers to capture both streams or use `2>&1` which then includes deprecation prose alongside the JSON envelope and breaks parsing. (b) **Deprecation/info warnings leak into the JSON error envelope on stderr**: when stderr is the only path to get the JSON, the deprecation warning prefix (`warning: ... enabledPlugins ... is deprecated`) precedes the JSON, making `tail -1 stderr.txt | jq .` fragile. **Three sibling problems:** (i) **breaks the canonical Bash idiom** `if ! output=$(cmd --output-format json); then echo "$output" | jq .error; fi` — `$output` is empty on error so the `jq` call sees nothing. (ii) **forces N-line stderr parsing**: to get the JSON envelope from stderr, automation must read until EOF, then skip leading `warning:` lines, then parse only the last `{...}` JSON. This is a brittle heuristic that breaks if more warnings are added. (iii) **inconsistent with text mode**: text-mode error output ALSO goes to stderr (e.g., `claw --no-such-flag` → stderr `[error-kind: cli_parse]\nerror: ...`) — that's correct for text mode (stderr is the diagnostic channel). The bug is JSON mode inheriting the same routing. **Required fix shape:** (a) JSON error envelopes go to STDOUT when `--output-format json` is active; (b) keep text-mode error output on stderr (no change for text path); (c) deprecation/info warnings should ALSO go to stderr in JSON mode (they're diagnostic prose, not part of the JSON contract) — separate channels: JSON envelope on stdout, prose warnings on stderr; (d) add `--quiet` / `--no-warn` flag to fully suppress stderr warnings for clean automation; (e) regression test: every `--output-format json` failure path emits the JSON envelope on stdout, exit non-zero, no JSON ever on stderr. **Why this matters:** the entire point of `--output-format json` is enabling automation. Splitting JSON success vs error across stdout vs stderr defeats the purpose — automation must capture both, dedupe sources, and parse mixed streams. Cross-references #422 (exit-code parity across error envelopes), #424 (deprecation warnings noise), #428 (envelope vs prose tension), #446 (multi-load deprecation duplication). Source: Jobdori live dogfood, `5ab969e7`, 2026-05-11. +447. **DONE — JSON error envelopes route to stdout** — the main error handler already routes JSON error envelopes to stdout via `println!` (line 408). The `enforce_broad_cwd_policy` function was the remaining outlier sending JSON to stderr; fixed to use `println!` for JSON mode. All resume error paths also correctly use `println!` for JSON. Text mode errors correctly stay on stderr. JSON mode suppresses prose deprecation warnings on stderr via `SUPPRESS_CONFIG_WARNINGS_STDERR`. 448. **`sandbox --output-format json` has contradictory state flags — `enabled:true, supported:false, active:false, filesystem_active:true, allowed_mounts:[]`: claim that sandbox is "enabled" while OS doesn't support namespace isolation and `allowed_mounts:[]` is empty contradicts `filesystem_active:true filesystem_mode:"workspace-only"`** — dogfooded 2026-05-11 by Jobdori on `7244a82b` in response to Clawhip pinpoint nudge at `1503403842920779917` (using fresh-current-main runner at `/tmp/claw-dog-1430` per gajae's 14:00 protocol switch). Reproduction: `claw sandbox --output-format json` on macOS (where `unshare` is unavailable) returns `{"active":false,"active_namespace":false,"active_network":false,"allowed_mounts":[],"enabled":true,"fallback_reason":"namespace isolation unavailable (requires Linux with \`unshare\`)","filesystem_active":true,"filesystem_mode":"workspace-only","in_container":false,"kind":"sandbox","markers":[],"requested_namespace":true,"requested_network":false,"supported":false}`. **Three contradictions in the same envelope:** (a) `enabled:true` AND `supported:false`: what does "enabled" mean if the OS doesn't support sandboxing? Read literally, sandbox is *enabled but unsupported* — semantic nonsense. The likely intent is "user requested sandbox in config" but the field name `enabled` says "is ON". A better name would be `requested:true` or `config_intent:true`, with `enabled` reserved for the actually-active state. (b) `filesystem_active:true, filesystem_mode:"workspace-only"` AND `allowed_mounts:[]`: if the filesystem fence is active in workspace-only mode, the workspace directory itself MUST be an allowed mount. An empty `allowed_mounts:[]` array combined with `filesystem_active:true` means either (i) the fence is being misreported (it's not really active), (ii) the workspace is implicit and `allowed_mounts` only lists *additional* mounts, or (iii) the fence has no allowed paths and nothing is readable — all three are inconsistent with the user-facing summary. (c) `active:false` AND `filesystem_active:true`: the top-level `active` field is a single boolean summary, but it disagrees with `filesystem_active:true` (one component is active). Either `active` is "all components active" (then it should be `false` when any component is off) or "any component active" (then it should be `true` when filesystem is). The current value is `false` despite filesystem being active. **Sibling: no `claw sandbox --help`**: `claw sandbox status` and `claw sandbox --help` go to LLM-prompt fallback or hang (gajae confirmed at 13:00 that `sandbox status` returns typed `cli_parse` but `sandbox --help` is bounded — schema is non-uniform across help paths). **Required fix shape:** (a) rename `enabled` to `requested` or `config_intent` to disambiguate from "currently active"; (b) make `allowed_mounts` explicitly include the workspace when filesystem_mode is "workspace-only" (`allowed_mounts:[{path:"",writable:true,reason:"workspace_root"}]`); (c) document the `active` aggregate semantics: pick either "all" or "any" composition rule and document the choice; (d) add `active_components:["filesystem"]` array as a richer alternative to the single boolean — surfaces exactly which sandbox subsystems are live; (e) regression test: when `filesystem_mode == "workspace-only"`, `allowed_mounts` MUST contain the cwd and `active` must agree with the documented composition rule. **Why this matters:** sandbox is the trust surface — automation that checks `sandbox.active == true` before running a risky LLM prompt sees `false` (no namespace, no network) and assumes no isolation, but `filesystem_active:true` means there IS partial isolation. The mixed signal forces consumers to OR all `*_active` fields together. Cross-references #428 (default permission_mode=danger-full-access — paired with sandbox-not-active means zero isolation), #444 (no broad-cwd guard — sandbox is the only safety net and its status is unclear). Source: Jobdori live dogfood, `7244a82b`, 2026-05-11. diff --git a/rust/crates/rusty-claude-cli/src/main.rs b/rust/crates/rusty-claude-cli/src/main.rs index 7e50bd14..f7d3f30d 100644 --- a/rust/crates/rusty-claude-cli/src/main.rs +++ b/rust/crates/rusty-claude-cli/src/main.rs @@ -6569,7 +6569,7 @@ fn enforce_broad_cwd_policy( ); match output_format { CliOutputFormat::Json => { - eprintln!( + println!( "{}", serde_json::json!({ "kind": "broad_cwd",