Retire the stale dead-session opacity backlog item with regression proof

ROADMAP #38 no longer reflects current main. The runtime already runs a post-compaction session-health probe, but the backlog lacked explicit regression proof. This change adds focused tests for the two important behaviors: a broken tool surface aborts a compacted session with a targeted error, while a freshly compacted empty session does not false-positive as dead. With that proof in place, the roadmap item can be marked done.

Constraint: User required fresh cargo fmt/clippy/test evidence before closing any backlog item
Rejected: Leave #38 open because the implementation already existed | backlog stays stale and invites duplicate work
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Reopen #38 only with a fresh same-turn repro that bypasses the current health-probe gate
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace
Not-tested: No live long-running dogfood session replay beyond existing automated coverage
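
For readers who want the gate's decision logic in isolation, here is a minimal sketch of the two behaviors the new tests pin down. It is not the code in `conversation.rs`: `SessionState`, `ProbeDecision`, and `check_session_health` are hypothetical names introduced for illustration, and treating "empty" as "zero recorded messages" is an assumption. Only the error prefix is taken from the assertions in the diff.

```rust
// Hypothetical sketch of a post-compaction health gate. None of these
// names come from the runtime crate; they exist only for illustration.

/// Outcome of the gate when the turn is allowed to proceed.
#[derive(Debug, PartialEq)]
enum ProbeDecision {
    /// The session was never compacted, so there is nothing to verify.
    NotCompacted,
    /// Compacted, but no messages recorded yet: the probe is skipped.
    SkippedEmptySession,
    /// The probe ran and the tool surface answered.
    Healthy,
}

/// Assumed minimal view of a session: how often it was compacted and
/// how many messages it currently holds.
struct SessionState {
    compaction_count: usize,
    message_count: usize,
}

/// Runs `probe` only for a compacted, non-empty session. A probe error
/// is wrapped in a targeted message so the caller can abort the turn
/// before any API request is made.
fn check_session_health<F>(state: &SessionState, probe: F) -> Result<ProbeDecision, String>
where
    F: FnOnce() -> Result<String, String>,
{
    if state.compaction_count == 0 {
        return Ok(ProbeDecision::NotCompacted);
    }
    if state.message_count == 0 {
        // A freshly compacted empty session must not be reported as dead.
        return Ok(ProbeDecision::SkippedEmptySession);
    }
    match probe() {
        Ok(_) => Ok(ProbeDecision::Healthy),
        Err(cause) => Err(format!(
            "Session health probe failed after compaction: {cause}"
        )),
    }
}
```

Passing the probe as a closure keeps the gate independent of any particular tool executor, so a test can inject a failing probe, or one that panics if it is ever called.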
2026-06-08 07:22:18 +08:00 · 2026-04-11 18:47:25 +00:00
parent 7ea4535cce
commit 257aeb82dd
2 changed files with 83 additions and 1 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -440,7 +440,7 @@ Model name prefix now wins unconditionally over env-var presence. Regression tes
 37. **Claude subscription login path should be removed, not deprecated** -- dogfooded 2026-04-09. Official auth should be API key only (`ANTHROPIC_API_KEY`) or OAuth bearer token via `ANTHROPIC_AUTH_TOKEN`; the local `claw login` / `claw logout` subscription-style flow created legal/billing ambiguity and a misleading saved-OAuth fallback. **Done (verified 2026-04-11):** removed the direct `claw login` / `claw logout` CLI surface, removed `/login` and `/logout` from shared slash-command discovery, changed both CLI and provider startup auth resolution to ignore saved OAuth credentials, and updated auth diagnostics to point only at `ANTHROPIC_API_KEY` / `ANTHROPIC_AUTH_TOKEN`. Verification: targeted `commands`, `api`, and `rusty-claude-cli` tests for removed login/logout guidance and ignored saved OAuth all pass, and `cargo check -p api -p commands -p rusty-claude-cli` passes. Source: gaebal-gajae policy decision 2026-04-09.
-38. **Dead-session opacity: bot cannot self-detect compaction vs broken tool surface** -- dogfooded 2026-04-09. Jobdori session spent ~15h declaring itself "dead" in-channel while tools were actually returning correct results within each turn. Root cause: context compaction causes tool outputs to be summarised away between turns, making the bot interpret absence-of-remembered-output as tool failure. This is a distinct failure mode from ROADMAP #31 (executor quirks): the session is alive and tools are functional, but the agent cannot tell the difference between "my last tool call produced no output" (compaction) and "the tool is broken". Downstream: repetitive false-dead signals in the channel, work not getting done despite the execution surface being live. Fix shape: (a) probe with a short known-output command at turn start if context has been compacted; (b) gate "I am dead" declarations behind at least one within-turn tool call with a verified non-empty result; (c) consider adding a session-health canary cron that fires a wake with a minimal probe and checks the result. Source: Jobdori self-dogfood 2026-04-09; observed in #clawcode-building-in-public across multiple Clawhip nudge cycles.
+38. **Dead-session opacity: bot cannot self-detect compaction vs broken tool surface** -- dogfooded 2026-04-09. Jobdori session spent ~15h declaring itself "dead" in-channel while tools were actually returning correct results within each turn. Root cause: context compaction causes tool outputs to be summarised away between turns, making the bot interpret absence-of-remembered-output as tool failure. This is a distinct failure mode from ROADMAP #31 (executor quirks): the session is alive and tools are functional, but the agent cannot tell the difference between "my last tool call produced no output" (compaction) and "the tool is broken". **Done (verified 2026-04-11):** `ConversationRuntime::run_turn()` now runs a post-compaction session-health probe through `glob_search`, fails fast with a targeted recovery error if the tool surface is broken, and skips the probe for a freshly compacted empty session. Fresh regression coverage proves both the failure gate and the empty-session bypass. Source: Jobdori self-dogfood 2026-04-09; observed in #clawcode-building-in-public across multiple Clawhip nudge cycles.
 39. **Several slash commands are registered but not implemented: /branch, /rewind, /ide, /tag, /output-style, /add-dir** -- dogfooded 2026-04-09. These commands appear in the REPL completions surface but silently print 'Command registered but not yet implemented.' and return false. Users (mezz2301 in #claw-code) hit this as 'many features are not supported in this version now'. Fix shape: either (a) implement the missing commands, or (b) remove them from completions/help output until they are ready, so the discovery surface matches what actually works. Source: mezz2301 in #claw-code 2026-04-09; pinpointed in main.rs:3728.
--- a/rust/crates/runtime/src/conversation.rs
+++ b/rust/crates/runtime/src/conversation.rs
@@ -1611,6 +1611,88 @@ mod tests {
        );
    }
    #[test]
    fn compaction_health_probe_blocks_turn_when_tool_executor_is_broken() {
        struct SimpleApi;
        impl ApiClient for SimpleApi {
            fn stream(
                &mut self,
                _request: ApiRequest,
            ) -> Result<Vec<AssistantEvent>, RuntimeError> {
                panic!("API should not run when health probe fails");
            }
        }
        let mut session = Session::new();
        session.record_compaction("summarized earlier work", 4);
        session
            .push_user_text("previous message")
            .expect("message should append");
        let tool_executor = StaticToolExecutor::new().register("glob_search", |_input| {
            Err(ToolError::new("transport unavailable"))
        });
        let mut runtime = ConversationRuntime::new(
            session,
            SimpleApi,
            tool_executor,
            PermissionPolicy::new(PermissionMode::DangerFullAccess),
            vec!["system".to_string()],
        );
        let error = runtime
            .run_turn("trigger", None)
            .expect_err("health probe failure should abort the turn");
        assert!(
            error
                .to_string()
                .contains("Session health probe failed after compaction"),
            "unexpected error: {error}"
        );
        assert!(
            error.to_string().contains("transport unavailable"),
            "expected underlying probe error: {error}"
        );
    }
    #[test]
    fn compaction_health_probe_skips_empty_compacted_session() {
        struct SimpleApi;
        impl ApiClient for SimpleApi {
            fn stream(
                &mut self,
                _request: ApiRequest,
            ) -> Result<Vec<AssistantEvent>, RuntimeError> {
                Ok(vec![
                    AssistantEvent::TextDelta("done".to_string()),
                    AssistantEvent::MessageStop,
                ])
            }
        }
        let mut session = Session::new();
        session.record_compaction("fresh summary", 2);
        let tool_executor = StaticToolExecutor::new().register("glob_search", |_input| {
            Err(ToolError::new(
                "glob_search should not run for an empty compacted session",
            ))
        });
        let mut runtime = ConversationRuntime::new(
            session,
            SimpleApi,
            tool_executor,
            PermissionPolicy::new(PermissionMode::DangerFullAccess),
            vec!["system".to_string()],
        );
        let summary = runtime
            .run_turn("trigger", None)
            .expect("empty compacted session should not fail health probe");
        assert_eq!(summary.auto_compaction, None);
        assert_eq!(runtime.session().messages.len(), 2);
    }
    #[test]
    fn build_assistant_message_requires_message_stop_event() {
        // given