mirror of
https://github.com/instructkr/claw-code.git
synced 2026-04-13 03:24:49 +08:00
Retire the stale dead-session opacity backlog item with regression proof
ROADMAP #38 no longer reflects current main. The runtime already runs a post-compaction session-health probe, but the backlog lacked explicit regression proof. This change adds focused tests for the two important behaviors: a broken tool surface aborts a compacted session with a targeted error, while a freshly compacted empty session does not false-positive as dead. With that proof in place, the roadmap item can be marked done. Constraint: User required fresh cargo fmt/clippy/test evidence before closing any backlog item Rejected: Leave #38 open because the implementation already existed | backlog stays stale and invites duplicate work Confidence: high Scope-risk: narrow Reversibility: clean Directive: Reopen #38 only with a fresh same-turn repro that bypasses the current health-probe gate Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace Not-tested: No live long-running dogfood session replay beyond existing automated coverage
This commit is contained in:
@@ -440,7 +440,7 @@ Model name prefix now wins unconditionally over env-var presence. Regression tes
|
|||||||
|
|
||||||
37. **Claude subscription login path should be removed, not deprecated** -- dogfooded 2026-04-09. Official auth should be API key only (`ANTHROPIC_API_KEY`) or OAuth bearer token via `ANTHROPIC_AUTH_TOKEN`; the local `claw login` / `claw logout` subscription-style flow created legal/billing ambiguity and a misleading saved-OAuth fallback. **Done (verified 2026-04-11):** removed the direct `claw login` / `claw logout` CLI surface, removed `/login` and `/logout` from shared slash-command discovery, changed both CLI and provider startup auth resolution to ignore saved OAuth credentials, and updated auth diagnostics to point only at `ANTHROPIC_API_KEY` / `ANTHROPIC_AUTH_TOKEN`. Verification: targeted `commands`, `api`, and `rusty-claude-cli` tests for removed login/logout guidance and ignored saved OAuth all pass, and `cargo check -p api -p commands -p rusty-claude-cli` passes. Source: gaebal-gajae policy decision 2026-04-09.
|
37. **Claude subscription login path should be removed, not deprecated** -- dogfooded 2026-04-09. Official auth should be API key only (`ANTHROPIC_API_KEY`) or OAuth bearer token via `ANTHROPIC_AUTH_TOKEN`; the local `claw login` / `claw logout` subscription-style flow created legal/billing ambiguity and a misleading saved-OAuth fallback. **Done (verified 2026-04-11):** removed the direct `claw login` / `claw logout` CLI surface, removed `/login` and `/logout` from shared slash-command discovery, changed both CLI and provider startup auth resolution to ignore saved OAuth credentials, and updated auth diagnostics to point only at `ANTHROPIC_API_KEY` / `ANTHROPIC_AUTH_TOKEN`. Verification: targeted `commands`, `api`, and `rusty-claude-cli` tests for removed login/logout guidance and ignored saved OAuth all pass, and `cargo check -p api -p commands -p rusty-claude-cli` passes. Source: gaebal-gajae policy decision 2026-04-09.
|
||||||
|
|
||||||
38. **Dead-session opacity: bot cannot self-detect compaction vs broken tool surface** -- dogfooded 2026-04-09. Jobdori session spent ~15h declaring itself "dead" in-channel while tools were actually returning correct results within each turn. Root cause: context compaction causes tool outputs to be summarised away between turns, making the bot interpret absence-of-remembered-output as tool failure. This is a distinct failure mode from ROADMAP #31 (executor quirks): the session is alive and tools are functional, but the agent cannot tell the difference between "my last tool call produced no output" (compaction) and "the tool is broken". Downstream: repetitive false-dead signals in the channel, work not getting done despite the execution surface being live. Fix shape: (a) probe with a short known-output command at turn start if context has been compacted; (b) gate "I am dead" declarations behind at least one within-turn tool call with a verified non-empty result; (c) consider adding a session-health canary cron that fires a wake with a minimal probe and checks the result. Source: Jobdori self-dogfood 2026-04-09; observed in #clawcode-building-in-public across multiple Clawhip nudge cycles.
|
38. **Dead-session opacity: bot cannot self-detect compaction vs broken tool surface** -- dogfooded 2026-04-09. Jobdori session spent ~15h declaring itself "dead" in-channel while tools were actually returning correct results within each turn. Root cause: context compaction causes tool outputs to be summarised away between turns, making the bot interpret absence-of-remembered-output as tool failure. This is a distinct failure mode from ROADMAP #31 (executor quirks): the session is alive and tools are functional, but the agent cannot tell the difference between "my last tool call produced no output" (compaction) and "the tool is broken". **Done (verified 2026-04-11):** `ConversationRuntime::run_turn()` now runs a post-compaction session-health probe through `glob_search`, fails fast with a targeted recovery error if the tool surface is broken, and skips the probe for a freshly compacted empty session. Fresh regression coverage proves both the failure gate and the empty-session bypass. Source: Jobdori self-dogfood 2026-04-09; observed in #clawcode-building-in-public across multiple Clawhip nudge cycles.
|
||||||
|
|
||||||
39. **Several slash commands are registered but not implemented: /branch, /rewind, /ide, /tag, /output-style, /add-dir** -- dogfooded 2026-04-09. These commands appear in the REPL completions surface but silently print 'Command registered but not yet implemented.' and return false. Users (mezz2301 in #claw-code) hit this as 'many features are not supported in this version now'. Fix shape: either (a) implement the missing commands, or (b) remove them from completions/help output until they are ready, so the discovery surface matches what actually works. Source: mezz2301 in #claw-code 2026-04-09; pinpointed in main.rs:3728.
|
39. **Several slash commands are registered but not implemented: /branch, /rewind, /ide, /tag, /output-style, /add-dir** -- dogfooded 2026-04-09. These commands appear in the REPL completions surface but silently print 'Command registered but not yet implemented.' and return false. Users (mezz2301 in #claw-code) hit this as 'many features are not supported in this version now'. Fix shape: either (a) implement the missing commands, or (b) remove them from completions/help output until they are ready, so the discovery surface matches what actually works. Source: mezz2301 in #claw-code 2026-04-09; pinpointed in main.rs:3728.
|
||||||
|
|
||||||
|
|||||||
@@ -1611,6 +1611,88 @@ mod tests {
|
|||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn compaction_health_probe_blocks_turn_when_tool_executor_is_broken() {
|
||||||
|
struct SimpleApi;
|
||||||
|
impl ApiClient for SimpleApi {
|
||||||
|
fn stream(
|
||||||
|
&mut self,
|
||||||
|
_request: ApiRequest,
|
||||||
|
) -> Result<Vec<AssistantEvent>, RuntimeError> {
|
||||||
|
panic!("API should not run when health probe fails");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
let mut session = Session::new();
|
||||||
|
session.record_compaction("summarized earlier work", 4);
|
||||||
|
session
|
||||||
|
.push_user_text("previous message")
|
||||||
|
.expect("message should append");
|
||||||
|
|
||||||
|
let tool_executor = StaticToolExecutor::new().register("glob_search", |_input| {
|
||||||
|
Err(ToolError::new("transport unavailable"))
|
||||||
|
});
|
||||||
|
let mut runtime = ConversationRuntime::new(
|
||||||
|
session,
|
||||||
|
SimpleApi,
|
||||||
|
tool_executor,
|
||||||
|
PermissionPolicy::new(PermissionMode::DangerFullAccess),
|
||||||
|
vec!["system".to_string()],
|
||||||
|
);
|
||||||
|
|
||||||
|
let error = runtime
|
||||||
|
.run_turn("trigger", None)
|
||||||
|
.expect_err("health probe failure should abort the turn");
|
||||||
|
assert!(
|
||||||
|
error
|
||||||
|
.to_string()
|
||||||
|
.contains("Session health probe failed after compaction"),
|
||||||
|
"unexpected error: {error}"
|
||||||
|
);
|
||||||
|
assert!(
|
||||||
|
error.to_string().contains("transport unavailable"),
|
||||||
|
"expected underlying probe error: {error}"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn compaction_health_probe_skips_empty_compacted_session() {
|
||||||
|
struct SimpleApi;
|
||||||
|
impl ApiClient for SimpleApi {
|
||||||
|
fn stream(
|
||||||
|
&mut self,
|
||||||
|
_request: ApiRequest,
|
||||||
|
) -> Result<Vec<AssistantEvent>, RuntimeError> {
|
||||||
|
Ok(vec![
|
||||||
|
AssistantEvent::TextDelta("done".to_string()),
|
||||||
|
AssistantEvent::MessageStop,
|
||||||
|
])
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
let mut session = Session::new();
|
||||||
|
session.record_compaction("fresh summary", 2);
|
||||||
|
|
||||||
|
let tool_executor = StaticToolExecutor::new().register("glob_search", |_input| {
|
||||||
|
Err(ToolError::new(
|
||||||
|
"glob_search should not run for an empty compacted session",
|
||||||
|
))
|
||||||
|
});
|
||||||
|
let mut runtime = ConversationRuntime::new(
|
||||||
|
session,
|
||||||
|
SimpleApi,
|
||||||
|
tool_executor,
|
||||||
|
PermissionPolicy::new(PermissionMode::DangerFullAccess),
|
||||||
|
vec!["system".to_string()],
|
||||||
|
);
|
||||||
|
|
||||||
|
let summary = runtime
|
||||||
|
.run_turn("trigger", None)
|
||||||
|
.expect("empty compacted session should not fail health probe");
|
||||||
|
assert_eq!(summary.auto_compaction, None);
|
||||||
|
assert_eq!(runtime.session().messages.len(), 2);
|
||||||
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn build_assistant_message_requires_message_stop_event() {
|
fn build_assistant_message_requires_message_stop_event() {
|
||||||
// given
|
// given
|
||||||
|
|||||||
Reference in New Issue
Block a user