mirror of
https://github.com/instructkr/claw-code.git
synced 2026-04-13 03:24:49 +08:00
Make completed review lanes emit machine-readable verdicts
The next repo-local sweep target was ROADMAP #67: scoped review lanes could stop with prose-only output, leaving downstream consumers to infer approval or rejection from later chatter. The fix teaches completed lane persistence to recognize review-style `APPROVE`/`REJECT`/`BLOCKED` results, attach structured verdict metadata to `lane.finished`, and keep ordinary non-review lanes on the existing quality-floor path. Constraint: Preserve the existing non-review lane summary path while enriching only review-style completions Rejected: Add a brand-new lane event type just for review results | unnecessary when `lane.finished` already carries structured metadata and downstream consumers can read it there Confidence: high Scope-risk: narrow Reversibility: clean Directive: If review verdict parsing changes later, update `extract_review_outcome`, the finished-event payload fields, and the review-lane regression together Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE Not-tested: External consumers that may still ignore `lane.finished.data.reviewVerdict`
This commit is contained in:
@@ -496,7 +496,7 @@ Model name prefix now wins unconditionally over env-var presence. Regression tes
|
|||||||
|
|
||||||
62. **Worker state file surface not implemented** — **done (verified 2026-04-12):** current `main` already wires `emit_state_file(worker)` into the worker transition path in `rust/crates/runtime/src/worker_boot.rs`, atomically writes `.claw/worker-state.json`, and exposes the documented reader surface through `claw state` / `claw state --output-format json` in `rust/crates/rusty-claude-cli/src/main.rs`. Fresh proof exists in `runtime` regression `emit_state_file_writes_worker_status_on_transition`, the end-to-end `tools` regression `recovery_loop_state_file_reflects_transitions`, and direct CLI parsing coverage for `state` / `state --output-format json`. Source: Jobdori dogfood.
|
62. **Worker state file surface not implemented** — **done (verified 2026-04-12):** current `main` already wires `emit_state_file(worker)` into the worker transition path in `rust/crates/runtime/src/worker_boot.rs`, atomically writes `.claw/worker-state.json`, and exposes the documented reader surface through `claw state` / `claw state --output-format json` in `rust/crates/rusty-claude-cli/src/main.rs`. Fresh proof exists in `runtime` regression `emit_state_file_writes_worker_status_on_transition`, the end-to-end `tools` regression `recovery_loop_state_file_reflects_transitions`, and direct CLI parsing coverage for `state` / `state --output-format json`. Source: Jobdori dogfood.
|
||||||
|
|
||||||
**Scope note (verified 2026-04-12):** ROADMAP #31, #43, and #63-#68 currently appear to describe acpx/droid or upstream OMX/server orchestration behavior, not claw-code source already present in this repository. Repo-local searches for `acpx`, `use-droid`, `run-acpx`, `commit-wrapper`, `ultraclaw`, `roadmap-nudge-10min`, `OMX_TMUX_INJECT`, `/hooks/health`, and `/hooks/status` found no implementation hits outside `ROADMAP.md`, and the earlier state-surface note already records that the HTTP server is not owned by claw-code. With #45 and #69 now fixed, the remaining unresolved items in this section look like external tracking notes rather than confirmed repo-local backlog; re-check if new repo-local evidence appears.
|
**Scope note (verified 2026-04-12):** ROADMAP #31, #43, and #63-#68 currently appear to describe acpx/droid or upstream OMX/server orchestration behavior, not claw-code source already present in this repository. Repo-local searches for `acpx`, `use-droid`, `run-acpx`, `commit-wrapper`, `ultraclaw`, `roadmap-nudge-10min`, `OMX_TMUX_INJECT`, `/hooks/health`, and `/hooks/status` found no implementation hits outside `ROADMAP.md`, and the earlier state-surface note already records that the HTTP server is not owned by claw-code. With #45, #67, and #69 now fixed, the remaining unresolved items in this section look like external tracking notes rather than confirmed repo-local backlog; re-check if new repo-local evidence appears.
|
||||||
|
|
||||||
63. **Droid session completion semantics broken: code arrives after "status: completed"** — dogfooded 2026-04-12. Ultraclaw droid sessions (use-droid via acpx) report `session.status: completed` before file writes are fully flushed/synced to the working tree. Discovered +410 lines of "late-arriving" droid output that appeared after I had already assessed 8 sessions as "no code produced." This creates false-negative assessments and duplicate work. **Fix shape:** (a) droid agent should only report completion after explicit file-write confirmation (fsync or existence check); (b) or, claw-code should expose a `pending_writes` status that indicates "agent responded, disk flush pending"; (c) lane orchestrators should poll for file changes for N seconds after completion before final assessment. **Blocker:** none. Source: Jobdori ultraclaw dogfood 2026-04-12.
|
63. **Droid session completion semantics broken: code arrives after "status: completed"** — dogfooded 2026-04-12. Ultraclaw droid sessions (use-droid via acpx) report `session.status: completed` before file writes are fully flushed/synced to the working tree. Discovered +410 lines of "late-arriving" droid output that appeared after I had already assessed 8 sessions as "no code produced." This creates false-negative assessments and duplicate work. **Fix shape:** (a) droid agent should only report completion after explicit file-write confirmation (fsync or existence check); (b) or, claw-code should expose a `pending_writes` status that indicates "agent responded, disk flush pending"; (c) lane orchestrators should poll for file changes for N seconds after completion before final assessment. **Blocker:** none. Source: Jobdori ultraclaw dogfood 2026-04-12.
|
||||||
|
|
||||||
@@ -506,7 +506,7 @@ Model name prefix now wins unconditionally over env-var presence. Regression tes
|
|||||||
|
|
||||||
66. **Completion-aware reminder shutdown missing** — dogfooded 2026-04-12. Ultraclaw batch completed and was reported as done, but 10-minute cron reminder (`roadmap-nudge-10min`) kept firing into channel as if work still pending. Reminder/cron state not coupled to terminal task state. **Fix shape:** (a) cron jobs should check task completion state before firing; (b) or, provide explicit `cron.remove` on task completion; (c) or, reminders should include "work complete" detection and auto-expire. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
|
66. **Completion-aware reminder shutdown missing** — dogfooded 2026-04-12. Ultraclaw batch completed and was reported as done, but 10-minute cron reminder (`roadmap-nudge-10min`) kept firing into channel as if work still pending. Reminder/cron state not coupled to terminal task state. **Fix shape:** (a) cron jobs should check task completion state before firing; (b) or, provide explicit `cron.remove` on task completion; (c) or, reminders should include "work complete" detection and auto-expire. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
|
||||||
|
|
||||||
67. **Scoped review lanes do not emit structured verdicts** — dogfooded 2026-04-12. OMX review lanes now have improved scope (specific ROADMAP items, specific files, explicit APPROVE/REJECT contract), but the stop event only contains the review request — not the actual verdict. Operators must infer approval/rejection/blockage from later git commits or surrounding chatter. **Fix shape:** emit structured review result on stop with: `verdict: approve|reject|blocked`, `target: commit/diff reviewed`, `rationale: short summary`. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
|
67. **Scoped review lanes do not emit structured verdicts** — **done (verified 2026-04-12):** completed lane persistence in `rust/crates/tools/src/lib.rs` now recognizes review-style `APPROVE`/`REJECT`/`BLOCKED` results and records structured `reviewVerdict`, `reviewTarget`, and `reviewRationale` metadata on the `lane.finished` event while preserving existing non-review lane behavior. Regression coverage locks both the normal completion path and a scoped review-lane completion payload. **Original filing below.**
|
||||||
|
|
||||||
68. **Internal reinjection/resume paths leak opaque control prose** — dogfooded 2026-04-12. OMX lanes stopping with `Continue from current mode state. [OMX_TMUX_INJECT]` expose internal implementation details instead of operator-meaningful state. The event tells us *that* tmux reinjection happened, but not *why* (retry after failure? resume after idle? manual recovery?), *what state was preserved*, or *what the lane was trying to do*. **Fix shape:** recovery/reinject events should emit structured cause like: `resume_after_stop`, `retry_after_tool_failure`, `tmux_reinject_after_idle`, `manual_recovery` plus preserved state / target lane info. Never leak bare internal markers like `[OMX_TMUX_INJECT]` as the primary summary. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
|
68. **Internal reinjection/resume paths leak opaque control prose** — dogfooded 2026-04-12. OMX lanes stopping with `Continue from current mode state. [OMX_TMUX_INJECT]` expose internal implementation details instead of operator-meaningful state. The event tells us *that* tmux reinjection happened, but not *why* (retry after failure? resume after idle? manual recovery?), *what state was preserved*, or *what the lane was trying to do*. **Fix shape:** recovery/reinject events should emit structured cause like: `resume_after_stop`, `retry_after_tool_failure`, `tmux_reinject_after_idle`, `manual_recovery` plus preserved state / target lane info. Never leak bare internal markers like `[OMX_TMUX_INJECT]` as the primary summary. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
|
||||||
|
|
||||||
|
|||||||
@@ -3783,6 +3783,11 @@ fn persist_agent_terminal_state(
|
|||||||
}
|
}
|
||||||
|
|
||||||
const MIN_LANE_SUMMARY_WORDS: usize = 7;
|
const MIN_LANE_SUMMARY_WORDS: usize = 7;
|
||||||
|
const REVIEW_VERDICTS: &[(&str, &str)] = &[
|
||||||
|
("APPROVE", "approve"),
|
||||||
|
("REJECT", "reject"),
|
||||||
|
("BLOCKED", "blocked"),
|
||||||
|
];
|
||||||
const CONTROL_ONLY_SUMMARY_WORDS: &[&str] = &[
|
const CONTROL_ONLY_SUMMARY_WORDS: &[&str] = &[
|
||||||
"ack",
|
"ack",
|
||||||
"commit",
|
"commit",
|
||||||
@@ -3831,6 +3836,12 @@ struct LaneFinishedSummaryData {
|
|||||||
raw_summary: Option<String>,
|
raw_summary: Option<String>,
|
||||||
#[serde(rename = "wordCount")]
|
#[serde(rename = "wordCount")]
|
||||||
word_count: usize,
|
word_count: usize,
|
||||||
|
#[serde(rename = "reviewVerdict", skip_serializing_if = "Option::is_none")]
|
||||||
|
review_verdict: Option<String>,
|
||||||
|
#[serde(rename = "reviewTarget", skip_serializing_if = "Option::is_none")]
|
||||||
|
review_target: Option<String>,
|
||||||
|
#[serde(rename = "reviewRationale", skip_serializing_if = "Option::is_none")]
|
||||||
|
review_rationale: Option<String>,
|
||||||
}
|
}
|
||||||
|
|
||||||
#[derive(Debug, Clone)]
|
#[derive(Debug, Clone)]
|
||||||
@@ -3844,6 +3855,13 @@ struct LaneSummaryAssessment {
|
|||||||
apply_quality_floor: bool,
|
apply_quality_floor: bool,
|
||||||
reasons: Vec<String>,
|
reasons: Vec<String>,
|
||||||
word_count: usize,
|
word_count: usize,
|
||||||
|
review_outcome: Option<ReviewLaneOutcome>,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Clone)]
|
||||||
|
struct ReviewLaneOutcome {
|
||||||
|
verdict: String,
|
||||||
|
rationale: Option<String>,
|
||||||
}
|
}
|
||||||
|
|
||||||
fn build_lane_finished_summary(
|
fn build_lane_finished_summary(
|
||||||
@@ -3857,6 +3875,12 @@ fn build_lane_finished_summary(
|
|||||||
Some(summary) => Some(compose_lane_summary_fallback(manifest, Some(summary))),
|
Some(summary) => Some(compose_lane_summary_fallback(manifest, Some(summary))),
|
||||||
None => Some(compose_lane_summary_fallback(manifest, None)),
|
None => Some(compose_lane_summary_fallback(manifest, None)),
|
||||||
};
|
};
|
||||||
|
let review_outcome = assessment.review_outcome.clone();
|
||||||
|
let review_target = review_outcome
|
||||||
|
.as_ref()
|
||||||
|
.map(|_| manifest.description.trim())
|
||||||
|
.filter(|value| !value.is_empty())
|
||||||
|
.map(str::to_string);
|
||||||
|
|
||||||
LaneFinishedSummary {
|
LaneFinishedSummary {
|
||||||
detail,
|
detail,
|
||||||
@@ -3865,6 +3889,11 @@ fn build_lane_finished_summary(
|
|||||||
reasons: assessment.reasons,
|
reasons: assessment.reasons,
|
||||||
raw_summary: raw_summary.map(str::to_string),
|
raw_summary: raw_summary.map(str::to_string),
|
||||||
word_count: assessment.word_count,
|
word_count: assessment.word_count,
|
||||||
|
review_verdict: review_outcome
|
||||||
|
.as_ref()
|
||||||
|
.map(|outcome| outcome.verdict.clone()),
|
||||||
|
review_target,
|
||||||
|
review_rationale: review_outcome.and_then(|outcome| outcome.rationale),
|
||||||
},
|
},
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -3882,11 +3911,13 @@ fn assess_lane_summary_quality(summary: &str) -> LaneSummaryAssessment {
|
|||||||
reasons.push(String::from("empty"));
|
reasons.push(String::from("empty"));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
let review_outcome = extract_review_outcome(summary);
|
||||||
|
|
||||||
let control_only = !words.is_empty()
|
let control_only = !words.is_empty()
|
||||||
&& words
|
&& words
|
||||||
.iter()
|
.iter()
|
||||||
.all(|word| CONTROL_ONLY_SUMMARY_WORDS.contains(&word.as_str()));
|
.all(|word| CONTROL_ONLY_SUMMARY_WORDS.contains(&word.as_str()));
|
||||||
if control_only {
|
if control_only && review_outcome.is_none() {
|
||||||
reasons.push(String::from("control_only"));
|
reasons.push(String::from("control_only"));
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -3894,6 +3925,7 @@ fn assess_lane_summary_quality(summary: &str) -> LaneSummaryAssessment {
|
|||||||
|| summary.contains('/')
|
|| summary.contains('/')
|
||||||
|| summary.contains(':')
|
|| summary.contains(':')
|
||||||
|| summary.contains('#')
|
|| summary.contains('#')
|
||||||
|
|| review_outcome.is_some()
|
||||||
|| words
|
|| words
|
||||||
.iter()
|
.iter()
|
||||||
.any(|word| CONTEXTUAL_SUMMARY_WORDS.contains(&word.as_str()));
|
.any(|word| CONTEXTUAL_SUMMARY_WORDS.contains(&word.as_str()));
|
||||||
@@ -3905,6 +3937,7 @@ fn assess_lane_summary_quality(summary: &str) -> LaneSummaryAssessment {
|
|||||||
apply_quality_floor: !reasons.is_empty(),
|
apply_quality_floor: !reasons.is_empty(),
|
||||||
reasons,
|
reasons,
|
||||||
word_count,
|
word_count,
|
||||||
|
review_outcome,
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -3928,6 +3961,24 @@ fn compose_lane_summary_fallback(manifest: &AgentOutput, raw_summary: Option<&st
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn extract_review_outcome(summary: &str) -> Option<ReviewLaneOutcome> {
|
||||||
|
let mut lines = summary
|
||||||
|
.lines()
|
||||||
|
.map(str::trim)
|
||||||
|
.filter(|line| !line.is_empty());
|
||||||
|
let first = lines.next()?;
|
||||||
|
let verdict = REVIEW_VERDICTS.iter().find_map(|(prefix, verdict)| {
|
||||||
|
first
|
||||||
|
.eq_ignore_ascii_case(prefix)
|
||||||
|
.then(|| (*verdict).to_string())
|
||||||
|
})?;
|
||||||
|
let rationale = lines.collect::<Vec<_>>().join(" ").trim().to_string();
|
||||||
|
Some(ReviewLaneOutcome {
|
||||||
|
verdict,
|
||||||
|
rationale: (!rationale.is_empty()).then_some(compress_summary_text(&rationale)),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
fn derive_agent_state(
|
fn derive_agent_state(
|
||||||
status: &str,
|
status: &str,
|
||||||
result: Option<&str>,
|
result: Option<&str>,
|
||||||
@@ -7522,6 +7573,46 @@ mod tests {
|
|||||||
"control_only"
|
"control_only"
|
||||||
);
|
);
|
||||||
|
|
||||||
|
let review = execute_agent_with_spawn(
|
||||||
|
AgentInput {
|
||||||
|
description: "Review commit 1234abcd for ROADMAP #67".to_string(),
|
||||||
|
prompt: "Review the scoped diff".to_string(),
|
||||||
|
subagent_type: Some("Verification".to_string()),
|
||||||
|
name: Some("review-lane".to_string()),
|
||||||
|
model: None,
|
||||||
|
},
|
||||||
|
|job| {
|
||||||
|
persist_agent_terminal_state(
|
||||||
|
&job.manifest,
|
||||||
|
"completed",
|
||||||
|
Some("APPROVE\n\nTarget: commit 1234abcd\nRationale: scoped diff is safe."),
|
||||||
|
None,
|
||||||
|
)
|
||||||
|
},
|
||||||
|
)
|
||||||
|
.expect("review agent should succeed");
|
||||||
|
|
||||||
|
let review_manifest =
|
||||||
|
std::fs::read_to_string(&review.manifest_file).expect("review manifest should exist");
|
||||||
|
let review_manifest_json: serde_json::Value =
|
||||||
|
serde_json::from_str(&review_manifest).expect("review manifest json");
|
||||||
|
assert_eq!(
|
||||||
|
review_manifest_json["laneEvents"][1]["data"]["reviewVerdict"],
|
||||||
|
"approve"
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
review_manifest_json["laneEvents"][1]["data"]["reviewTarget"],
|
||||||
|
"Review commit 1234abcd for ROADMAP #67"
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
review_manifest_json["laneEvents"][1]["data"]["reviewRationale"],
|
||||||
|
"Target: commit 1234abcd Rationale: scoped diff is safe."
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
review_manifest_json["laneEvents"][1]["data"]["qualityFloorApplied"],
|
||||||
|
false
|
||||||
|
);
|
||||||
|
|
||||||
let spawn_error = execute_agent_with_spawn(
|
let spawn_error = execute_agent_with_spawn(
|
||||||
AgentInput {
|
AgentInput {
|
||||||
description: "Spawn error task".to_string(),
|
description: "Spawn error task".to_string(),
|
||||||
|
|||||||
Reference in New Issue
Block a user