claw-code

mirror of https://github.com/instructkr/claw-code.git synced 2026-07-18 21:08:28 +08:00

Author	SHA1	Message	Date
Yeachan-Heo	639a54275d	Stop stale branches from polluting workspace test signals Workspace-wide verification now preflights the current branch against main so stale or diverged branches surface missing commits before broad cargo tests run. The lane failure taxonomy is also collapsed to the blocker classes the roadmap lane needs so automation can branch on a smaller, stable set of categories. Constraint: Broad workspace tests should not run when main is ahead and would produce stale-branch noise Rejected: Run workspace tests unconditionally | makes stale-branch failures indistinguishable from real regressions Confidence: medium Scope-risk: moderate Reversibility: clean Directive: Keep workspace-test preflight scoped to broad test commands until command classification grows more precise Tested: cargo test -p runtime stale_branch -- --nocapture; cargo test -p tools lane_failure_taxonomy_normalizes_common_blockers -- --nocapture; cargo test -p tools bash_workspace_tests_are_blocked_when_branch_is_behind_main -- --nocapture; cargo test -p tools bash_targeted_tests_skip_branch_preflight -- --nocapture Not-tested: clean worktree cargo test --workspace still fails on pre-existing rusty-claude-cli tests default_permission_mode_uses_project_config_when_env_is_unset and single_word_slash_command_names_return_guidance_instead_of_hitting_prompt_mode	2026-04-04 14:01:31 +00:00
Jobdori	fc675445e6	feat(tools): add lane_completion module (P1.3) Implement automatic lane completion detection: - detect_lane_completion(): checks session-finished + tests-green + pushed - evaluate_completed_lane(): triggers CloseoutLane + CleanupSession actions - 6 tests covering all conditions Bridges the gap where LaneContext::completed was a passive bool that nothing automatically set. Now completion is auto-detected. ROADMAP P1.3 marked done.	2026-04-04 22:05:49 +09:00
Jobdori	8b2f959a98	test(runtime): add worker→recovery→policy integration test Adds worker_provider_failure_flows_through_recovery_to_policy(): - Worker boots, sends prompt, encounters provider failure - observe_completion() classifies as WorkerFailureKind::Provider - from_worker_failure_kind() bridges to FailureScenario - attempt_recovery() executes RestartWorker recipe - Post-recovery context evaluates to merge-ready via PolicyEngine Completes the P2.8/P2.13 wiring verification with a full cross-module integration test. 660 tests pass.	2026-04-04 21:27:44 +09:00
Jobdori	9de97c95cc	feat(recovery): bridge WorkerFailureKind to FailureScenario (P2.8/P2.13) Connect worker_boot failure classification to recovery_recipes policy: - Add FailureScenario::ProviderFailure variant - Add FailureScenario::from_worker_failure_kind() bridge function mapping every WorkerFailureKind to a concrete FailureScenario - Add RecoveryStep::RestartWorker for provider failure recovery - Add recipe for ProviderFailure: RestartWorker -> AlertHuman escalation - 3 new tests: bridge mapping, recipe structure, recovery attempt cycle Previously a claw that detected WorkerFailureKind::Provider had no machine-readable path to 'what should I do about this?'. Now it can call from_worker_failure_kind() -> recipe_for() -> attempt_recovery() as a single structured chain. Closes the silo between worker_boot and recovery_recipes.	2026-04-04 20:07:36 +09:00
Jobdori	736069f1ab	feat(worker_boot): classify session completion failures (P2.13) Add WorkerFailureKind::Provider variant and observe_completion() method to classify degraded session completions as structured failures. - Detects finish='unknown' + zero tokens as provider failure - Detects finish='error' as provider failure - Normal completions transition to Finished state - 2 new tests verify classification behavior This closes the gap where sessions complete but produce no output, and the failure mode wasn't machine-readable for recovery policy. ROADMAP P2.13 backlog item added.	2026-04-04 19:37:57 +09:00
Jobdori	69b9232acf	test(runtime): add cross-module integration tests (P1.2) Add integration_tests.rs with 11 tests covering: - stale_branch + policy_engine: stale detection flows into policy, fresh branches don't trigger stale rules, end-to-end stale lane merge-forward action - green_contract + policy_engine: satisfied/unsatisfied contract evaluation, green level comparison for merge decisions - reconciliation + policy_engine: reconciled lanes match reconcile condition, reconciled context has correct defaults, non-reconciled lanes don't trigger reconcile rules - stale_branch module: apply_policy generates correct actions for rebase, merge-forward, warn-only, and fresh noop cases These tests verify that adjacent modules actually connect correctly — catching wiring gaps that unit tests miss. Addresses ROADMAP P1.2: cross-module integration tests.	2026-04-04 17:05:03 +09:00
Jobdori	2dfda31b26	feat(tools): wire SummaryCompressor into lane.finished event detail The SummaryCompressor (runtime::summary_compression) was exported but called nowhere. Lane events emitted a Finished variant with detail: None even when the agent produced a result string. Wire compress_summary_text() into the Finished event detail field so that: - result prose is compressed to ≤1200 chars / 24 lines before storage - duplicate lines and whitespace noise are removed - the event detail is machine-readable, not raw prose blob - None is still emitted when result is empty/None (no regression) This is the P1.4 wiring item from ROADMAP: 'Wire SummaryCompressor into the lane event pipeline — exported but called nowhere; LaneEvent stream never fed through compressor.' cargo test --workspace: 643 pass (1 pre-existing flaky), fmt clean.	2026-04-04 16:35:33 +09:00
Jobdori	d558a2d7ac	feat(policy): add lane reconciliation events and policy support Add terminal lane states for when a lane discovers its work is already landed in main, superseded by another lane, or has an empty diff: LaneEventName: - lane.reconciled — branch already merged, no action needed - lane.merged — work successfully merged - lane.superseded — work replaced by another lane/commit - lane.closed — lane manually closed PolicyAction::Reconcile with ReconcileReason enum: - AlreadyMerged — branch tip already in main - Superseded — another lane landed the same work - EmptyDiff — PR would be empty - ManualClose — operator closed the lane PolicyCondition::LaneReconciled — matches lanes that reached a no-action-required terminal state. LaneContext::reconciled() constructor for lanes that discovered they have nothing to do. This closes the gap where lanes like 9404-9410 could discover 'nothing to do' but had no typed terminal state to express it. The policy engine can now auto-closeout reconciled lanes instead of leaving them in limbo. Addresses ROADMAP P1.3 (lane-completion emitter) groundwork. Tests: 4 new tests covering reconcile rule firing, context defaults, non-reconciled lanes not triggering reconcile rules, and reason variant distinctness. Full workspace suite: 643 pass, 0 fail.	2026-04-04 16:12:06 +09:00
Yeachan-Heo	ac3ad57b89	fix(ci): apply rustfmt to main	2026-04-04 02:18:52 +00:00
Jobdori	3327d0e3fe	fix(tests): isolate render_diff_report tests from real working-tree state Replace with_current_dir+render_diff_report() with direct render_diff_report_for(&root) calls in the three diff-report tests. The env_lock mutex only serializes within one test binary; cargo test --workspace runs binaries in parallel, so set_current_dir races were possible across binaries. render_diff_report_for(cwd) accepts an explicit path and requires no global state mutation, making the tests reliably green under full workspace parallelism.	2026-04-04 05:33:18 +09:00
Jobdori	6d35399a12	fix: resolve merge conflicts in lib.rs re-exports	2026-04-04 00:48:26 +09:00
Jobdori	a1aba3c64a	merge: ultraclaw/recovery-recipes into main	2026-04-04 00:45:14 +09:00
Jobdori	4ee76ee7f4	merge: ultraclaw/summary-compression into main	2026-04-04 00:45:13 +09:00
Jobdori	6d7c617679	merge: ultraclaw/session-control-api into main	2026-04-04 00:45:12 +09:00
Jobdori	5ad05c68a3	merge: ultraclaw/mcp-lifecycle-harden into main	2026-04-04 00:45:12 +09:00
Jobdori	eff9404d30	merge: ultraclaw/green-contract into main	2026-04-04 00:45:11 +09:00
Jobdori	d126a3dca4	merge: ultraclaw/trust-resolver into main	2026-04-04 00:45:10 +09:00
Jobdori	a91e855d22	merge: ultraclaw/plugin-lifecycle into main	2026-04-04 00:45:10 +09:00
Jobdori	db97aa3da3	merge: ultraclaw/policy-engine into main	2026-04-04 00:45:09 +09:00
Jobdori	ba08b0eb93	merge: ultraclaw/task-packet into main	2026-04-04 00:45:08 +09:00
Jobdori	d9644cd13a	feat(runtime): trust prompt resolver	2026-04-04 00:44:08 +09:00
Jobdori	8321fd0c6b	feat(runtime): actionable summary compression for lane event streams	2026-04-04 00:43:30 +09:00
Jobdori	c18f8a0da1	feat(runtime): structured session control API for claw-native worker management	2026-04-04 00:43:30 +09:00
Jobdori	c5aedc6e4e	feat(runtime): stale branch detection	2026-04-04 00:42:55 +09:00
Jobdori	13015f6428	feat(runtime): hardened MCP lifecycle with phase tracking and degraded-mode reporting	2026-04-04 00:42:43 +09:00
Jobdori and Sisyphus	f12cb76d6f	feat(runtime): green-ness contract Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-04 00:42:41 +09:00
Jobdori	2787981632	feat(runtime): recovery recipes	2026-04-04 00:42:39 +09:00
Jobdori	b543760d03	feat(runtime): trust prompt resolver with allowlist and events	2026-04-04 00:42:28 +09:00
Jobdori	18340b561e	feat(runtime): first-class plugin lifecycle contract with degraded-mode support	2026-04-04 00:41:51 +09:00
Jobdori	d74ecf7441	feat(runtime): policy engine for autonomous lane management	2026-04-04 00:40:50 +09:00
Jobdori	e1db949353	feat(runtime): typed task packet format for structured claw dispatch	2026-04-04 00:40:20 +09:00
Jobdori	02634d950e	feat(runtime): stale-branch detection with freshness check and policy	2026-04-04 00:40:01 +09:00
Jobdori	f5e94f3c92	feat(runtime): plugin lifecycle	2026-04-04 00:38:35 +09:00
Yeachan-Heo	f76311f9d6	Prevent worker prompts from outrunning boot readiness Add a foundational worker_boot control plane and tool surface for reliable startup. The new registry tracks trust gates, ready-for-prompt handshakes, prompt delivery attempts, and shell misdelivery recovery so callers can coordinate worker boot above raw terminal transport. Constraint: Current main has no tmux-backed worker control API to extend directly Constraint: First slice must stay deterministic and fully testable in-process Rejected: Wire the first implementation straight to tmux panes | would couple transport details to unfinished state semantics Rejected: Ship parser helpers without control tools | would not enforce the ready-before-prompt contract end to end Confidence: high Scope-risk: moderate Reversibility: clean Directive: Treat WorkerObserve heuristics as a temporary transport adapter and replace them with typed runtime events before widening automation policy Tested: cargo test -p runtime worker_boot Tested: cargo test -p tools worker_tools Tested: cargo check -p runtime -p tools Not-tested: Real tmux/TTY trust prompts and live worker boot on an actual coding session Not-tested: Full cargo clippy -p runtime -p tools --all-targets -- -D warnings (fails on pre-existing warnings outside this slice)	2026-04-03 15:20:22 +00:00
Yeachan-Heo	56ee33e057	Make agent lane state machine-readable The background Agent tool already persisted lane-adjacent state via a JSON manifest and a markdown transcript, making it the smallest viable vertical slice for the ROADMAP lane-event work. This change adds canonical typed lane events to the manifest and normalizes terminal blockers into the shared failure taxonomy so downstream clawhip-style consumers can branch on structured state instead of scraping prose alone. The slice is intentionally narrow: it covers agent start, finish, blocked, and failed transitions plus blocker classification, while leaving broader lane orchestration and external consumers for later phases. Tests lock the manifest schema and taxonomy mapping so future extensions can add events without regressing the typed baseline. Constraint: Land a fresh-main vertical slice without inventing a larger lane framework first Rejected: Add a brand-new lane subsystem across crates | too broad for one verified slice Rejected: Only add markdown log annotations | still log-shaped and not machine-first Confidence: high Scope-risk: narrow Reversibility: clean Directive: Extend the same event names and failure classes before adding any alternate manifest schema for lane reporting Tested: cargo test -p tools agent_persists_handoff_metadata -- --nocapture Tested: cargo test -p tools agent_fake_runner_can_persist_completion_and_failure -- --nocapture Tested: cargo test -p tools lane_failure_taxonomy_normalizes_common_blockers -- --nocapture Not-tested: Full clawhip consumer integration or multi-crate event plumbing	2026-04-03 15:20:22 +00:00
Yeachan-Heo	bf5eb8785e	Recover the MCP lane on top of current main This resolves the stale-branch merge against origin/main, keeps the MCP runtime wiring, and preserves prompt-approved CLI tool execution after the mock parity harness additions landed upstream. Constraint: Branch had to absorb origin/main changes through a contentful merge before more MCP work Constraint: Prompt-approved runtime tool execution must continue working with new CLI/mock parity coverage Rejected: Keep permission enforcer attached inside CliToolExecutor for conversation turns | caused prompt-approved bash parity flow to fail as a tool error Rejected: Defer the merge and continue on stale history | would leave the lane red against current main Confidence: high Scope-risk: moderate Reversibility: clean Directive: Runtime permission policy and executor-side permission enforcement are separate layers; do not reapply executor enforcement to conversation turns without revalidating mock parity harness approval flows Tested: cargo test -p rusty-claude-cli --test mock_parity_harness -- --nocapture; cargo test -p rusty-claude-cli -- --nocapture; cargo test --workspace -- --nocapture Not-tested: Additional live remote/provider scenarios beyond the existing workspace suite	2026-04-03 14:51:18 +00:00
Yeachan-Heo	b3fe057559	Close the MCP lifecycle gap from config to runtime tool execution This wires configured MCP servers into the CLI/runtime path so discovered MCP tools, resource wrappers, search visibility, shutdown handling, and best-effort discovery all work together instead of living as isolated runtime primitives. Constraint: Keep non-MCP startup flows working without new required config Constraint: Preserve partial availability when one configured MCP server fails discovery Rejected: Fail runtime startup on any MCP discovery error | too brittle for mixed healthy/broken server configs Rejected: Keep MCP support runtime-only without registry wiring | left discovery and invocation unreachable from the CLI tool lane Confidence: high Scope-risk: moderate Reversibility: clean Directive: Runtime MCP tools are registry-backed but executed through CliToolExecutor state; keep future tool-registry changes aligned with that split Tested: cargo test -p runtime mcp -- --nocapture; cargo test -p tools -- --nocapture; cargo test -p rusty-claude-cli -- --nocapture; cargo test --workspace -- --nocapture Not-tested: Live remote MCP transports (http/sse/ws/sdk) remain unsupported in the CLI execution path	2026-04-03 14:31:25 +00:00
Jobdori	a2351fe867	feat(harness+usage): add auto_compact and token_cost parity scenarios Two new mock parity harness scenarios: 1. auto_compact_triggered (session-compaction category) - Mock returns 50k input tokens, validates auto_compaction key is present in JSON output - Validates format parity; trigger behavior covered by conversation::tests::auto_compacts_when_cumulative_input_threshold_is_crossed 2. token_cost_reporting (token-usage category) - Mock returns known token counts (1k input, 500 output) - Validates input/output token fields present in JSON output Additional changes: - Add estimated_cost to JSON prompt output (format_usd + pricing_for_model) - Add final_text_sse_with_usage and text_message_response_with_usage helpers to mock-anthropic-service for parameterized token counts - Add ScenarioCase.extra_env and ScenarioCase.resume_session fields - Update mock_parity_scenarios.json: 10 -> 12 scenarios - Update harness request count assertion: 19 -> 21 cargo test --workspace: 558 passed, 0 failed	2026-04-03 22:41:42 +09:00
Jobdori	6325add99e	fix(tests): add env_lock to permission-sensitive CLI arg tests Tests relying on PermissionMode::DangerFullAccess as default were flaky under --workspace runs because other tests set RUSTY_CLAUDE_PERMISSION_MODE without cleanup. Added env_lock() and explicit env var removal to 7 affected tests. Fixes: workspace-level cargo test flake (1 random test fails per run)	2026-04-03 22:07:12 +09:00
Jobdori	bd9c145ea1	feat(commands): reach upstream slash command parity — 135 → 141 specs Add 6 final slash commands: - agent: manage sub-agents and spawned sessions - subagent: control active subagent execution - reasoning: toggle extended reasoning mode - budget: show/set token budget limits - rate-limit: configure API rate limiting - metrics: show performance and usage metrics Reach upstream parity target of 141 slash command specs.	2026-04-03 19:55:12 +09:00
Jobdori	0490636031	feat(commands): expand slash command surface 67 → 135 specs Add 68 new slash command specs covering: - Approval flow: approve/deny - Editing: undo, retry, paste, image, screenshot - Code ops: test, lint, build, run, fix, refactor, explain, docs, perf - Git: git, stash, blame, log - LSP: symbols, references, definition, hover, diagnostics, autofix - Navigation: focus/unfocus, web, map, search, workspace - Model: max-tokens, temperature, system-prompt, tool-details - Session: history, tokens, cache, pin/unpin, bookmarks, format - Infra: cron, team, parallel, multi, macro, alias - Config: api-key, language, profile, telemetry, env, project - Other: providers, notifications, changelog, templates, benchmark, migrate, reset Update tests: flexible assertions for expanded command surface	2026-04-03 19:52:40 +09:00
Jobdori	80ad9f4195	feat(tools): replace AskUserQuestion + RemoteTrigger stubs with real implementations - AskUserQuestion: interactive stdin/stdout prompt with numbered options - RemoteTrigger: real HTTP client (GET/POST/PUT/DELETE/PATCH/HEAD) with custom headers, body, 30s timeout, response truncation - All 480+ tests green	2026-04-03 19:37:34 +09:00
Jobdori	1cfd78ac61	feat: bash validation module + output truncation parity - Add bash_validation.rs with 9 submodules (1004 lines): readOnlyValidation, destructiveCommandWarning, modeValidation, sedValidation, pathValidation, commandSemantics, bashPermissions, bashSecurity, shouldUseSandbox - Wire into runtime lib.rs - Add MAX_OUTPUT_BYTES (16KB) truncation to bash.rs - Add 4 truncation tests, all passing - Full test suite: 270+ green	2026-04-03 19:31:49 +09:00
Jobdori	ddae15dede	fix(enforcer): defer to caller prompt flow when active mode is Prompt The PermissionEnforcer was hard-denying tool calls that needed user approval because it passes no prompter to authorize(). When the active permission mode is Prompt, the enforcer now returns Allowed and defers to the CLI's interactive approval flow. Fixes: mock_parity_harness bash_permission_prompt_approved scenario	2026-04-03 18:39:14 +09:00
Jobdori	8cc7d4c641	chore: additional AI slop cleanup and enforcer wiring from sessions 1/5 Session 1 (ses_2ad65873): with_enforcer builders + 2 regression tests Session 5 (ses_2ad67e8e): continued AI slop cleanup pass — redundant comments, unused_self suppressions, unreachable! tightening Session cleanup (ses_2ad6b26c): Python placeholder centralization Workspace tests: 363+ passed, 0 failed.	2026-04-03 18:35:27 +09:00
Jobdori	618a79a9f4	feat: ultraclaw session outputs — registry tests, MCP bridge, PARITY.md, cleanup Ultraclaw mode results from 10 parallel opencode sessions: - PARITY.md: Updated both copies with all 9 landed lanes, commit hashes, line counts, and test counts. All checklist items marked complete. - MCP bridge: McpToolRegistry.call_tool now wired to real McpServerManager via async JSON-RPC (discover_tools -> tools/call -> shutdown) - Registry tests: Added coverage for TaskRegistry, TeamRegistry, CronRegistry, PermissionEnforcer, LspRegistry (branch-focused tests) - Permissions refactor: Simplified authorize_with_context, extracted helpers, added characterization tests (185 runtime tests pass) - AI slop cleanup: Removed redundant comments, unused_self suppressions, tightened unreachable branches - CLI fixes: Minor adjustments in main.rs and hooks.rs All 363+ tests pass. Workspace compiles clean.	2026-04-03 18:23:03 +09:00
Jobdori	f25363e45d	fix(tools): wire PermissionEnforcer into execute_tool dispatch path The review correctly identified that enforce_permission_check() was defined but never called. This commit: - Adds enforcer: Option<PermissionEnforcer> field to GlobalToolRegistry and SubagentToolExecutor - Adds set_enforcer() method for runtime configuration - Gates both execute() paths through enforce_permission_check() when an enforcer is configured - Default: None (Allow-all, matching existing behavior) Resolves the dead-code finding from ultraclaw review sessions 3 and 8.	2026-04-03 18:18:19 +09:00
Jobdori	66283f4dc9	feat(runtime+tools): PermissionEnforcer — permission mode enforcement layer Add PermissionEnforcer in crates/runtime/src/permission_enforcer.rs and wire enforce_permission_check() into crates/tools/src/lib.rs. Runtime additions: - PermissionEnforcer: wraps PermissionPolicy with enforcement API - check(tool, input): validates tool against active mode via policy.authorize() - check_file_write(path, workspace_root): workspace boundary enforcement - ReadOnly: deny all writes - WorkspaceWrite: allow within workspace, deny outside - DangerFullAccess/Allow: permit all - Prompt: deny (no prompter available) - check_bash(command): read-only command heuristic (60+ safe commands) - Detects -i/--in-place/redirect operators as non-read-only - is_within_workspace(): string-prefix boundary check - is_read_only_command(): conservative allowlist of safe CLI commands Tool wiring: - enforce_permission_check() public API for gating execute_tool() calls - Maps EnforcementResult::Denied to Err(reason) for tool dispatch 9 new tests covering all permission modes + workspace boundary + bash heuristic.	2026-04-03 17:55:04 +09:00
Jobdori	2d665039f8	feat(runtime+tools): LspRegistry — LSP client dispatch for tool surface Add LspRegistry in crates/runtime/src/lsp_client.rs and wire it into run_lsp() tool handler in crates/tools/src/lib.rs. Runtime additions: - LspRegistry: register/get servers by language, find server by file extension, manage diagnostics, dispatch LSP actions - LspAction enum (Diagnostics/Hover/Definition/References/Completion/Symbols/Format) - LspServerStatus enum (Connected/Disconnected/Starting/Error) - Diagnostic/Location/Hover/CompletionItem/Symbol types for structured responses - Action dispatch validates server status and path requirements Tool wiring: - run_lsp() maps LspInput to LspRegistry.dispatch() - Supports dynamic server lookup by file extension (rust/ts/js/py/go/java/c/cpp/rb/lua) - Caches diagnostics across servers 8 new tests covering registration, lookup, diagnostics, and dispatch paths. Bridges to existing LSP process manager for actual JSON-RPC execution.	2026-04-03 17:46:13 +09:00
Jobdori	730667f433	feat(runtime+tools): McpToolRegistry — MCP lifecycle bridge for tool surface Add McpToolRegistry in crates/runtime/src/mcp_tool_bridge.rs and wire it into all 4 MCP tool handlers in crates/tools/src/lib.rs. Runtime additions: - McpToolRegistry: register/get/list servers, list/read resources, call tools, set auth status, disconnect - McpConnectionStatus enum (Disconnected/Connecting/Connected/AuthRequired/Error) - Connection-state validation (reject ops on disconnected servers) - Resource URI lookup, tool name validation before dispatch Tool wiring: - ListMcpResources: queries registry for server resources - ReadMcpResource: looks up specific resource by URI - McpAuth: returns server auth/connection status - MCP (tool proxy): validates + dispatches tool calls through registry 8 new tests covering all lifecycle paths + error cases. Bridges to existing McpServerManager for actual JSON-RPC execution.	2026-04-03 17:39:35 +09:00
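The classification rule described in commit 736069f1ab (finish='unknown' with zero tokens, or finish='error', counts as a provider failure; anything else finishes normally) can be sketched roughly as below. This is a minimal illustration and not the repository's code: only the WorkerFailureKind::Provider name comes from the log, while CompletionOutcome, classify_completion, and the parameter names are hypothetical. Treating finish='unknown' with a nonzero token count as a normal completion is also an assumption, since the log does not cover that case.

```rust
/// Failure classes for a worker session. Only `Provider` is named in the
/// commit log; the real enum presumably has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerFailureKind {
    Provider,
}

/// Hypothetical outcome type used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompletionOutcome {
    /// Normal completion: the worker would move to its Finished state.
    Finished,
    /// Degraded completion, classified so recovery policy can branch on it.
    Failed(WorkerFailureKind),
}

/// Hypothetical stand-in for the classification step of observe_completion().
/// `finish_reason` and `total_tokens` are assumed inputs, not the real signature.
pub fn classify_completion(finish_reason: &str, total_tokens: u64) -> CompletionOutcome {
    match (finish_reason, total_tokens) {
        // finish='error' is a provider failure regardless of token count.
        ("error", _) => CompletionOutcome::Failed(WorkerFailureKind::Provider),
        // finish='unknown' with no tokens means the session produced nothing.
        ("unknown", 0) => CompletionOutcome::Failed(WorkerFailureKind::Provider),
        // Assumption: every other combination is a normal completion.
        _ => CompletionOutcome::Finished,
    }
}
```

A caller would then map a Failed(WorkerFailureKind::Provider) outcome into the recovery chain described in commit 9de97c95cc (from_worker_failure_kind() -> recipe_for() -> attempt_recovery()); that wiring is not reproduced here.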
