Lock down CLI-to-mock behavioral parity for Anthropic flows

This adds a deterministic mock Anthropic-compatible /v1/messages service, a clean-environment CLI harness, and repo docs so the first parity milestone can be validated without live network dependencies. Constraint: First milestone must prove Rust claw can connect from a clean environment and cover streaming, tool assembly, and permission/tool flow Constraint: No new third-party dependencies; reuse the existing Rust workspace stack Rejected: Record/replay live Anthropic traffic | nondeterministic and unsuitable for repeatable CI coverage Confidence: high Scope-risk: moderate Reversibility: clean Directive: Keep scenario markers and expected tool payload shapes synchronized between the mock service and the harness tests Tested: cargo fmt --all Tested: cargo clippy --workspace --all-targets -- -D warnings Tested: cargo test --workspace Tested: ./scripts/run_mock_parity_harness.sh Not-tested: Live Anthropic responses beyond the five scripted harness scenarios
2026-07-18 21:08:28 +08:00 · 2026-04-03 01:15:52 +00:00
parent 1abd951e57
commit c2f1304a01
10 changed files with 1115 additions and 2 deletions
@@ -2,6 +2,12 @@

 Last updated: 2026-04-03 (`03bd7f0`)

+## Mock parity harness — milestone 1
+
+- [x] Deterministic Anthropic-compatible mock service (`rust/crates/mock-anthropic-service`)
+- [x] Reproducible clean-environment CLI harness (`rust/crates/rusty-claude-cli/tests/mock_parity_harness.rs`)
+- [x] Scripted scenarios: `streaming_text`, `read_file_roundtrip`, `grep_chunk_assembly`, `write_file_allowed`, `write_file_denied`
+
 ## Tool Surface: 40/40 (spec parity)

 ### Real Implementations (behavioral parity — varying depth)
@@ -90,7 +96,7 @@ Last updated: 2026-04-03 (`03bd7f0`)
 - [ ] Output truncation (large stdout/file content)
 - [ ] Session compaction behavior matching
 - [ ] Token counting / cost tracking accuracy
- [ ] Streaming response support
+- [x] Streaming response support validated by the mock parity harness

 ## Migration Readiness