Mirror of https://github.com/instructkr/claw-code.git — synced 2026-04-27 00:34:59 +08:00

Compare commits: `claw-code-...` → `feat/jobdo` (54 commits)
Commits:

d9fbe1ef83 · cb8839e050 · 41b0006eea · 762e9bb212 · 5e29430d4f · 0d8adceb67 · 9eba71da81 · ef5aae3ddd · f05bc037de · 2fcb85ce4e · f1103332d0 · 186d42f979 · 5f8d1b92a6 · 84466bbb6c · fbcbe9d8d5 · dd0993c157 · b903e1605f · de368a2615 · af306d489e · fef249d9e7 · 7724bf98fd · 70b2f6a66f · 1d155e4304 · 0b5dffb9da · 932710a626 · 3262cb3a87 · 8247d7d2eb · 517d7e224e · c73423871b · 373dd9b848 · 11f9e8a5a2 · 97c4b130dc · 290ab7e41f · ded0c5bbc1 · 40c17d8f2a · b048de8899 · 5a18e3aa1a · 7fb95e95f6 · 60925fa9f7 · 01dca90e95 · 524edb2b2e · 455bdec06c · 85de7f9814 · 178c8fac28 · d453eedae6 · 79a9f0e6f6 · 4813a2b351 · 3f4d46d7b4 · 6a76cc7c08 · 527c0f971c · 504d238af1 · 41a6091355 · bc94870a54 · ee3aa29a5e
.gitignore (vendored) — 3 changes
@@ -8,5 +8,8 @@ archive/
# Claw Code local artifacts
.claw/settings.local.json
.claw/sessions/
# #160/#166: default session storage directory (flush-transcript output,
# dogfood runs, etc.). Claws specifying --directory elsewhere are fine.
.port_sessions/
.clawhip/
status-help.txt
CLAUDE.md — 204 changes
@@ -1,21 +1,195 @@
# CLAUDE.md
# CLAUDE.md — Python Reference Implementation

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
**This file guides work on `src/` and `tests/` — the Python reference harness for the claw-code protocol.**

## Detected stack
- Languages: Rust.
- Frameworks: none detected from the supported starter markers.
The production CLI lives in `rust/`; this directory (`src/`, `tests/`, `.py` files) is a **protocol validation and dogfood surface**.

## Verification
- Run Rust verification from `rust/`: `cargo fmt`, `cargo clippy --workspace --all-targets -- -D warnings`, `cargo test --workspace`
- `src/` and `tests/` are both present; update both surfaces together when behavior changes.

## What this Python harness does

**Machine-first orchestration layer** — proves that the claw-code JSON protocol is:
- Deterministic and recoverable (every output is reproducible)
- Self-describing (SCHEMAS.md documents every field)
- Clawable (external agents can build ONE error handler for all commands)

## Stack
- **Language:** Python 3.13+
- **Dependencies:** minimal (no frameworks; pure stdlib plus attrs/dataclasses)
- **Test runner:** pytest
- **Protocol contract:** SCHEMAS.md (machine-readable JSON envelope)

## Quick start

```bash
# 1. Set up a virtualenv (if not already in one)
python3 -m venv .venv && source .venv/bin/activate
# (dependencies are minimal; mostly standard library)

# 2. Run tests
python3 -m pytest tests/ -q

# 3. Try a command
python3 -m src.main bootstrap "hello" --output-format json | python3 -m json.tool
```

## Verification workflow

```bash
# Unit tests (fast)
python3 -m pytest tests/ -q 2>&1 | tail -3

# Type checking (optional but recommended)
python3 -m mypy src/ --ignore-missing-imports 2>&1 | tail -5
```

## Repository shape
- `rust/` contains the Rust workspace and active CLI/runtime implementation.
- `src/` contains source files that should stay consistent with generated guidance and tests.
- `tests/` contains validation surfaces that should be reviewed alongside code changes.

## Working agreement
- Prefer small, reviewable changes and keep generated bootstrap files aligned with actual repo workflows.
- Keep shared defaults in `.claude.json`; reserve `.claude/settings.local.json` for machine-local overrides.
- Do not overwrite existing `CLAUDE.md` content automatically; update it intentionally when repo workflows change.

- **`src/`** — Python reference harness implementing the SCHEMAS.md protocol
  - `main.py` — CLI entry point; all 14 clawable commands
  - `query_engine.py` — core TurnResult / QueryEngineConfig
  - `runtime.py` — PortRuntime; turn loop + cancellation (#164 Stage A/B)
  - `session_store.py` — session persistence
  - `transcript.py` — turn transcript assembly
  - `commands.py`, `tools.py` — simulated command/tool trees
  - `models.py` — PermissionDenial, UsageSummary, etc.

- **`tests/`** — comprehensive protocol validation (22 baseline → 192 passing as of 2026-04-22)
  - `test_cli_parity_audit.py` — proves all 14 clawable commands accept --output-format
  - `test_json_envelope_field_consistency.py` — validates the SCHEMAS.md contract
  - `test_cancel_observed_field.py` — #164 Stage B: cancellation observability + safe-to-reuse semantics
  - `test_run_turn_loop_*.py` — turn-loop behavior (timeout, cancellation, continuation, permissions)
  - `test_submit_message_*.py` — budget and cancellation contracts
  - `test_*_cli.py` — command-specific JSON output validation

- **`SCHEMAS.md`** — canonical JSON contract
  - Common fields (all envelopes): timestamp, command, exit_code, output_format, schema_version
  - Error envelope shape
  - Not-found envelope shape
  - Per-command success schemas (14 commands documented)
  - Turn Result fields (including cancel_observed as of #164 Stage B)

- **`.gitignore`** — excludes `.port_sessions/` (dogfood-run state)

## Key concepts

### Clawable surface (14 commands)

Every clawable command **must**:
1. Accept `--output-format {text,json}`
2. Return JSON envelopes matching SCHEMAS.md
3. Use common fields (timestamp, command, exit_code, output_format, schema_version)
4. Exit 0 on success, 1 on error/not-found, 2 on timeout

**Commands:** list-sessions, delete-session, load-session, flush-transcript, show-command, show-tool, exec-command, exec-tool, route, bootstrap, command-graph, tool-pool, bootstrap-graph, turn-loop

**Validation:** `test_cli_parity_audit.py` auto-tests all 14 for --output-format acceptance.

### OPT_OUT surfaces (12 commands)

Explicitly exempt from the --output-format requirement (for now):
- Rich-Markdown reports: summary, manifest, parity-audit, setup-report
- List commands with query filters: subsystems, commands, tools
- Simulation/debug: remote-mode, ssh-mode, teleport-mode, direct-connect-mode, deep-link-mode

**Future work:** audit OPT_OUT surfaces for JSON promotion (post-#164).

### Protocol layers

- **Coverage (#167–#170):** All clawable commands emit JSON
- **Enforcement (#171):** Parity CI prevents new commands from skipping JSON
- **Documentation (#172):** SCHEMAS.md locks the field contract
- **Alignment (#173):** Test framework validates docs ↔ code match
- **Field evolution (#164 Stage B):** cancel_observed proves protocol extensibility

## Testing & coverage

### Run full suite
```bash
python3 -m pytest tests/ -q
```

### Run one test file
```bash
python3 -m pytest tests/test_cancel_observed_field.py -v
```

### Run one test
```bash
python3 -m pytest tests/test_cancel_observed_field.py::TestCancelObservedField::test_default_value_is_false -v
```

### Check coverage (optional)
```bash
python3 -m pip install coverage  # if not already installed
python3 -m coverage run -m pytest tests/
python3 -m coverage report --skip-covered
```

Target: >90% line coverage for `src/` (currently ~85%).

## Common workflows

### Add a new clawable command

1. Add a parser in `main.py` (argparse)
2. Add the `--output-format` flag
3. Emit a JSON envelope using `wrap_json_envelope(data, command_name)` (see the sketch below)
4. Add the command to CLAWABLE_SURFACES in `test_cli_parity_audit.py`
5. Document it in SCHEMAS.md (schema + example)
6. Write a test in `tests/test_*_cli.py` or `tests/test_json_envelope_field_consistency.py`
7. Run the full suite to confirm parity
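The `wrap_json_envelope` helper referenced in step 3 lives in `main.py` but is not shown here; a minimal sketch of what such a helper could look like, assuming only the common fields documented in SCHEMAS.md (the actual signature and behavior may differ):

```python
import json
from datetime import datetime, timezone

SCHEMA_VERSION = "1.0"

def wrap_json_envelope(data: dict, command_name: str, exit_code: int = 0) -> str:
    """Wrap command-specific fields in the common envelope (hypothetical sketch)."""
    envelope = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "command": command_name,
        "exit_code": exit_code,
        "output_format": "json",
        "schema_version": SCHEMA_VERSION,
        **data,  # per-command success/error fields
    }
    return json.dumps(envelope, indent=2)
```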
### Modify TurnResult or protocol fields

1. Update the dataclass in `query_engine.py` (see the sketch below)
2. Update SCHEMAS.md with the new field + rationale
3. Write a test in `tests/test_json_envelope_field_consistency.py` that validates field presence
4. Update all places that construct TurnResult (grep for `TurnResult(`)
5. Update the bootstrap/turn-loop JSON builders in `main.py`
6. Run `tests/` to ensure no regressions
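New fields should take defaults so the change stays additive (see "Protocol governance" below). A hedged sketch, assuming a simplified `TurnResult`; the real dataclass in `query_engine.py` carries more fields:

```python
from dataclasses import dataclass

@dataclass
class TurnResult:
    # Illustrative subset; the real class in query_engine.py has more fields.
    prompt: str
    output: str
    stop_reason: str = "completed"
    # New protocol fields get defaults so every existing TurnResult(...)
    # call site keeps working and the addition stays at schema_version "1.0".
    cancel_observed: bool = False
```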
### Promote an OPT_OUT surface to CLAWABLE

**Prerequisite:** A real demand signal logged in `OPT_OUT_DEMAND_LOG.md` (threshold: 2+ independent signals per surface). Speculative promotions are not allowed.

Once demand is evidenced:
1. Add the --output-format flag to argparse
2. Emit wrap_json_envelope() output on the JSON path
3. Move the command from OPT_OUT_SURFACES to CLAWABLE_SURFACES
4. Document it in SCHEMAS.md
5. Write a test for the JSON output
6. Run the parity audit to confirm no regressions
7. Update `OPT_OUT_DEMAND_LOG.md` to mark the signal as resolved

### File a demand signal (when a claw actually needs JSON from an OPT_OUT surface)

1. Open `OPT_OUT_DEMAND_LOG.md`
2. Find the surface's entry under Group A/B/C
3. Append a dated entry with Source, Use Case, and a Markdown-alternative-checked explanation
4. If this is the 2nd signal for the same surface, file a promotion pinpoint in ROADMAP.md

## Dogfood principles

The Python harness is continuously dogfood-tested:
- Every cycle ships to `main` with detailed commit messages
- New tests are written before/alongside implementation
- The test suite must pass before pushing (zero-regression principle)
- Commits are grouped by pinpoint (#159, #160, ..., #174)
- Failure modes are classified per exit code: 0=success, 1=error, 2=timeout

## Protocol governance

- **SCHEMAS.md is the source of truth** — any implementation must match it field-for-field
- **Tests enforce the contract** — drift is caught by the test suite
- **Field additions are forward-compatible** — new fields get defaults, old clients ignore them
- **Exit codes are signals** — claws use them for conditional logic (0→continue, 1→escalate, 2→timeout; see the sketch below)
- **Timestamps are audit trails** — every envelope includes ISO 8601 UTC time for chronological ordering
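A minimal sketch of the exit-code dispatch described in the governance list above; the policy strings and function name are illustrative, not part of the protocol:

```python
import subprocess

def dispatch_on_exit_code(command: list[str]) -> str:
    """Illustrative claw-side policy: 0→continue, 1→escalate, 2→timeout."""
    result = subprocess.run(command, capture_output=True, text=True)
    match result.returncode:
        case 0:
            return "continue"   # success: keep orchestrating
        case 1:
            return "escalate"   # error/not-found: needs intervention
        case 2:
            return "timeout"    # engine timeout: retry or re-session
        case other:
            raise RuntimeError(f"unexpected exit code {other}")
```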
## Related docs

- **`ERROR_HANDLING.md`** — Unified error-handling pattern for claws (one handler for all 14 clawable commands)
- **`SCHEMAS.md`** — JSON protocol specification (read before implementing)
- **`OPT_OUT_AUDIT.md`** — Governance for the 12 non-clawable surfaces
- **`OPT_OUT_DEMAND_LOG.md`** — Active survey recording real demand signals (evidence base for decisions)
- **`ROADMAP.md`** — macro roadmap and macro pain points
- **`PHILOSOPHY.md`** — system design intent
- **`PARITY.md`** — status of Python ↔ Rust protocol equivalence

ERROR_HANDLING.md (new file, 489 lines)
@@ -0,0 +1,489 @@
# Error Handling for Claw Code Claws

**Purpose:** Build a unified error handler for orchestration code using claw-code as a library or subprocess.

After cycles #178–#179 (parser-front-door hole closure), claw-code's error interface is deterministic, machine-readable, and clawable: **one error handler for all 14 clawable commands.**

---

## Quick Reference: Exit Codes and Envelopes

Every clawable command returns JSON on stdout when `--output-format json` is requested.

**IMPORTANT:** The exit code contract below applies **only when `--output-format json` is explicitly set**. Text mode follows argparse conventions and may return different exit codes (e.g., `2` for argparse parse errors). Claws consuming claw-code as a subprocess MUST always pass `--output-format json` to get the documented contract.

| Exit Code | Meaning | Response Format | Example |
|---|---|---|---|
| **0** | Success | `{success fields}` | `{"session_id": "...", "loaded": true}` |
| **1** | Error / Not Found | `{error: {kind, message, ...}}` | `{"error": {"kind": "session_not_found", ...}}` |
| **2** | Timeout | `{final_stop_reason: "timeout", final_cancel_observed: ...}` | `{"final_stop_reason": "timeout", ...}` |

### Text mode vs JSON mode exit codes

| Scenario | Text mode exit | JSON mode exit | Why |
|---|---|---|---|
| Unknown subcommand | 2 (argparse default) | 1 (parse error envelope) | argparse defaults to 2; JSON mode normalizes to the contract |
| Missing required arg | 2 (argparse default) | 1 (parse error envelope) | Same reason |
| Session not found | 1 | 1 | Application-level error, same in both |
| Command executed OK | 0 | 0 | Success path, identical |
| Turn-loop timeout | 2 | 2 | Identical (#161 implementation) |

**Practical rule for claws:** always pass `--output-format json`. This eliminates text-mode surprises and gives you the documented exit-code contract for every error path.
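A tiny helper enforcing that rule before dispatch; the function name is illustrative:

```python
def with_json_format(command: list[str]) -> list[str]:
    """Append --output-format json unless the caller already set it (illustrative)."""
    if "--output-format" not in command:
        return [*command, "--output-format", "json"]
    return command

# e.g. with_json_format(["claw", "load-session", "sess_abc123"])
```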
---

## One-Handler Pattern

Build a single error-recovery function that works for all 14 clawable commands:

```python
import subprocess
import json
import sys
from typing import Any

def run_claw_command(command: list[str], timeout_seconds: float = 30.0) -> dict[str, Any]:
    """
    Run a clawable claw-code command and handle errors uniformly.

    Args:
        command: Full command list, e.g. ["claw", "load-session", "id", "--output-format", "json"]
        timeout_seconds: Wall-clock timeout

    Returns:
        Parsed JSON result from stdout

    Raises:
        ClawError: Classified by error.kind (parse, session_not_found, runtime, timeout, etc.)
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        raise ClawError(
            kind='subprocess_timeout',
            message=f'Command exceeded {timeout_seconds}s wall-clock timeout',
            retryable=True,  # Caller's decision; subprocess timeout != engine timeout
        )

    # Parse JSON (valid for all success/error/timeout paths in claw-code)
    try:
        envelope = json.loads(result.stdout)
    except json.JSONDecodeError as err:
        raise ClawError(
            kind='parse_failure',
            message=f'Command output is not JSON: {err}',
            hint='Check that --output-format json is being passed',
            retryable=False,
        )

    # Classify by exit code and error.kind
    match (result.returncode, envelope.get('error', {}).get('kind')):
        case (0, _):
            # Success
            return envelope

        case (1, 'parse'):
            # #179: argparse error — typically a typo or missing required argument
            raise ClawError(
                kind='parse',
                message=envelope['error']['message'],
                hint=envelope['error'].get('hint'),
                retryable=False,  # Typos don't fix themselves
            )

        case (1, 'session_not_found'):
            # Common: load-session on a nonexistent ID
            raise ClawError(
                kind='session_not_found',
                message=envelope['error']['message'],
                session_id=envelope.get('session_id'),
                retryable=False,  # Session won't appear on retry
            )

        case (1, 'filesystem'):
            # Directory missing, permission denied, disk full
            raise ClawError(
                kind='filesystem',
                message=envelope['error']['message'],
                retryable=True,  # Might be transient (disk space, NFS flake)
            )

        case (1, 'runtime'):
            # Generic engine error (unexpected exception, malformed input, etc.)
            raise ClawError(
                kind='runtime',
                message=envelope['error']['message'],
                retryable=envelope['error'].get('retryable', False),
            )

        case (1, _):
            # Catch-all for any new error.kind values
            raise ClawError(
                kind=envelope['error']['kind'],
                message=envelope['error']['message'],
                retryable=envelope['error'].get('retryable', False),
            )

        case (2, _):
            # Timeout (engine was asked to cancel and had a fair chance to observe)
            cancel_observed = envelope.get('final_cancel_observed', False)
            raise ClawError(
                kind='timeout',
                message=f'Turn exceeded timeout (cancel_observed={cancel_observed})',
                cancel_observed=cancel_observed,
                retryable=True,  # Caller can retry with a fresh session
                safe_to_reuse_session=(cancel_observed is True),
            )

        case (exit_code, _):
            # Unexpected exit code
            raise ClawError(
                kind='unexpected_exit_code',
                message=f'Unexpected exit code {exit_code}',
                retryable=False,
            )


class ClawError(Exception):
    """Unified error type for claw-code commands."""

    def __init__(
        self,
        kind: str,
        message: str,
        hint: str | None = None,
        retryable: bool = False,
        cancel_observed: bool = False,
        safe_to_reuse_session: bool = False,
        session_id: str | None = None,
    ):
        self.kind = kind
        self.message = message
        self.hint = hint
        self.retryable = retryable
        self.cancel_observed = cancel_observed
        self.safe_to_reuse_session = safe_to_reuse_session
        self.session_id = session_id
        super().__init__(self.message)

    def __str__(self) -> str:
        parts = [f"{self.kind}: {self.message}"]
        if self.hint:
            parts.append(f"Hint: {self.hint}")
        if self.retryable:
            parts.append("(retryable)")
        if self.cancel_observed:
            parts.append(f"(safe_to_reuse_session={self.safe_to_reuse_session})")
        return "\n".join(parts)
```
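A short usage sketch of the handler above; the command and session id are illustrative:

```python
# Illustrative call site: one try/except covers every clawable command.
try:
    envelope = run_claw_command(
        ["claw", "load-session", "sess_abc123", "--output-format", "json"]
    )
    print("loaded:", envelope.get("session_id"))
except ClawError as err:
    if err.retryable:
        print("transient failure, will retry:", err)
    else:
        print("permanent failure, escalating:", err)
```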
---

## Practical Recovery Patterns

### Pattern 1: Retry on transient errors

```python
from time import sleep

def run_with_retry(
    command: list[str],
    max_attempts: int = 3,
    backoff_seconds: float = 0.5,
) -> dict:
    """Retry on transient errors (filesystem, timeout)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_claw_command(command)
        except ClawError as err:
            if not err.retryable:
                raise  # Non-transient; fail fast

            if attempt == max_attempts:
                raise  # Last attempt; propagate

            print(f"Attempt {attempt} failed ({err.kind}); retrying in {backoff_seconds}s...", file=sys.stderr)
            sleep(backoff_seconds)
            backoff_seconds *= 1.5  # exponential backoff

    raise RuntimeError("Unreachable")
```

### Pattern 2: Reuse session after timeout (if safe)

```python
def run_with_timeout_recovery(
    command: list[str],
    timeout_seconds: float = 30.0,
    fallback_timeout: float = 60.0,
) -> dict:
    """
    On timeout, check cancel_observed. If True, the session is safe for retry.
    If False, the session is potentially wedged; use a fresh one.
    """
    try:
        return run_claw_command(command, timeout_seconds=timeout_seconds)
    except ClawError as err:
        if err.kind != 'timeout':
            raise

        if err.safe_to_reuse_session:
            # Engine saw the cancel signal; safe to reuse this session with a larger timeout
            print(f"Timeout observed (cancel_observed=true); retrying with {fallback_timeout}s...", file=sys.stderr)
            return run_claw_command(command, timeout_seconds=fallback_timeout)
        else:
            # Engine didn't see the cancel signal; session may be wedged
            print("Timeout not observed (cancel_observed=false); session is potentially wedged", file=sys.stderr)
            raise  # Caller should allocate a fresh session
```

### Pattern 3: Detect parse errors (typos in command-line construction)

```python
def validate_command_before_dispatch(command: list[str]) -> None:
    """
    Dry-run with --help to detect obvious syntax errors before dispatching work.

    This is cheap (no API call) and catches typos like:
    - Unknown subcommand: `claw typo-command`
    - Unknown flag: `claw bootstrap --invalid-flag`
    - Missing required argument: `claw load-session` (no session_id)
    """
    help_cmd = command + ['--help']
    try:
        result = subprocess.run(help_cmd, capture_output=True, timeout=2.0)
        if result.returncode != 0:
            print(f"Warning: {' '.join(help_cmd)} returned {result.returncode}", file=sys.stderr)
            print("(This doesn't prove the command is invalid, just that --help failed)", file=sys.stderr)
    except subprocess.TimeoutExpired:
        pass  # --help shouldn't hang, but don't block on it
```

### Pattern 4: Log and forward errors to observability

```python
import logging

logger = logging.getLogger(__name__)

def run_claw_with_logging(command: list[str]) -> dict:
    """Run a command and log errors for observability."""
    try:
        result = run_claw_command(command)
        logger.info(f"Claw command succeeded: {' '.join(command)}")
        return result
    except ClawError as err:
        logger.error(
            "Claw command failed",
            extra={
                'command': ' '.join(command),
                'error_kind': err.kind,
                'error_message': err.message,
                'retryable': err.retryable,
                'cancel_observed': err.cancel_observed,
            },
        )
        raise
```

---

## Error Kinds (Enumeration)

After cycles #178–#179, the complete set of `error.kind` values is:

| Kind | Exit Code | Meaning | Retryable | Notes |
|---|---|---|---|---|
| **parse** | 1 | Argparse error (unknown command, missing arg, invalid flag) | No | Real error message included (#179); valid-choices list for discoverability |
| **session_not_found** | 1 | load-session target doesn't exist | No | session_id and directory included in envelope |
| **filesystem** | 1 | Directory missing, permission denied, disk full | Yes | Transient issues (disk space, NFS flake) can be retried |
| **runtime** | 1 | Engine error (unexpected exception, malformed input) | Depends | The envelope's `error.retryable` field specifies |
| **timeout** | 2 | Engine timeout with cooperative cancellation | Yes* | `cancel_observed` field signals session safety (#164) |

*Retry safety depends on `cancel_observed`:
- `cancel_observed=true` → session is safe to reuse
- `cancel_observed=false` → session may be wedged; allocate a fresh one
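The enumeration above can be carried into code as a small policy map; a hedged sketch (the recommended actions paraphrase this document, not an API):

```python
# Illustrative policy map derived from the table above:
# kind -> (default_retryable, recommended_action)
ERROR_KIND_POLICY: dict[str, tuple[bool, str]] = {
    "parse":             (False, "fix the command line; do not retry"),
    "session_not_found": (False, "list-sessions, then pick a valid id"),
    "filesystem":        (True,  "retry with backoff; may be transient"),
    "runtime":           (False, "trust the envelope's error.retryable field"),
    "timeout":           (True,  "check cancel_observed before reusing the session"),
}
```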
---

## What We Did to Make This Work

### Cycle #178: Parse-Error Envelope

**Problem:** `claw nonexistent --output-format json` returned argparse help text on stderr instead of an envelope.
**Solution:** Catch argparse `SystemExit` in JSON mode and emit a structured error envelope.
**Benefit:** Claws no longer need to parse human help text to understand parse errors.

### Cycle #179: Stderr Hygiene + Real Error Message

**Problem:** Even after #178, argparse usage was leaking to stderr AND the envelope message was generic ("invalid command or argument").
**Solution:** Monkey-patch `parser.error()` in JSON mode to raise an internal exception, preserving argparse's real message verbatim. Suppress stderr entirely in JSON mode.
**Benefit:** Claws see one stream (stdout), one envelope, and real error context (e.g., "invalid choice: typo (choose from ...)") for discoverability.
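A minimal sketch of the #179 monkey-patch, assuming a plain argparse parser; the exception and function names are illustrative and the real implementation in `main.py` may differ:

```python
import argparse
import json
import sys

class _JsonParseError(Exception):
    """Carries argparse's real error message to the JSON path (illustrative)."""

def parse_args_json_mode(parser: argparse.ArgumentParser, argv: list[str]) -> argparse.Namespace:
    def error_as_exception(message: str) -> None:
        # argparse calls self.error(message) on any parse failure; raising here
        # preserves the message verbatim and keeps stderr silent.
        raise _JsonParseError(message)

    parser.error = error_as_exception  # shadow the bound method on this instance
    try:
        return parser.parse_args(argv)
    except _JsonParseError as err:
        print(json.dumps({"error": {"kind": "parse", "message": str(err)}}))
        sys.exit(1)  # JSON mode normalizes parse errors to exit code 1
```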
### Contract: #164 Stage B (`cancel_observed` field)

**Problem:** Timeout results didn't signal whether the engine actually observed the cancellation request.
**Solution:** Add a `cancel_observed: bool` field to the timeout TurnResult; set it true iff the engine had a fair chance to observe the cancel event.
**Benefit:** Claws can decide "retry with fresh session" vs "reuse this session with a larger timeout" based on a single boolean.

---

## Common Mistakes to Avoid

❌ **Don't parse the exit code alone**
```python
# BAD: Exit code 1 could mean parse error, not-found, filesystem, or runtime
if result.returncode == 1:
    # What should I do? Unclear.
    pass
```

✅ **Do parse error.kind**
```python
# GOOD: error.kind tells you exactly how to recover
match envelope['error']['kind']:
    case 'parse': ...
    case 'session_not_found': ...
    case 'filesystem': ...
```

---

❌ **Don't capture both stdout and stderr and assume they're separate concerns**
```python
# BAD (pre-#179): Capture stdout + stderr, then parse stdout as JSON
# But stderr might contain argparse noise that you have to string-match
result = subprocess.run(..., capture_output=True, text=True)
if "invalid choice" in result.stderr:
    ...  # custom error handling
```

✅ **Do rely on stderr being silent in JSON mode**
```python
# GOOD (post-#179): In JSON mode, stderr is guaranteed silent
# The envelope on stdout is your single source of truth
result = subprocess.run(..., capture_output=True, text=True)
envelope = json.loads(result.stdout)  # Always valid in JSON mode
```

---

❌ **Don't retry on parse errors**
```python
# BAD: Typos don't fix themselves
error_kind = envelope['error']['kind']
if error_kind == 'parse':
    retry()  # Will fail again
```

✅ **Do check retryable before retrying**
```python
# GOOD: Let the error tell you
error = envelope['error']
if error.get('retryable', False):
    retry()
else:
    raise
```

---

❌ **Don't reuse a session after timeout without checking cancel_observed**
```python
# BAD: Reusing the session risks a wedge
result = run_claw_command(...)  # times out
# ... later, reuse the same session
result = run_claw_command(...)  # might be stuck in the previous turn
```

✅ **Do allocate a fresh session if cancel_observed=false**
```python
# GOOD: Allocate a fresh session if a wedge is suspected
try:
    result = run_claw_command(...)
except ClawError as err:
    if err.cancel_observed:
        # Safe to reuse
        result = run_claw_command(...)
    else:
        # Allocate a fresh session
        fresh_session = create_session()
        result = run_claw_command_in_session(fresh_session, ...)
```

---

## Testing Your Error Handler

```python
def test_error_handler_parse_error():
    """Verify parse errors are caught and classified."""
    try:
        run_claw_command(['claw', 'nonexistent', '--output-format', 'json'])
        assert False, "Should have raised ClawError"
    except ClawError as err:
        assert err.kind == 'parse'
        assert 'invalid choice' in err.message.lower()
        assert err.retryable is False

def test_error_handler_timeout_safe():
    """Verify timeout with cancel_observed=true marks the session as safe."""
    # Requires a live claw-code server; mock this test
    try:
        run_claw_command(
            ['claw', 'turn-loop', '"x"', '--timeout-seconds', '0.0001'],
            timeout_seconds=2.0,
        )
        assert False, "Should have raised ClawError"
    except ClawError as err:
        assert err.kind == 'timeout'
        assert err.safe_to_reuse_session is True  # cancel_observed=true

def test_error_handler_not_found():
    """Verify session_not_found is clearly classified."""
    try:
        run_claw_command(['claw', 'load-session', 'nonexistent', '--output-format', 'json'])
        assert False, "Should have raised ClawError"
    except ClawError as err:
        assert err.kind == 'session_not_found'
        assert err.retryable is False
```

---

## Appendix: SCHEMAS.md Error Shape

For reference, the canonical JSON error envelope shape (SCHEMAS.md):

```json
{
  "timestamp": "2026-04-22T11:40:00Z",
  "command": "load-session",
  "exit_code": 1,
  "output_format": "json",
  "schema_version": "1.0",
  "error": {
    "kind": "session_not_found",
    "operation": "session_store.load_session",
    "target": "nonexistent",
    "retryable": false,
    "message": "session 'nonexistent' not found in .port_sessions",
    "hint": "use 'list-sessions' to see available sessions"
  }
}
```

All commands that emit errors follow this shape (with error.kind varying). See `SCHEMAS.md` for the complete contract.

---

## Summary

After cycles #178–#179, **one error handler works for all 14 clawable commands.** No more string-matching, no more stderr parsing, no more exit-code ambiguity. Just parse the JSON, check `error.kind`, and decide: retry, escalate, or reuse the session (if safe).

The handler itself is ~80 lines of Python; the patterns are reusable across any language that can speak JSON.
OPT_OUT_AUDIT.md (new file, 151 lines)
@@ -0,0 +1,151 @@
# OPT_OUT Surface Audit Roadmap

**Status:** Pre-audit (decision table ready, survey pending)

This document governs the audit and potential promotion of 12 OPT_OUT surfaces (commands that currently do **not** support `--output-format json`).

## OPT_OUT Classification Rationale

A surface is classified as OPT_OUT when:
1. **Human-first by nature:** Rich Markdown prose / diagrams / structured text where JSON would be information loss
2. **Query-filtered alternative exists:** Commands with internal `--query` / `--limit` don't need JSON (users already have an escape hatch)
3. **Simulation/debug only:** Not meant for production orchestration (e.g., mode simulators)
4. **Future JSON work is planned:** Documented in ROADMAP with a clear upgrade path

---

## OPT_OUT Surfaces (12 Total)

### Group A: Rich-Markdown Reports (4 commands)

**Rationale:** These emit structured narrative prose. JSON would require lossy serialization.

| Command | Output | Current use | JSON case |
|---|---|---|---|
| `summary` | Multi-section workspace summary (Markdown) | Human readability | Not applicable; Markdown is the output |
| `manifest` | Workspace manifest with project tree (Markdown) | Human readability | Not applicable; Markdown is the output |
| `parity-audit` | TypeScript/Python port comparison report (Markdown) | Human readability | Not applicable; Markdown is the output |
| `setup-report` | Preflight + startup diagnostics (Markdown) | Human readability | Not applicable; Markdown is the output |

**Audit decision:** These likely remain OPT_OUT long-term (Markdown-as-output is intentional). If a JSON version is needed in future, it would be a separate `--output-format json` path generating structured data (project summary object, manifest array, audit deltas, setup checklist) — but that's a **new contract**, not an addition to the existing Markdown surfaces.

**Pinpoint:** #175 (deferred) — audit whether `summary`/`manifest` should emit structured JSON versions *in parallel* with Markdown, or whether Markdown-only is the right UX.

---

### Group B: List Commands with Query Filters (3 commands)

**Rationale:** These already support `--query` and `--limit` for filtering. JSON output would be redundant; users can pipe to `jq`.

| Command | Filtering | Current output | JSON case |
|---|---|---|---|
| `subsystems` | `--limit` | Human-readable list | Use `--query` to filter; users can parse if needed |
| `commands` | `--query`, `--limit`, `--no-plugin-commands`, `--no-skill-commands` | Human-readable list | Use `--query` to filter; users can parse if needed |
| `tools` | `--query`, `--limit`, `--simple-mode` | Human-readable list | Use `--query` to filter; users can parse if needed |

**Audit decision:** `--query` / `--limit` are already the machine-friendly escape hatch. These commands are **intentionally** list-filter-based (not orchestration-primary). Promoting to CLAWABLE would require:
1. Formalizing what the structured output *is* (command array? tool array?)
2. Versioning the schema per command
3. Updating tests to validate per-command schemas

**Cost-benefit:** Low. Users who need structured data can already use `--query` to narrow results, then parse. Effort to promote > value.

**Pinpoint:** #176 (backlog) — audit `--query` UX; consider whether a `--query-json` escape hatch (output JSON of matching items) is worth the schema tax.

---

### Group C: Simulation / Debug Surfaces (5 commands)

**Rationale:** These are intentionally **not production-orchestrated**. They simulate behavior, test modes, or debug scenarios. JSON output doesn't add value.

| Command | Purpose | Output | Use case |
|---|---|---|---|
| `remote-mode` | Simulate remote execution | Text (mock session) | Testing harness behavior under remote constraints |
| `ssh-mode` | Simulate SSH execution | Text (mock SSH session) | Testing harness behavior over SSH-like transport |
| `teleport-mode` | Simulate teleport hop | Text (mock hop session) | Testing harness behavior with teleport bouncing |
| `direct-connect-mode` | Simulate direct network | Text (mock session) | Testing harness behavior with direct connectivity |
| `deep-link-mode` | Simulate deep-link invocation | Text (mock deep-link) | Testing harness behavior from URL/deeplink |

**Audit decision:** These are **intentionally simulation-only**. Promoting to CLAWABLE means:
1. Declaring "this simulated mode is now a valid orchestration surface"
2. Defining what JSON output *means* (mock session state? simulation log?)
3. Adding versioning + test coverage

**Cost-benefit:** Very low. These are debugging tools, not orchestration endpoints. Effort to promote >> value.

**Pinpoint:** #177 (backlog) — decide if mode simulators should ever be CLAWABLE (probably no).

---

## Audit Workflow (Future Cycles)

### For each surface:
1. **Survey:** Check if any external claw actually uses --output-format with this surface
2. **Cost estimate:** How much schema work + testing?
3. **Value estimate:** How much demand for a JSON version?
4. **Decision:** CLAWABLE, remain OPT_OUT, or new pinpoint?

### Promotion criteria (if promoting to CLAWABLE):

A surface moves from OPT_OUT → CLAWABLE **only if**:
- ✅ Clear use case for JSON (not just "hypothetically could be JSON")
- ✅ Schema is simple and stable (not 20+ fields)
- ✅ At least one external claw has requested it
- ✅ Tests can be added without a major refactor
- ✅ Maintainability burden is worth the value

### Demote criteria (if staying OPT_OUT):

A surface stays OPT_OUT **if**:
- ✅ JSON would be information loss (Markdown reports)
- ✅ Equivalent filtering already exists (`--query` / `--limit`)
- ✅ Use case is simulation/debug, not production
- ✅ Promotion effort > value to users

---

## Post-Audit Outcomes

### Likely scenario (high confidence)

**Group A (Markdown reports):** Remain OPT_OUT
- `summary`, `manifest`, `parity-audit`, `setup-report` are **intentionally** human-first
- If JSON-like structure is needed in future, it would be separate `*-json` commands or a distinct `--output-format`, not added to the Markdown surfaces

**Group B (List filters):** Remain OPT_OUT
- `subsystems`, `commands`, `tools` have `--query` / `--limit` as the query layer
- Users who need structured data already have an escape hatch

**Group C (Mode simulators):** Remain OPT_OUT
- `remote-mode`, `ssh-mode`, etc. are debug tools, not orchestration endpoints
- No demand for a JSON version; promotion would be forced, not driven

**Result:** The OPT_OUT audit concludes that 12/12 surfaces should **remain OPT_OUT** (no promotions).

### If demand emerges

If external claws report needing JSON from any OPT_OUT surface:
1. File a pinpoint with use case + rationale
2. Estimate cost + value
3. If value > cost, promote to CLAWABLE with full test coverage
4. Update SCHEMAS.md
5. Update CLAUDE.md

---

## Timeline

- **Post-#174 (now):** OPT_OUT audit documented (this file)
- **Cycles #19–#21 (deferred):** Survey period — collect data on external demand
- **Cycle #22 (deferred):** Final audit decision + any promotions
- **Post-audit:** Move to protocol maintenance mode (new commands/fields/surfaces)

---

## Related

- **OPT_OUT_DEMAND_LOG.md** — Active survey recording real demand signals (evidentiary base for any promotion decision)
- **SCHEMAS.md** — Clawable surface contracts
- **CLAUDE.md** — Development guidance
- **test_cli_parity_audit.py** — Parametrized tests for CLAWABLE_SURFACES enforcement
- **ROADMAP.md** — Macro phases (this audit is Phase 3 before Phase 2 closure)
OPT_OUT_DEMAND_LOG.md (new file, 167 lines)
@@ -0,0 +1,167 @@
# OPT_OUT Demand Log

**Purpose:** Record real demand signals for promoting OPT_OUT surfaces to CLAWABLE. Without this log, the audit criteria in `OPT_OUT_AUDIT.md` have no evidentiary base.

**Status:** Active survey window (post-#178/#179, cycles #21+)

## How to file a demand signal

When any external claw, operator, or downstream consumer actually needs JSON output from one of the 12 OPT_OUT surfaces, add an entry below. **Speculation, "could be useful someday," and internal hypotheticals do NOT count.**

A valid signal requires:
- **Source:** Who/what asked (human, automation, agent session, external tool)
- **Surface:** Which OPT_OUT command (from the 12)
- **Use case:** The concrete orchestration problem they're trying to solve
- **Markdown-alternative checked?** Why the existing OPT_OUT output is insufficient
- **Date:** When the signal was received

## Promotion thresholds

Per `OPT_OUT_AUDIT.md` criteria:
- **2+ independent signals** for the same surface within a survey window → file a promotion pinpoint
- **1 signal + existing stable schema** → file a pinpoint for discussion
- **0 signals** → surface stays OPT_OUT (documented rationale in the audit file)

The threshold is intentionally high. Single-use hacks can be served via one-off Markdown parsing; schema promotion is expensive (docs, tests, maintenance).

---

## Demand Signals Received

### Group A: Rich-Markdown Reports

#### `summary`
**Signals received: 0**

Notes: No demand recorded. Markdown output is intentional and useful for human review.

#### `manifest`
**Signals received: 0**

Notes: No demand recorded.

#### `parity-audit`
**Signals received: 0**

Notes: No demand recorded. Report consumers are humans reviewing porting progress, not automation.

#### `setup-report`
**Signals received: 0**

Notes: No demand recorded.

---

### Group B: List Commands with Query Filters

#### `subsystems`
**Signals received: 0**

Notes: `--limit` already provides filtering. No claws requesting JSON.

#### `commands`
**Signals received: 0**

Notes: `--query`, `--limit`, `--no-plugin-commands`, `--no-skill-commands` already allow filtering. No demand recorded.

#### `tools`
**Signals received: 0**

Notes: `--query`, `--limit`, `--simple-mode` provide filtering. No demand recorded.

---

### Group C: Simulation / Debug Surfaces

#### `remote-mode`
**Signals received: 0**

Notes: Simulation-only. No production orchestration need.

#### `ssh-mode`
**Signals received: 0**

Notes: Simulation-only.

#### `teleport-mode`
**Signals received: 0**

Notes: Simulation-only.

#### `direct-connect-mode`
**Signals received: 0**

Notes: Simulation-only.

#### `deep-link-mode`
**Signals received: 0**

Notes: Simulation-only.

---

## Survey Window Status

| Cycle | Date | New Signals | Running Total | Action |
|---|---|---|---|---|
| #21 | 2026-04-22 | 0 | 0 | Survey opened; log established |

**Current assessment:** Zero demand for any OPT_OUT surface promotion. This is consistent with the `OPT_OUT_AUDIT.md` prediction that all 12 likely stay OPT_OUT long-term.

---

## Signal Entry Template

```
### <surface-name>
**Signals received: [N]**

Entry N (YYYY-MM-DD):
- Source: <who/what>
- Use case: <concrete orchestration problem>
- Markdown-alternative-checked: <yes/no + why insufficient>
- Follow-up: <filed pinpoint / discussion thread / closed>
```

---

## Decision Framework

At cycle #22 (or whenever the survey window closes):

### If 0 signals total (likely):
- Move all 12 surfaces to `PERMANENTLY_OPT_OUT` or similar
- Remove `OPT_OUT_SURFACES` from `test_cli_parity_audit.py` (everything is explicitly a non-goal)
- Update `CLAUDE.md` to reflect maintainership mode
- Close `OPT_OUT_AUDIT.md` with "audit complete, no promotions"

### If 1–2 signals on isolated surfaces:
- File individual promotion pinpoints per surface with demand evidence
- Each goes through the standard #171/#172/#173 loop (parity audit, SCHEMAS.md, consistency test)

### If high demand (3+ signals):
- Reopen the audit: is the OPT_OUT classification actually correct?
- Review whether protocol expansion is warranted

---

## Related Files

- **`OPT_OUT_AUDIT.md`** — Audit criteria, decision table, rationale by group
- **`SCHEMAS.md`** — JSON contract for the 14 CLAWABLE surfaces
- **`tests/test_cli_parity_audit.py`** — Machine enforcement of the CLAWABLE/OPT_OUT classification
- **`CLAUDE.md`** — Development posture (maintainership mode)

---

## Philosophy

**Prevent speculative expansion.** The discipline of requiring real signals before promotion protects the protocol from schema bloat. Every new CLAWABLE surface adds:
- A SCHEMAS.md section (maintenance burden)
- Test coverage (test suite tax)
- Documentation (cognitive load for new developers)
- Version compatibility (schema_version bump risk)

If a claw can't articulate *why* it needs JSON for `summary` beyond "it would be nice," then JSON for `summary` is not needed. The Markdown output is a feature, not a gap.

The audit log closes the loop on "governed non-goals": OPT_OUT surfaces are intentionally not clawable until proven otherwise by evidence.
@@ -5,6 +5,8 @@
·
<a href="./USAGE.md">Usage</a>
·
<a href="./ERROR_HANDLING.md">Error Handling</a>
·
<a href="./rust/README.md">Rust workspace</a>
·
<a href="./PARITY.md">Parity</a>

@@ -40,9 +42,11 @@ The canonical implementation lives in [`rust/`](./rust), and the current source

- **`rust/`** — canonical Rust workspace and the `claw` CLI binary
- **`USAGE.md`** — task-oriented usage guide for the current product surface
- **`ERROR_HANDLING.md`** — unified error-handling pattern for orchestration code
- **`PARITY.md`** — Rust-port parity status and migration notes
- **`ROADMAP.md`** — active roadmap and cleanup backlog
- **`PHILOSOPHY.md`** — project intent and system-design framing
- **`SCHEMAS.md`** — JSON protocol contract (Python harness reference)
- **`src/` + `tests/`** — companion Python/reference workspace and audit helpers; not the primary runtime surface

## Quick start
ROADMAP.md — 1377 changes (diff suppressed because it is too large)
SCHEMAS.md (new file, 454 lines)
@@ -0,0 +1,454 @@
# JSON Envelope Schemas — Clawable CLI Contract

This document locks the field-level contract for all clawable-surface commands. Every command accepting `--output-format json` must conform to the envelope shapes below.

**Target audience:** Claws building orchestrators, automation, or monitoring against claw-code's JSON output.

---

## Common Fields (All Envelopes)

Every command response, success or error, carries:

```json
{
  "timestamp": "2026-04-22T10:10:00Z",
  "command": "list-sessions",
  "exit_code": 0,
  "output_format": "json",
  "schema_version": "1.0"
}
```

| Field | Type | Required | Notes |
|---|---|---|---|
| `timestamp` | ISO 8601 UTC | Yes | Time the command completed |
| `command` | string | Yes | argv[1] (e.g. "list-sessions") |
| `exit_code` | int (0/1/2) | Yes | 0=success, 1=error/not-found, 2=timeout |
| `output_format` | string | Yes | Always "json" (for symmetry with text mode) |
| `schema_version` | string | Yes | "1.0" (bump for breaking changes) |
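A downstream claw could enforce this table with a few lines; a hedged sketch (names illustrative):

```python
REQUIRED_COMMON_FIELDS = {
    "timestamp": str,
    "command": str,
    "exit_code": int,
    "output_format": str,
    "schema_version": str,
}

def validate_common_fields(envelope: dict) -> list[str]:
    """Return a list of contract violations (empty = envelope conforms)."""
    problems = []
    for name, expected_type in REQUIRED_COMMON_FIELDS.items():
        if name not in envelope:
            problems.append(f"missing field: {name}")
        elif not isinstance(envelope[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    if envelope.get("exit_code") not in (0, 1, 2):
        problems.append("exit_code must be 0, 1, or 2")
    return problems
```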
---

## Turn Result Fields (Multi-Turn Sessions)

When a command's response includes a `turn` object (e.g., in `bootstrap` or `turn-loop`), it carries:

| Field | Type | Required | Notes |
|---|---|---|---|
| `prompt` | string | Yes | User input for this turn |
| `output` | string | Yes | Assistant response |
| `stop_reason` | enum | Yes | One of: `completed`, `timeout`, `cancelled`, `max_budget_reached`, `max_turns_reached` |
| `cancel_observed` | bool | Yes | #164 Stage B: cancellation was signaled and observed (#161/#164) |
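A hedged sketch of consuming the `turn` object per the table above; the function is illustrative:

```python
def summarize_turn(turn: dict) -> str:
    """Classify a turn object by its stop_reason (illustrative consumer)."""
    stop_reason = turn["stop_reason"]
    if stop_reason == "completed":
        return f"ok: {len(turn['output'])} chars of output"
    if stop_reason == "timeout":
        # cancel_observed tells the claw whether the session is safe to reuse
        return f"timeout (cancel_observed={turn['cancel_observed']})"
    return f"stopped early: {stop_reason}"  # cancelled / max_budget_reached / max_turns_reached
```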
---
|
||||
|
||||
## Error Envelope
|
||||
|
||||
When a command fails (exit code 1), responses carry:
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "exec-command",
|
||||
"exit_code": 1,
|
||||
"error": {
|
||||
"kind": "filesystem",
|
||||
"operation": "write",
|
||||
"target": "/tmp/nonexistent/out.md",
|
||||
"retryable": true,
|
||||
"message": "No such file or directory",
|
||||
"hint": "intermediate directory does not exist; try mkdir -p /tmp/nonexistent"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Type | Required | Notes |
|
||||
|---|---|---|---|
|
||||
| `error.kind` | enum | Yes | One of: `filesystem`, `auth`, `session`, `parse`, `runtime`, `mcp`, `delivery`, `usage`, `policy`, `unknown` |
|
||||
| `error.operation` | string | Yes | Syscall/method that failed (e.g. "write", "open", "resolve_session") |
|
||||
| `error.target` | string | Yes | Resource that failed (path, session-id, server-name, etc.) |
|
||||
| `error.retryable` | bool | Yes | Whether caller can safely retry without intervention |
|
||||
| `error.message` | string | Yes | Platform error message (e.g. errno text) |
|
||||
| `error.hint` | string | No | Optional actionable next step |
|
||||
|
||||
---
|
||||
|
||||
## Not-Found Envelope
|
||||
|
||||
When an entity does not exist (exit code 1, but not a failure):
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "load-session",
|
||||
"exit_code": 1,
|
||||
"name": "does-not-exist",
|
||||
"found": false,
|
||||
"error": {
|
||||
"kind": "session_not_found",
|
||||
"message": "session 'does-not-exist' not found in .claw/sessions/",
|
||||
"retryable": false
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Type | Required | Notes |
|
||||
|---|---|---|---|
|
||||
| `name` | string | Yes | Entity name/id that was looked up |
|
||||
| `found` | bool | Yes | Always `false` for not-found |
|
||||
| `error.kind` | enum | Yes | One of: `command_not_found`, `tool_not_found`, `session_not_found` |
|
||||
| `error.message` | string | Yes | User-visible explanation |
|
||||
| `error.retryable` | bool | Yes | Usually `false` (entity will not magically appear) |
|
||||
|
||||
---
|
||||
|
||||
## Per-Command Success Schemas
|
||||
|
||||
### `list-sessions`
|
||||
|
||||
**Status**: ✅ Implemented (closed #251 cycle #45, 2026-04-23).
|
||||
|
||||
**Actual binary envelope** (as of #251 fix):
|
||||
```json
|
||||
{
|
||||
"command": "list-sessions",
|
||||
"sessions": [
|
||||
{
|
||||
"id": "session-1775777421902-1",
|
||||
"path": "/path/to/.claw/sessions/session-1775777421902-1.jsonl",
|
||||
"updated_at_ms": 1775777421902,
|
||||
"message_count": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Aspirational (future) shape**:
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "list-sessions",
|
||||
"exit_code": 0,
|
||||
"output_format": "json",
|
||||
"schema_version": "1.0",
|
||||
"directory": ".claw/sessions",
|
||||
"sessions_count": 2,
|
||||
"sessions": [
|
||||
{
|
||||
"session_id": "sess_abc123",
|
||||
"created_at": "2026-04-21T15:30:00Z",
|
||||
"last_modified": "2026-04-22T09:45:00Z",
|
||||
"prompt_count": 5,
|
||||
"stopped": false
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Gap**: Current impl lacks `timestamp`, `exit_code`, `output_format`, `schema_version`, `directory`, `sessions_count` (derivable), and the session object uses `id`/`updated_at_ms`/`message_count` instead of `session_id`/`last_modified`/`prompt_count`. Follow-up #250 Option B to align field names and add common-envelope fields.
|
||||
|
||||
### `delete-session`
|
||||
|
||||
**Status**: ⚠️ Stub only (closed #251 dispatch-order fix; full impl deferred).
|
||||
|
||||
**Actual binary envelope** (as of #251 fix):
|
||||
```json
|
||||
{
|
||||
"type": "error",
|
||||
"command": "delete-session",
|
||||
"error": "not_yet_implemented",
|
||||
"kind": "not_yet_implemented"
|
||||
}
|
||||
```
|
||||
|
||||
Exit code: 1. No credentials required. The stub ensures the verb does NOT fall through to Prompt/auth (the #251 fix), but the actual delete operation is not yet wired.
|
||||
|
||||
**Aspirational (future) shape**:
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "delete-session",
|
||||
"exit_code": 0,
|
||||
"session_id": "sess_abc123",
|
||||
"deleted": true,
|
||||
"directory": ".claw/sessions"
|
||||
}
|
||||
```
|
||||
|
||||
### `load-session`
|
||||
|
||||
**Status**: ✅ Implemented (closed #251 cycle #45, 2026-04-23).
|
||||
|
||||
**Actual binary envelope** (as of #251 fix):
|
||||
```json
|
||||
{
|
||||
"command": "load-session",
|
||||
"session": {
|
||||
"id": "session-abc123",
|
||||
"path": "/path/to/.claw/sessions/session-abc123.jsonl",
|
||||
"messages": 5
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
For nonexistent sessions, emits a local `session_not_found` error (NOT `missing_credentials`):
|
||||
```json
|
||||
{
|
||||
"error": "session not found: nonexistent",
|
||||
"kind": "session_not_found",
|
||||
"type": "error",
|
||||
"hint": "Hint: managed sessions live in .claw/sessions/<hash>/ ..."
|
||||
}
|
||||
```
|
||||
|
||||
**Aspirational (future) shape**:
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "load-session",
|
||||
"exit_code": 0,
|
||||
"session_id": "sess_abc123",
|
||||
"loaded": true,
|
||||
"directory": ".claw/sessions",
|
||||
"path": ".claw/sessions/sess_abc123.jsonl"
|
||||
}
|
||||
```
|
||||
|
||||
**Gap**: Current impl uses nested `session: {...}` instead of flat fields, and omits common-envelope fields. Follow-up #250 Option B to align.
|
||||
|
||||
### `flush-transcript`
|
||||
|
||||
**Status**: ⚠️ Stub only (closed #251 dispatch-order fix; full impl deferred).
|
||||
|
||||
**Actual binary envelope** (as of #251 fix):
|
||||
```json
|
||||
{
|
||||
"type": "error",
|
||||
"command": "flush-transcript",
|
||||
"error": "not_yet_implemented",
|
||||
"kind": "not_yet_implemented"
|
||||
}
|
||||
```
|
||||
|
||||
Exit code: 1. No credentials required. Like `delete-session`, this stub resolves the #251 dispatch-order bug but the actual flush operation is not yet wired.
|
||||
|
||||
**Aspirational (future) shape**:
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "flush-transcript",
|
||||
"exit_code": 0,
|
||||
"session_id": "sess_abc123",
|
||||
"path": ".claw/sessions/sess_abc123.jsonl",
|
||||
"flushed": true,
|
||||
"messages_count": 12,
|
||||
"input_tokens": 4500,
|
||||
"output_tokens": 1200
|
||||
}
|
||||
```
|
||||
|
||||
### `show-command`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "show-command",
|
||||
"exit_code": 0,
|
||||
"name": "add-dir",
|
||||
"found": true,
|
||||
"source_hint": "commands/add-dir/add-dir.tsx",
|
||||
"responsibility": "creates a new directory in the worktree"
|
||||
}
|
||||
```
|
||||
|
||||
### `show-tool`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "show-tool",
|
||||
"exit_code": 0,
|
||||
"name": "BashTool",
|
||||
"found": true,
|
||||
"source_hint": "tools/BashTool/BashTool.tsx"
|
||||
}
|
||||
```
|
||||
|
||||
### `exec-command`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "exec-command",
|
||||
"exit_code": 0,
|
||||
"name": "add-dir",
|
||||
"prompt": "create src/util/",
|
||||
"handled": true,
|
||||
"message": "created directory",
|
||||
"source_hint": "commands/add-dir/add-dir.tsx"
|
||||
}
|
||||
```
|
||||
|
||||
### `exec-tool`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "exec-tool",
|
||||
"exit_code": 0,
|
||||
"name": "BashTool",
|
||||
"payload": "cargo build",
|
||||
"handled": true,
|
||||
"message": "exit code 0",
|
||||
"source_hint": "tools/BashTool/BashTool.tsx"
|
||||
}
|
||||
```
|
||||
|
||||
### `route`

```json
{
  "timestamp": "2026-04-22T10:10:00Z",
  "command": "route",
  "exit_code": 0,
  "prompt": "add a test",
  "limit": 10,
  "match_count": 3,
  "matches": [
    {
      "kind": "command",
      "name": "add-file",
      "score": 0.92,
      "source_hint": "commands/add-file/add-file.tsx"
    }
  ]
}
```
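A routing client usually keys off `match_count` and the per-match `score`. A minimal sketch (the 0.5 threshold is an illustrative choice, not part of the contract):

```python
def pick_route(envelope: dict, threshold: float = 0.5) -> dict | None:
    """Return the best-scoring match above `threshold`, else None."""
    if envelope.get("exit_code", 1) != 0 or not envelope.get("matches"):
        return None
    best = max(envelope["matches"], key=lambda m: m["score"])
    return best if best["score"] >= threshold else None
```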
### `bootstrap`

```json
{
  "timestamp": "2026-04-22T10:10:00Z",
  "command": "bootstrap",
  "exit_code": 0,
  "prompt": "hello",
  "setup": {
    "python_version": "3.13.12",
    "implementation": "CPython",
    "platform_name": "darwin",
    "test_command": "pytest"
  },
  "routed_matches": [
    {"kind": "command", "name": "init", "score": 0.85, "source_hint": "..."}
  ],
  "turn": {
    "prompt": "hello",
    "output": "...",
    "stop_reason": "completed"
  },
  "persisted_session_path": ".claw/sessions/sess_abc.jsonl"
}
```
### `command-graph`

```json
{
  "timestamp": "2026-04-22T10:10:00Z",
  "command": "command-graph",
  "exit_code": 0,
  "builtins_count": 185,
  "plugin_like_count": 20,
  "skill_like_count": 2,
  "total_count": 207,
  "builtins": [
    {"name": "add-dir", "source_hint": "commands/add-dir/add-dir.tsx"}
  ],
  "plugin_like": [],
  "skill_like": []
}
```
### `tool-pool`

```json
{
  "timestamp": "2026-04-22T10:10:00Z",
  "command": "tool-pool",
  "exit_code": 0,
  "simple_mode": false,
  "include_mcp": true,
  "tool_count": 184,
  "tools": [
    {"name": "BashTool", "source_hint": "tools/BashTool/BashTool.tsx"}
  ]
}
```
### `bootstrap-graph`

```json
{
  "timestamp": "2026-04-22T10:10:00Z",
  "command": "bootstrap-graph",
  "exit_code": 0,
  "stages": ["stage 1", "stage 2", "..."],
  "note": "bootstrap-graph is markdown-only in this version"
}
```

---
## Versioning & Compatibility

- **schema_version = "1.0":** Current as of 2026-04-22. Covers all 13 clawable commands.
- **Breaking changes** (e.g. renaming a field) bump schema_version to "2.0".
- **Additive changes** (e.g. a new optional field) stay at "1.0" and are backward compatible.
- Downstream claws **must** check `schema_version` before relying on field presence.
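In practice that check is one guard at the edge of the consumer. A minimal sketch (the accepted-version set is the consumer's policy, not something the schema prescribes):

```python
def require_schema_version(envelope: dict,
                           accepted: frozenset[str] = frozenset({"1.0"})) -> None:
    """Fail fast on envelopes this consumer was not written against."""
    version = envelope.get("schema_version")
    if version not in accepted:
        raise RuntimeError(
            f"unsupported schema_version {version!r}; "
            f"this consumer understands {sorted(accepted)}"
        )
```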
---
## Regression Testing

Each command is covered by:

1. **Fixture file** (golden JSON snapshot under `tests/fixtures/json/<command>.json`)
2. **Parametrised test** in `test_cli_parity_audit.py::TestJsonOutputContractEndToEnd`
3. **Field consistency test** (new, tracked as ROADMAP #172)

To update a fixture after an intentional schema change:

```bash
claw <command> --output-format json <args> > tests/fixtures/json/<command>.json
# Review the diff, then commit
git add tests/fixtures/json/<command>.json
```
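The parametrised test reduces to a golden-file comparison. Roughly this shape (a sketch, not the actual test body; the command names and arguments are illustrative, and volatile fields such as `timestamp` must be masked before comparing):

```python
import json
import subprocess
from pathlib import Path

import pytest

VOLATILE = {"timestamp"}  # fields that legitimately differ run to run

@pytest.mark.parametrize(
    ("command", "args"),
    [("show-command", ["add-dir"]), ("show-tool", ["BashTool"])],
)
def test_envelope_matches_fixture(command: str, args: list[str]) -> None:
    fixture = json.loads(Path(f"tests/fixtures/json/{command}.json").read_text())
    proc = subprocess.run(
        ["claw", command, *args, "--output-format", "json"],
        capture_output=True, text=True, check=True,
    )
    actual = json.loads(proc.stdout)
    for field in VOLATILE:
        fixture.pop(field, None)
        actual.pop(field, None)
    assert actual == fixture
```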

To verify no regressions:

```bash
cargo test --release test_json_envelope_field_consistency
```

---
## Design Notes

**Why common fields on every response?**

- Downstream claws can build one error handler that works for all commands
- Timestamp + command + exit_code give context without scraping argv or timestamps from command output
- `schema_version` signals compatibility for future upgrades

**Why both "found" and "error" on not-found?**

- Exit code 1 covers both "entity missing" and "operation failed"
- `found=false` distinguishes not-found from error without string matching
- `error.kind` and `error.retryable` let automation decide: retry a temporary miss vs escalate a permanent refusal

**Why "operation" and "target" in error?**

- Claws can aggregate failures by operation type (e.g. "how many `write` ops failed?")
- Claws can implement per-target retry policy (e.g. "skip missing files, retry networking")
- Pure text errors ("No such file") do not provide enough structure for pattern matching

**Why "handled" vs "found"?**

- `show-command` reports `found: bool` (inventory signal: "does this exist?")
- `exec-command` reports `handled: bool` (operational signal: "was this work performed?")
- The names matter: a command can be found but not handled (e.g. too large for the context window), or handled silently (no output message)
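Taken together, those fields are what make a single dispatcher possible. A minimal sketch of the idea (the retry/backoff policy is illustrative, not prescribed by the schema; note that errors arrive either as a nested `error` object or as flat `kind`/`error` fields, as shown throughout this document):

```python
import json
import subprocess
import time

def run_claw(args: list[str], retries: int = 2) -> dict:
    """One handler for every clawable command, classified by error kind."""
    envelope: dict = {}
    for attempt in range(retries + 1):
        proc = subprocess.run(
            ["claw", "--output-format", "json", *args],
            capture_output=True, text=True,
        )
        envelope = json.loads(proc.stdout or proc.stderr)
        if proc.returncode == 0:
            return envelope
        error = envelope.get("error")
        error = error if isinstance(error, dict) else {"kind": envelope.get("kind")}
        if error.get("retryable") and attempt < retries:
            time.sleep(2 ** attempt)  # illustrative backoff
            continue
        break
    return envelope  # permanent failure: surface to the caller
```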
**USAGE.md** — 9 changed lines
````diff
@@ -2,6 +2,9 @@
 
 This guide covers the current Rust workspace under `rust/` and the `claw` CLI binary. If you are brand new, make the doctor health check your first run: start `claw`, then run `/doctor`.
 
+> [!TIP]
+> **Building orchestration code that calls `claw` as a subprocess?** See [`ERROR_HANDLING.md`](./ERROR_HANDLING.md) for the unified error-handling pattern (one handler for all 14 clawable commands, exit codes, JSON envelope contract, and recovery strategies).
+
 ## Quick-start health check
 
 Run this before prompts, sessions, or automation:
@@ -95,11 +98,17 @@ cd rust
 
 ### JSON output for scripting
 
 All clawable commands support `--output-format json` for machine-readable output. Every invocation returns a consistent JSON envelope with `exit_code`, `command`, `timestamp`, and either `{success fields}` or `{error: {kind, message, ...}}`.
 
 ```bash
 cd rust
 ./target/debug/claw --output-format json prompt "status"
 ./target/debug/claw --output-format json load-session my-session-id
 ./target/debug/claw --output-format json turn-loop "analyze logs" --max-turns 1
 ```
 
 **Building a dispatcher or orchestration script?** See [`ERROR_HANDLING.md`](./ERROR_HANDLING.md) for the unified error-handling pattern. One code example works for all 14 clawable commands: parse the exit code, classify by `error.kind`, apply recovery strategies (retry, timeout recovery, validation, logging). Use that pattern instead of reimplementing error handling per command.
 
 ### Inspect worker state
 
 The `claw state` command reads `.claw/worker-state.json`, which is written by the interactive REPL or a one-shot prompt when a worker executes a task. This file contains the worker ID, session reference, model, and permission mode.
````
```diff
@@ -213,7 +213,16 @@ fn main() {
     // #77: classify error by prefix so downstream claws can route without
     // regex-scraping the prose. Split short-reason from hint-runbook.
     let kind = classify_error_kind(&message);
-    let (short_reason, hint) = split_error_hint(&message);
+    let (short_reason, mut hint) = split_error_hint(&message);
+    // #247: JSON envelope was losing the `Run claw --help for usage.`
+    // trailer that text-mode stderr includes. When the error is a
+    // cli_parse and the message itself carried no embedded hint,
+    // synthesize the trailer so typed-error consumers get the same
+    // actionable pointer that text-mode users see. Cross-channel
+    // consistency is a §4.44 typed-envelope contract requirement.
+    if hint.is_none() && kind == "cli_parse" && !short_reason.contains("`claw --help`") {
+        hint = Some("Run `claw --help` for usage.".to_string());
+    }
     eprintln!(
         "{}",
         serde_json::json!({
```
```diff
@@ -264,6 +273,12 @@ fn classify_error_kind(message: &str) -> &'static str {
         "no_managed_sessions"
     } else if message.contains("unrecognized argument") || message.contains("unknown option") {
         "cli_parse"
+    } else if message.contains("prompt subcommand requires") {
+        // #247: `claw prompt` with no argument — a parse error, not `unknown`.
+        "cli_parse"
+    } else if message.starts_with("empty prompt:") {
+        // #247: `claw ""` or `claw " "` — a parse error, not `unknown`.
+        "cli_parse"
     } else if message.contains("invalid model syntax") {
         "invalid_model_syntax"
     } else if message.contains("is not yet implemented") {
```
```diff
@@ -402,7 +417,10 @@ fn run() -> Result<(), Box<dyn std::error::Error>> {
             cli.set_reasoning_effort(reasoning_effort);
             cli.run_turn_with_output(&effective_prompt, output_format, compact)?;
         }
-        CliAction::Doctor { output_format } => run_doctor(output_format)?,
+        CliAction::Doctor { output_format } => {
+            run_stale_base_preflight(None);
+            run_doctor(output_format)?
+        }
         CliAction::Acp { output_format } => print_acp_status(output_format)?,
         CliAction::State { output_format } => run_worker_state(output_format)?,
         CliAction::Init { output_format } => run_init(output_format)?,
```
```diff
@@ -10434,6 +10452,32 @@ mod tests {
         assert_eq!(classify_error_kind("something completely unknown"), "unknown");
     }
 
+    #[test]
+    fn classify_error_kind_covers_prompt_parse_errors_247() {
+        // #247: prompt-related parse errors must classify as `cli_parse`,
+        // not fall through to `unknown`. Regression guard for ROADMAP #247
+        // (typed-error contract drift found in cycle #33 dogfood).
+        assert_eq!(
+            classify_error_kind("prompt subcommand requires a prompt string"),
+            "cli_parse",
+            "bare `claw prompt` must surface as cli_parse so typed-error consumers can dispatch"
+        );
+        assert_eq!(
+            classify_error_kind(
+                "empty prompt: provide a subcommand (run `claw --help`) or a non-empty prompt string"
+            ),
+            "cli_parse",
+            "`claw \"\"` must surface as cli_parse, not unknown"
+        );
+        // Sanity: the new patterns must be specific enough not to hijack
+        // genuinely unknown errors that happen to contain the word `prompt`.
+        assert_eq!(
+            classify_error_kind("some random prompt-adjacent failure we don't recognize"),
+            "unknown",
+            "generic prompt-containing text should still fall through to unknown"
+        );
+    }
+
     #[test]
     fn split_error_hint_separates_reason_from_runbook() {
         // #77: short reason / hint separation for JSON error payloads
```
```diff
@@ -388,6 +388,114 @@ fn assert_json_command(current_dir: &Path, args: &[&str]) -> Value {
     assert_json_command_with_env(current_dir, args, &[])
 }
 
+/// #247 regression helper: run claw expecting a non-zero exit and return
+/// the JSON error envelope parsed from stderr. Asserts exit != 0 and that
+/// the envelope includes `type: "error"` at the very least.
+fn assert_json_error_envelope(current_dir: &Path, args: &[&str]) -> Value {
+    let output = run_claw(current_dir, args, &[]);
+    assert!(
+        !output.status.success(),
+        "command unexpectedly succeeded; stdout:\n{}\nstderr:\n{}",
+        String::from_utf8_lossy(&output.stdout),
+        String::from_utf8_lossy(&output.stderr)
+    );
+    // The JSON envelope is written to stderr for error cases (see main.rs).
+    let envelope: Value = serde_json::from_slice(&output.stderr).unwrap_or_else(|err| {
+        panic!(
+            "stderr should be a JSON error envelope but failed to parse: {err}\nstderr bytes:\n{}",
+            String::from_utf8_lossy(&output.stderr)
+        )
+    });
+    assert_eq!(
+        envelope["type"], "error",
+        "envelope should carry type=error"
+    );
+    envelope
+}
+
+#[test]
+fn prompt_subcommand_without_arg_emits_cli_parse_envelope_with_hint_247() {
+    // #247: `claw prompt` with no argument must classify as `cli_parse`
+    // (not `unknown`) and the JSON envelope must carry the same actionable
+    // `Run claw --help for usage.` hint that text-mode stderr appends.
+    let root = unique_temp_dir("247-prompt-no-arg");
+    fs::create_dir_all(&root).expect("temp dir should exist");
+
+    let envelope = assert_json_error_envelope(&root, &["--output-format", "json", "prompt"]);
+    assert_eq!(
+        envelope["kind"], "cli_parse",
+        "prompt subcommand without arg should classify as cli_parse, envelope: {envelope}"
+    );
+    assert_eq!(
+        envelope["error"], "prompt subcommand requires a prompt string",
+        "short reason should match the raw error, envelope: {envelope}"
+    );
+    assert_eq!(
+        envelope["hint"],
+        "Run `claw --help` for usage.",
+        "JSON envelope must carry the same help-runbook hint as text mode, envelope: {envelope}"
+    );
+}
+
+#[test]
+fn empty_positional_arg_emits_cli_parse_envelope_247() {
+    // #247: `claw ""` must classify as `cli_parse`, not `unknown`. The
+    // message itself embeds a "run `claw --help`" pointer so the explicit
+    // hint field is allowed to remain null to avoid duplication — what
+    // matters for the typed-error contract is that `kind == cli_parse`.
+    let root = unique_temp_dir("247-empty-arg");
+    fs::create_dir_all(&root).expect("temp dir should exist");
+
+    let envelope = assert_json_error_envelope(&root, &["--output-format", "json", ""]);
+    assert_eq!(
+        envelope["kind"], "cli_parse",
+        "empty-prompt error should classify as cli_parse, envelope: {envelope}"
+    );
+    let short = envelope["error"]
+        .as_str()
+        .expect("error field should be a string");
+    assert!(
+        short.starts_with("empty prompt:"),
+        "short reason should preserve the original empty-prompt message, got: {short}"
+    );
+}
+
+#[test]
+fn whitespace_only_positional_arg_emits_cli_parse_envelope_247() {
+    // #247: same rule for `claw " "` — any whitespace-only prompt must
+    // flow through the empty-prompt path and classify as `cli_parse`.
+    let root = unique_temp_dir("247-whitespace-arg");
+    fs::create_dir_all(&root).expect("temp dir should exist");
+
+    let envelope = assert_json_error_envelope(&root, &["--output-format", "json", " "]);
+    assert_eq!(
+        envelope["kind"], "cli_parse",
+        "whitespace-only prompt should classify as cli_parse, envelope: {envelope}"
+    );
+}
+
+#[test]
+fn unrecognized_argument_still_classifies_as_cli_parse_247_regression_guard() {
+    // #247 regression guard: the new empty-prompt / prompt-subcommand
+    // patterns must NOT hijack the existing #77 unrecognized-argument
+    // classification. `claw doctor --foo` must still surface as cli_parse
+    // with the runbook hint present.
+    let root = unique_temp_dir("247-unrecognized-arg");
+    fs::create_dir_all(&root).expect("temp dir should exist");
+
+    let envelope =
+        assert_json_error_envelope(&root, &["--output-format", "json", "doctor", "--foo"]);
+    assert_eq!(
+        envelope["kind"], "cli_parse",
+        "unrecognized-argument must remain cli_parse, envelope: {envelope}"
+    );
+    assert_eq!(
+        envelope["hint"],
+        "Run `claw --help` for usage.",
+        "unrecognized-argument hint should stay intact, envelope: {envelope}"
+    );
+}
+
 fn assert_json_command_with_env(current_dir: &Path, args: &[&str], envs: &[(&str, &str)]) -> Value {
     let output = run_claw(current_dir, args, envs);
     assert!(
```
**src/__init__.py**

```diff
@@ -5,7 +5,16 @@ from .parity_audit import ParityAuditResult, run_parity_audit
 from .port_manifest import PortManifest, build_port_manifest
 from .query_engine import QueryEnginePort, TurnResult
 from .runtime import PortRuntime, RuntimeSession
-from .session_store import StoredSession, load_session, save_session
+from .session_store import (
+    SessionDeleteError,
+    SessionNotFoundError,
+    StoredSession,
+    delete_session,
+    list_sessions,
+    load_session,
+    save_session,
+    session_exists,
+)
 from .system_init import build_system_init_message
 from .tools import PORTED_TOOLS, build_tool_backlog
 
@@ -15,6 +24,8 @@ __all__ = [
     'PortRuntime',
     'QueryEnginePort',
     'RuntimeSession',
+    'SessionDeleteError',
+    'SessionNotFoundError',
     'StoredSession',
     'TurnResult',
     'PORTED_COMMANDS',
@@ -23,7 +34,10 @@ __all__ = [
     'build_port_manifest',
     'build_system_init_message',
     'build_tool_backlog',
+    'delete_session',
+    'list_sessions',
     'load_session',
     'run_parity_audit',
     'save_session',
+    'session_exists',
 ]
```
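With those exports public, the whole session lifecycle is scriptable from the package root. A minimal sketch (the `session_exists(session_id, directory)` signature is an assumption inferred from its sibling functions):

```python
from pathlib import Path

from src import delete_session, list_sessions, load_session, session_exists

store = Path(".port_sessions")  # the default storage directory

for sid in list_sessions(store):
    if session_exists(sid, store):
        session = load_session(sid, store)
        print(sid, len(session.messages), "messages")
    delete_session(sid, store)  # idempotent: a missing session is not an error
```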
**src/main.py** — 593 changed lines
```diff
@@ -12,22 +12,48 @@ from .port_manifest import build_port_manifest
 from .query_engine import QueryEnginePort
 from .remote_runtime import run_remote_mode, run_ssh_mode, run_teleport_mode
 from .runtime import PortRuntime
-from .session_store import load_session
+from .session_store import (
+    SessionDeleteError,
+    SessionNotFoundError,
+    delete_session,
+    list_sessions,
+    load_session,
+    session_exists,
+)
 from .setup import run_setup
 from .tool_pool import assemble_tool_pool
 from .tools import execute_tool, get_tool, get_tools, render_tool_index
 
 
+def wrap_json_envelope(data: dict, command: str, exit_code: int = 0) -> dict:
+    """Wrap command output in canonical JSON envelope per SCHEMAS.md."""
+    from datetime import datetime, timezone
+    now_utc = datetime.now(timezone.utc).isoformat(timespec='seconds').replace('+00:00', 'Z')
+    return {
+        'timestamp': now_utc,
+        'command': command,
+        'exit_code': exit_code,
+        'output_format': 'json',
+        'schema_version': '1.0',
+        **data,
+    }
+
+
 def build_parser() -> argparse.ArgumentParser:
     parser = argparse.ArgumentParser(description='Python porting workspace for the Claude Code rewrite effort')
     # #180: Add --version flag to match canonical CLI contract
     parser.add_argument('--version', action='version', version='claw-code 1.0.0 (Python harness)')
     subparsers = parser.add_subparsers(dest='command', required=True)
     subparsers.add_parser('summary', help='render a Markdown summary of the Python porting workspace')
     subparsers.add_parser('manifest', help='print the current Python workspace manifest')
     subparsers.add_parser('parity-audit', help='compare the Python workspace against the local ignored TypeScript archive when available')
     subparsers.add_parser('setup-report', help='render the startup/prefetch setup report')
-    subparsers.add_parser('command-graph', help='show command graph segmentation')
-    subparsers.add_parser('tool-pool', help='show assembled tool pool with default settings')
-    subparsers.add_parser('bootstrap-graph', help='show the mirrored bootstrap/runtime graph stages')
+    command_graph_parser = subparsers.add_parser('command-graph', help='show command graph segmentation')
+    command_graph_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
+    tool_pool_parser = subparsers.add_parser('tool-pool', help='show assembled tool pool with default settings')
+    tool_pool_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
+    bootstrap_graph_parser = subparsers.add_parser('bootstrap-graph', help='show the mirrored bootstrap/runtime graph stages')
+    bootstrap_graph_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
     list_parser = subparsers.add_parser('subsystems', help='list the current Python modules in the workspace')
     list_parser.add_argument('--limit', type=int, default=32)
```
```diff
@@ -48,22 +74,104 @@ def build_parser() -> argparse.ArgumentParser:
     route_parser = subparsers.add_parser('route', help='route a prompt across mirrored command/tool inventories')
     route_parser.add_argument('prompt')
     route_parser.add_argument('--limit', type=int, default=5)
+    # #168: parity with show-command/show-tool/session-lifecycle CLI family
+    route_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
 
     bootstrap_parser = subparsers.add_parser('bootstrap', help='build a runtime-style session report from the mirrored inventories')
     bootstrap_parser.add_argument('prompt')
     bootstrap_parser.add_argument('--limit', type=int, default=5)
+    # #168: parity with CLI family
+    bootstrap_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
 
     loop_parser = subparsers.add_parser('turn-loop', help='run a small stateful turn loop for the mirrored runtime')
     loop_parser.add_argument('prompt')
     loop_parser.add_argument('--limit', type=int, default=5)
     loop_parser.add_argument('--max-turns', type=int, default=3)
     loop_parser.add_argument('--structured-output', action='store_true')
+    loop_parser.add_argument(
+        '--timeout-seconds',
+        type=float,
+        default=None,
+        help='total wall-clock budget across all turns (#161). Default: unbounded.',
+    )
+    loop_parser.add_argument(
+        '--continuation-prompt',
+        default=None,
+        help=(
+            'prompt to submit on turns after the first (#163). Default: None '
+            '(loop stops after turn 0). Replaces the deprecated implicit "[turn N]" '
+            'suffix that used to pollute the transcript.'
+        ),
+    )
+    loop_parser.add_argument(
+        '--output-format',
+        choices=['text', 'json'],
+        default='text',
+        help='output format (#164 Stage B: JSON includes cancel_observed per turn)',
+    )
 
-    flush_parser = subparsers.add_parser('flush-transcript', help='persist and flush a temporary session transcript')
+    flush_parser = subparsers.add_parser(
+        'flush-transcript',
+        help='persist and flush a temporary session transcript (#160/#166: claw-native session API)',
+    )
     flush_parser.add_argument('prompt')
+    flush_parser.add_argument(
+        '--directory', help='session storage directory (default: .port_sessions)'
+    )
+    flush_parser.add_argument(
+        '--output-format',
+        choices=['text', 'json'],
+        default='text',
+        help='output format',
+    )
+    flush_parser.add_argument(
+        '--session-id',
+        help='deterministic session ID (default: auto-generated UUID)',
+    )
 
-    load_session_parser = subparsers.add_parser('load-session', help='load a previously persisted session')
+    load_session_parser = subparsers.add_parser(
+        'load-session',
+        help='load a previously persisted session (#160/#165: claw-native session API)',
+    )
     load_session_parser.add_argument('session_id')
+    load_session_parser.add_argument(
+        '--directory', help='session storage directory (default: .port_sessions)'
+    )
+    load_session_parser.add_argument(
+        '--output-format',
+        choices=['text', 'json'],
+        default='text',
+        help='output format',
+    )
+
+    list_sessions_parser = subparsers.add_parser(
+        'list-sessions',
+        help='enumerate stored session IDs (#160: claw-native session API)',
+    )
+    list_sessions_parser.add_argument(
+        '--directory', help='session storage directory (default: .port_sessions)'
+    )
+    list_sessions_parser.add_argument(
+        '--output-format',
+        choices=['text', 'json'],
+        default='text',
+        help='output format',
+    )
+
+    delete_session_parser = subparsers.add_parser(
+        'delete-session',
+        help='delete a persisted session (#160: idempotent, race-safe)',
+    )
+    delete_session_parser.add_argument('session_id')
+    delete_session_parser.add_argument(
+        '--directory', help='session storage directory (default: .port_sessions)'
+    )
+    delete_session_parser.add_argument(
+        '--output-format',
+        choices=['text', 'json'],
+        default='text',
+        help='output format',
+    )
 
     remote_parser = subparsers.add_parser('remote-mode', help='simulate remote-control runtime branching')
     remote_parser.add_argument('target')
```
```diff
@@ -78,22 +186,112 @@ def build_parser() -> argparse.ArgumentParser:
 
     show_command = subparsers.add_parser('show-command', help='show one mirrored command entry by exact name')
     show_command.add_argument('name')
     show_command.add_argument('--output-format', choices=['text', 'json'], default='text')
     show_tool = subparsers.add_parser('show-tool', help='show one mirrored tool entry by exact name')
     show_tool.add_argument('name')
     show_tool.add_argument('--output-format', choices=['text', 'json'], default='text')
 
     exec_command_parser = subparsers.add_parser('exec-command', help='execute a mirrored command shim by exact name')
     exec_command_parser.add_argument('name')
     exec_command_parser.add_argument('prompt')
+    # #168: parity with CLI family
+    exec_command_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
 
     exec_tool_parser = subparsers.add_parser('exec-tool', help='execute a mirrored tool shim by exact name')
     exec_tool_parser.add_argument('name')
     exec_tool_parser.add_argument('payload')
+    # #168: parity with CLI family
+    exec_tool_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
     return parser
 
 
+class _ArgparseError(Exception):
+    """#179: internal exception capturing argparse's real error message.
+
+    Subclassed ArgumentParser raises this instead of printing + exiting,
+    so JSON mode can preserve the actual error (e.g. 'the following arguments
+    are required: session_id') in the envelope.
+    """
+    def __init__(self, message: str) -> None:
+        super().__init__(message)
+        self.message = message
+
+
+def _emit_parse_error_envelope(argv: list[str], message: str) -> None:
+    """#178/#179: emit JSON envelope for argparse-level errors when --output-format json is requested.
+
+    Pre-scans argv for --output-format json. If found, prints a parse-error envelope
+    to stdout (per SCHEMAS.md 'error' envelope shape) instead of letting argparse
+    dump help text to stderr. This preserves the JSON contract for claws that can't
+    parse argparse usage messages.
+
+    #179 update: `message` now carries argparse's actual error text, not a generic
+    rejection string. Stderr is fully suppressed in JSON mode.
+    """
+    import json
+    # Extract the attempted command (argv[0] is the first positional)
+    attempted = argv[0] if argv and not argv[0].startswith('-') else '<missing>'
+    envelope = wrap_json_envelope(
+        {
+            'error': {
+                'kind': 'parse',
+                'operation': 'argparse',
+                'target': attempted,
+                'retryable': False,
+                'message': message,
+                'hint': 'run with no arguments to see available subcommands',
+            },
+        },
+        command=attempted,
+        exit_code=1,
+    )
+    print(json.dumps(envelope))
+
+
+def _wants_json_output(argv: list[str]) -> bool:
+    """#178: check if argv contains --output-format json anywhere (for parse-error routing)."""
+    for i, arg in enumerate(argv):
+        if arg == '--output-format' and i + 1 < len(argv) and argv[i + 1] == 'json':
+            return True
+        if arg == '--output-format=json':
+            return True
+    return False
+
+
 def main(argv: list[str] | None = None) -> int:
     import sys
     if argv is None:
         argv = sys.argv[1:]
     parser = build_parser()
-    args = parser.parse_args(argv)
+    json_mode = _wants_json_output(argv)
+    # #178/#179: capture argparse errors with real message and emit JSON envelope
+    # when --output-format json is requested. In JSON mode, stderr is silenced
+    # so claws only see the envelope on stdout.
+    if json_mode:
+        # Monkey-patch parser.error to raise instead of print+exit. This preserves
+        # the original error message text (e.g. 'argument X: invalid choice: ...').
+        original_error = parser.error
+        def _json_mode_error(message: str) -> None:
+            raise _ArgparseError(message)
+        parser.error = _json_mode_error  # type: ignore[method-assign]
+        # Also patch all subparsers
+        for action in parser._actions:
+            if hasattr(action, 'choices') and isinstance(action.choices, dict):
+                for subp in action.choices.values():
+                    subp.error = _json_mode_error  # type: ignore[method-assign]
+        try:
+            args = parser.parse_args(argv)
+        except _ArgparseError as err:
+            _emit_parse_error_envelope(argv, err.message)
+            return 1
+        except SystemExit as exc:
+            # Defensive: if argparse exits via some other path (e.g. --help in JSON mode)
+            if exc.code != 0:
+                _emit_parse_error_envelope(argv, 'argparse exited with non-zero code')
+                return 1
+            raise
+    else:
+        args = parser.parse_args(argv)
     manifest = build_port_manifest()
     if args.command == 'summary':
         print(QueryEnginePort(manifest).render_summary())
```
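The net effect of #178/#179 is that a claw never has to parse argparse usage text. A minimal sketch against the harness CLI (the exact message string comes from argparse and may vary by Python version):

```python
import json
import subprocess

proc = subprocess.run(
    ["python3", "-m", "src.main", "load-session", "--output-format", "json"],
    capture_output=True, text=True,
)
envelope = json.loads(proc.stdout)   # envelope on stdout; stderr is silenced
assert proc.returncode == 1
assert envelope["error"]["kind"] == "parse"
print(envelope["error"]["message"])  # e.g. "the following arguments are required: session_id"
```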
```diff
@@ -108,13 +306,44 @@ def main(argv: list[str] | None = None) -> int:
         print(run_setup().as_markdown())
         return 0
     if args.command == 'command-graph':
-        print(build_command_graph().as_markdown())
+        graph = build_command_graph()
+        if args.output_format == 'json':
+            import json
+            envelope = {
+                'builtins_count': len(graph.builtins),
+                'plugin_like_count': len(graph.plugin_like),
+                'skill_like_count': len(graph.skill_like),
+                'total_count': len(graph.flattened()),
+                'builtins': [{'name': m.name, 'source_hint': m.source_hint} for m in graph.builtins],
+                'plugin_like': [{'name': m.name, 'source_hint': m.source_hint} for m in graph.plugin_like],
+                'skill_like': [{'name': m.name, 'source_hint': m.source_hint} for m in graph.skill_like],
+            }
+            print(json.dumps(wrap_json_envelope(envelope, args.command)))
+        else:
+            print(graph.as_markdown())
         return 0
     if args.command == 'tool-pool':
-        print(assemble_tool_pool().as_markdown())
+        pool = assemble_tool_pool()
+        if args.output_format == 'json':
+            import json
+            envelope = {
+                'simple_mode': pool.simple_mode,
+                'include_mcp': pool.include_mcp,
+                'tool_count': len(pool.tools),
+                'tools': [{'name': t.name, 'source_hint': t.source_hint} for t in pool.tools],
+            }
+            print(json.dumps(wrap_json_envelope(envelope, args.command)))
+        else:
+            print(pool.as_markdown())
         return 0
     if args.command == 'bootstrap-graph':
-        print(build_bootstrap_graph().as_markdown())
+        graph = build_bootstrap_graph()
+        if args.output_format == 'json':
+            import json
+            envelope = {'stages': graph.as_markdown().split('\n'), 'note': 'bootstrap-graph is markdown-only in this version'}
+            print(json.dumps(wrap_json_envelope(envelope, args.command)))
+        else:
+            print(graph.as_markdown())
         return 0
     if args.command == 'subsystems':
         for subsystem in manifest.top_level_modules[: args.limit]:
```
```diff
@@ -141,6 +370,25 @@ def main(argv: list[str] | None = None) -> int:
         return 0
     if args.command == 'route':
         matches = PortRuntime().route_prompt(args.prompt, limit=args.limit)
+        # #168: JSON envelope for machine parsing
+        if args.output_format == 'json':
+            import json
+            envelope = {
+                'prompt': args.prompt,
+                'limit': args.limit,
+                'match_count': len(matches),
+                'matches': [
+                    {
+                        'kind': m.kind,
+                        'name': m.name,
+                        'score': m.score,
+                        'source_hint': m.source_hint,
+                    }
+                    for m in matches
+                ],
+            }
+            print(json.dumps(wrap_json_envelope(envelope, args.command)))
+            return 0
         if not matches:
             print('No mirrored command/tool matches found.')
             return 0
```
```diff
@@ -148,25 +396,220 @@ def main(argv: list[str] | None = None) -> int:
         print(f'{match.kind}\t{match.name}\t{match.score}\t{match.source_hint}')
         return 0
     if args.command == 'bootstrap':
-        print(PortRuntime().bootstrap_session(args.prompt, limit=args.limit).as_markdown())
+        session = PortRuntime().bootstrap_session(args.prompt, limit=args.limit)
+        # #168: JSON envelope for machine parsing
+        if args.output_format == 'json':
+            import json
+            envelope = {
+                'prompt': session.prompt,
+                'limit': args.limit,
+                'setup': {
+                    'python_version': session.setup.python_version,
+                    'implementation': session.setup.implementation,
+                    'platform_name': session.setup.platform_name,
+                    'test_command': session.setup.test_command,
+                },
+                'routed_matches': [
+                    {
+                        'kind': m.kind,
+                        'name': m.name,
+                        'score': m.score,
+                        'source_hint': m.source_hint,
+                    }
+                    for m in session.routed_matches
+                ],
+                'command_execution_messages': list(session.command_execution_messages),
+                'tool_execution_messages': list(session.tool_execution_messages),
+                'turn': {
+                    'prompt': session.turn_result.prompt,
+                    'output': session.turn_result.output,
+                    'stop_reason': session.turn_result.stop_reason,
+                    'cancel_observed': session.turn_result.cancel_observed,
+                },
+                'persisted_session_path': session.persisted_session_path,
+            }
+            print(json.dumps(wrap_json_envelope(envelope, args.command)))
+            return 0
+        print(session.as_markdown())
         return 0
     if args.command == 'turn-loop':
-        results = PortRuntime().run_turn_loop(args.prompt, limit=args.limit, max_turns=args.max_turns, structured_output=args.structured_output)
+        results = PortRuntime().run_turn_loop(
+            args.prompt,
+            limit=args.limit,
+            max_turns=args.max_turns,
+            structured_output=args.structured_output,
+            timeout_seconds=args.timeout_seconds,
+            continuation_prompt=args.continuation_prompt,
+        )
+        # Exit 2 when a timeout terminated the loop so claws can distinguish
+        # 'ran to completion' from 'hit wall-clock budget'.
+        loop_exit_code = 2 if results and results[-1].stop_reason == 'timeout' else 0
+        if args.output_format == 'json':
+            # #164 Stage B + #173: JSON envelope with per-turn cancel_observed.
+            # Promotes turn-loop from OPT_OUT to CLAWABLE surface.
+            import json
+            envelope = {
+                'prompt': args.prompt,
+                'max_turns': args.max_turns,
+                'turns_completed': len(results),
+                'timeout_seconds': args.timeout_seconds,
+                'continuation_prompt': args.continuation_prompt,
+                'turns': [
+                    {
+                        'prompt': r.prompt,
+                        'output': r.output,
+                        'stop_reason': r.stop_reason,
+                        'cancel_observed': r.cancel_observed,
+                        'matched_commands': list(r.matched_commands),
+                        'matched_tools': list(r.matched_tools),
+                    }
+                    for r in results
+                ],
+                'final_stop_reason': results[-1].stop_reason if results else None,
+                'final_cancel_observed': results[-1].cancel_observed if results else False,
+            }
+            print(json.dumps(wrap_json_envelope(envelope, args.command, exit_code=loop_exit_code)))
+            return loop_exit_code
         for idx, result in enumerate(results, start=1):
             print(f'## Turn {idx}')
             print(result.output)
             print(f'stop_reason={result.stop_reason}')
-        return 0
+        return loop_exit_code
```
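The exit-code-2 contract gives claws a cheap timeout probe without inspecting every turn. A minimal sketch against the harness CLI (flag names as declared in the parser above):

```python
import json
import subprocess

def run_turn_loop(prompt: str, budget_seconds: float) -> tuple[dict, bool]:
    """Run the harness turn loop; return (envelope, timed_out)."""
    proc = subprocess.run(
        ["python3", "-m", "src.main", "turn-loop", prompt,
         "--max-turns", "3", "--timeout-seconds", str(budget_seconds),
         "--output-format", "json"],
        capture_output=True, text=True,
    )
    envelope = json.loads(proc.stdout)
    return envelope, proc.returncode == 2  # 2 == wall-clock budget hit
```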
```diff
     if args.command == 'flush-transcript':
+        from pathlib import Path as _Path
         engine = QueryEnginePort.from_workspace()
+        # #166: allow deterministic session IDs for claw checkpointing/replay.
+        # When unset, the engine's auto-generated UUID is used (backward compat).
+        if args.session_id:
+            engine.session_id = args.session_id
         engine.submit_message(args.prompt)
-        path = engine.persist_session()
-        print(path)
-        print(f'flushed={engine.transcript_store.flushed}')
+        directory = _Path(args.directory) if args.directory else None
+        path = engine.persist_session(directory)
+        if args.output_format == 'json':
+            import json as _json
+            _env = {
+                'session_id': engine.session_id,
+                'path': path,
+                'flushed': engine.transcript_store.flushed,
+                'messages_count': len(engine.mutable_messages),
+                'input_tokens': engine.total_usage.input_tokens,
+                'output_tokens': engine.total_usage.output_tokens,
+            }
+            print(_json.dumps(wrap_json_envelope(_env, args.command)))
+        else:
+            # #166: legacy text output preserved byte-for-byte for backward compat.
+            print(path)
+            print(f'flushed={engine.transcript_store.flushed}')
         return 0
     if args.command == 'load-session':
-        session = load_session(args.session_id)
-        print(f'{session.session_id}\n{len(session.messages)} messages\nin={session.input_tokens} out={session.output_tokens}')
+        from pathlib import Path as _Path
+        directory = _Path(args.directory) if args.directory else None
+        # #165: catch typed SessionNotFoundError + surface a JSON error envelope
+        # matching the delete-session contract shape. No more raw tracebacks.
+        try:
+            session = load_session(args.session_id, directory)
+        except SessionNotFoundError as exc:
+            if args.output_format == 'json':
+                import json as _json
+                resolved_dir = str(directory) if directory else '.port_sessions'
+                _env = {
+                    'session_id': args.session_id,
+                    'loaded': False,
+                    'error': {
+                        'kind': 'session_not_found',
+                        'message': str(exc),
+                        'directory': resolved_dir,
+                        'retryable': False,
+                    },
+                }
+                print(_json.dumps(wrap_json_envelope(_env, args.command, exit_code=1)))
+            else:
+                print(f'error: {exc}')
+            return 1
+        except (OSError, ValueError) as exc:
+            # Corrupted session file, IO error, JSON decode error — distinct
+            # from 'not found'. Callers may retry here (fs glitch).
+            if args.output_format == 'json':
+                import json as _json
+                resolved_dir = str(directory) if directory else '.port_sessions'
+                _env = {
+                    'session_id': args.session_id,
+                    'loaded': False,
+                    'error': {
+                        'kind': 'session_load_failed',
+                        'message': str(exc),
+                        'directory': resolved_dir,
+                        'retryable': True,
+                    },
+                }
+                print(_json.dumps(wrap_json_envelope(_env, args.command, exit_code=1)))
+            else:
+                print(f'error: {exc}')
+            return 1
+        if args.output_format == 'json':
+            import json as _json
+            _env = {
+                'session_id': session.session_id,
+                'loaded': True,
+                'messages_count': len(session.messages),
+                'input_tokens': session.input_tokens,
+                'output_tokens': session.output_tokens,
+            }
+            print(_json.dumps(wrap_json_envelope(_env, args.command)))
+        else:
+            print(f'{session.session_id}\n{len(session.messages)} messages\nin={session.input_tokens} out={session.output_tokens}')
         return 0
+    if args.command == 'list-sessions':
+        from pathlib import Path as _Path
+        directory = _Path(args.directory) if args.directory else None
+        ids = list_sessions(directory)
+        if args.output_format == 'json':
+            import json as _json
+            _env = {'sessions': ids, 'count': len(ids)}
+            print(_json.dumps(wrap_json_envelope(_env, args.command)))
+        else:
+            if not ids:
+                print('(no sessions)')
+            else:
+                for sid in ids:
+                    print(sid)
+        return 0
```
```diff
+    if args.command == 'delete-session':
+        from pathlib import Path as _Path
+        directory = _Path(args.directory) if args.directory else None
+        try:
+            deleted = delete_session(args.session_id, directory)
+        except SessionDeleteError as exc:
+            if args.output_format == 'json':
+                import json as _json
+                _env = {
+                    'session_id': args.session_id,
+                    'deleted': False,
+                    'error': {
+                        'kind': 'session_delete_failed',
+                        'message': str(exc),
+                        'retryable': True,
+                    },
+                }
+                print(_json.dumps(wrap_json_envelope(_env, args.command, exit_code=1)))
+            else:
+                print(f'error: {exc}')
+            return 1
+        if args.output_format == 'json':
+            import json as _json
+            _env = {
+                'session_id': args.session_id,
+                'deleted': deleted,
+                'status': 'deleted' if deleted else 'not_found',
+            }
+            print(_json.dumps(wrap_json_envelope(_env, args.command)))
+        else:
+            if deleted:
+                print(f'deleted: {args.session_id}')
+            else:
+                print(f'not found: {args.session_id}')
+        # Exit 0 for both cases — delete_session is idempotent,
+        # not-found is success from a cleanup perspective
+        return 0
     if args.command == 'remote-mode':
         print(run_remote_mode(args.target).as_text())
```
```diff
@@ -186,25 +629,123 @@ def main(argv: list[str] | None = None) -> int:
     if args.command == 'show-command':
         module = get_command(args.name)
         if module is None:
-            print(f'Command not found: {args.name}')
+            if args.output_format == 'json':
+                import json
+                error_envelope = {
+                    'name': args.name,
+                    'found': False,
+                    'error': {
+                        'kind': 'command_not_found',
+                        'message': f'Unknown command: {args.name}',
+                        'retryable': False,
+                    },
+                }
+                print(json.dumps(wrap_json_envelope(error_envelope, args.command, exit_code=1)))
+            else:
+                print(f'Command not found: {args.name}')
             return 1
-        print('\n'.join([module.name, module.source_hint, module.responsibility]))
+        if args.output_format == 'json':
+            import json
+            output = {
+                'name': module.name,
+                'found': True,
+                'source_hint': module.source_hint,
+                'responsibility': module.responsibility,
+            }
+            print(json.dumps(wrap_json_envelope(output, args.command)))
+        else:
+            print('\n'.join([module.name, module.source_hint, module.responsibility]))
         return 0
     if args.command == 'show-tool':
         module = get_tool(args.name)
         if module is None:
-            print(f'Tool not found: {args.name}')
+            if args.output_format == 'json':
+                import json
+                error_envelope = {
+                    'name': args.name,
+                    'found': False,
+                    'error': {
+                        'kind': 'tool_not_found',
+                        'message': f'Unknown tool: {args.name}',
+                        'retryable': False,
+                    },
+                }
+                print(json.dumps(wrap_json_envelope(error_envelope, args.command, exit_code=1)))
+            else:
+                print(f'Tool not found: {args.name}')
             return 1
-        print('\n'.join([module.name, module.source_hint, module.responsibility]))
+        if args.output_format == 'json':
+            import json
+            output = {
+                'name': module.name,
+                'found': True,
+                'source_hint': module.source_hint,
+                'responsibility': module.responsibility,
+            }
+            print(json.dumps(wrap_json_envelope(output, args.command)))
+        else:
+            print('\n'.join([module.name, module.source_hint, module.responsibility]))
         return 0
```
```diff
     if args.command == 'exec-command':
         result = execute_command(args.name, args.prompt)
-        print(result.message)
-        return 0 if result.handled else 1
+        # #168: JSON envelope with typed not-found error
+        # #181: envelope exit_code must match process exit code
+        exit_code = 0 if result.handled else 1
+        if args.output_format == 'json':
+            import json
+            if not result.handled:
+                envelope = {
+                    'name': args.name,
+                    'prompt': args.prompt,
+                    'handled': False,
+                    'error': {
+                        'kind': 'command_not_found',
+                        'message': result.message,
+                        'retryable': False,
+                    },
+                }
+            else:
+                envelope = {
+                    'name': result.name,
+                    'prompt': result.prompt,
+                    'source_hint': result.source_hint,
+                    'handled': True,
+                    'message': result.message,
+                }
+            print(json.dumps(wrap_json_envelope(envelope, args.command, exit_code=exit_code)))
+        else:
+            print(result.message)
+        return exit_code
     if args.command == 'exec-tool':
         result = execute_tool(args.name, args.payload)
-        print(result.message)
-        return 0 if result.handled else 1
+        # #168: JSON envelope with typed not-found error
+        # #181: envelope exit_code must match process exit code
+        exit_code = 0 if result.handled else 1
+        if args.output_format == 'json':
+            import json
+            if not result.handled:
+                envelope = {
+                    'name': args.name,
+                    'payload': args.payload,
+                    'handled': False,
+                    'error': {
+                        'kind': 'tool_not_found',
+                        'message': result.message,
+                        'retryable': False,
+                    },
+                }
+            else:
+                envelope = {
+                    'name': result.name,
+                    'payload': result.payload,
+                    'source_hint': result.source_hint,
+                    'handled': True,
+                    'message': result.message,
+                }
+            print(json.dumps(wrap_json_envelope(envelope, args.command, exit_code=exit_code)))
+        else:
+            print(result.message)
+        return exit_code
     parser.error(f'unknown command: {args.command}')
     return 2
```
**src/query_engine.py**

```diff
@@ -1,6 +1,7 @@
 from __future__ import annotations
 
 import json
+import threading
 from dataclasses import dataclass, field
 from uuid import uuid4
 
@@ -30,6 +31,7 @@ class TurnResult:
     permission_denials: tuple[PermissionDenial, ...]
     usage: UsageSummary
     stop_reason: str
+    cancel_observed: bool = False
 
 
 @dataclass
```
```diff
@@ -64,7 +66,59 @@ class QueryEnginePort:
         matched_commands: tuple[str, ...] = (),
         matched_tools: tuple[str, ...] = (),
         denied_tools: tuple[PermissionDenial, ...] = (),
+        cancel_event: threading.Event | None = None,
     ) -> TurnResult:
+        """Submit a prompt and return a TurnResult.
+
+        #164 Stage A: cooperative cancellation via cancel_event.
+
+        The cancel_event argument (added for #164) lets a caller request early
+        termination at a safe point. When set before the pre-mutation commit
+        stage, submit_message returns early with ``stop_reason='cancelled'``
+        and the engine's state (mutable_messages, transcript_store,
+        permission_denials, total_usage) is left **exactly as it was on
+        entry**. This closes the #161 follow-up gap: before this change, a
+        wedged provider thread could finish executing and silently mutate
+        state after the caller had already observed ``stop_reason='timeout'``,
+        giving the session a ghost turn the caller never acknowledged.
+
+        Contract:
+        - cancel_event is None (default) — legacy behaviour, no checks.
+        - cancel_event set **before** budget check — returns 'cancelled'
+          immediately; no output synthesis, no projection, no mutation.
+        - cancel_event set **between** budget check and commit — returns
+          'cancelled' with state intact.
+        - cancel_event set **after** commit — not observable; the turn is
+          already committed and the caller sees 'completed'. Cancellation
+          is a *safe point* mechanism, not preemption. This is the honest
+          limit of cooperative cancellation in Python threading land.
+
+        Stop reason taxonomy after #164 Stage A:
+        - 'completed' — turn committed, state mutated exactly once
+        - 'max_budget_reached' — overflow, state unchanged (#162)
+        - 'max_turns_reached' — capacity exceeded, state unchanged
+        - 'cancelled' — cancel_event observed, state unchanged
+        - 'timeout' — synthesised by runtime, not engine (#161)
+
+        Callers that care about deadline-driven cancellation (run_turn_loop)
+        can now request cleanup by setting the event on timeout — the next
+        submit_message on the same engine will observe it at the start and
+        return 'cancelled' without touching state, even if the previous call
+        is still wedged in provider IO.
+        """
+        # #164 Stage A: earliest safe cancellation point. No output synthesis,
+        # no budget projection, no mutation — just an immediate clean return.
+        if cancel_event is not None and cancel_event.is_set():
+            return TurnResult(
+                prompt=prompt,
+                output='',
+                matched_commands=matched_commands,
+                matched_tools=matched_tools,
+                permission_denials=denied_tools,
+                usage=self.total_usage,  # unchanged
+                stop_reason='cancelled',
+            )
+
         if len(self.mutable_messages) >= self.config.max_turns:
             output = f'Max turns reached before processing prompt: {prompt}'
             return TurnResult(
```
```diff
@@ -85,9 +139,40 @@ class QueryEnginePort:
         ]
         output = self._format_output(summary_lines)
         projected_usage = self.total_usage.add_turn(prompt, output)
-        stop_reason = 'completed'
 
+        # #162: budget check must precede mutation. Previously this block set
+        # stop_reason='max_budget_reached' but still appended the overflow turn
+        # to mutable_messages / transcript_store / permission_denials, corrupting
+        # the session for any caller that persisted it afterwards. The overflow
+        # prompt was effectively committed even though the TurnResult signalled
+        # rejection. Now we early-return with pre-mutation state intact so
+        # callers can safely retry with a smaller prompt or a fresh budget.
         if projected_usage.input_tokens + projected_usage.output_tokens > self.config.max_budget_tokens:
-            stop_reason = 'max_budget_reached'
+            return TurnResult(
+                prompt=prompt,
+                output=output,
+                matched_commands=matched_commands,
+                matched_tools=matched_tools,
+                permission_denials=denied_tools,
+                usage=self.total_usage,  # unchanged — overflow turn was rejected
+                stop_reason='max_budget_reached',
+            )
+
+        # #164 Stage A: second safe cancellation point. Projection is done
+        # but nothing has been committed yet. If the caller cancelled while
+        # we were building output / computing budget, honour it here — still
+        # no mutation.
+        if cancel_event is not None and cancel_event.is_set():
+            return TurnResult(
+                prompt=prompt,
+                output=output,
+                matched_commands=matched_commands,
+                matched_tools=matched_tools,
+                permission_denials=denied_tools,
+                usage=self.total_usage,  # unchanged
+                stop_reason='cancelled',
+            )
+
         self.mutable_messages.append(prompt)
         self.transcript_store.append(prompt)
         self.permission_denials.extend(denied_tools)
@@ -100,7 +185,7 @@ class QueryEnginePort:
             matched_tools=matched_tools,
             permission_denials=denied_tools,
             usage=self.total_usage,
-            stop_reason=stop_reason,
+            stop_reason='completed',
         )
 
     def stream_submit_message(
```
```diff
@@ -137,7 +222,19 @@ class QueryEnginePort:
     def flush_transcript(self) -> None:
         self.transcript_store.flush()
 
-    def persist_session(self) -> str:
+    def persist_session(self, directory: 'Path | None' = None) -> str:
+        """Flush the transcript and save the session to disk.
+
+        Args:
+            directory: Optional override for the storage directory. When None
+                (default, for backward compat), uses the default location
+                (``.port_sessions`` in CWD). When set, passes through to
+                ``save_session`` which already supports directory overrides.
+
+        #166: added directory parameter to match the session-lifecycle CLI
+        surface established by #160/#165. Claws running out-of-tree can now
+        redirect session creation to a workspace-specific dir without chdir.
+        """
         self.flush_transcript()
         path = save_session(
             StoredSession(
@@ -145,7 +242,8 @@ class QueryEnginePort:
                 messages=tuple(self.mutable_messages),
                 input_tokens=self.total_usage.input_tokens,
                 output_tokens=self.total_usage.output_tokens,
-            )
+            ),
+            directory,
         )
         return str(path)
```
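The safe-point contract documented in that docstring is easiest to see in use. A minimal sketch (constructor and call signature as shown in the diff above; the empty-state assertion assumes a fresh engine):

```python
import threading

from src.query_engine import QueryEnginePort

engine = QueryEnginePort.from_workspace()
cancel = threading.Event()

# Cancel requested before the call: observed at the entry checkpoint.
cancel.set()
result = engine.submit_message("hello", (), (), (), cancel_event=cancel)
assert result.stop_reason == 'cancelled'
assert not engine.mutable_messages  # state untouched: no ghost turn

# Without the event, the legacy path runs and the turn commits.
result = engine.submit_message("hello", (), (), ())
assert result.stop_reason == 'completed'
```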
**src/runtime.py** — 159 changed lines
```diff
@@ -1,11 +1,14 @@
 from __future__ import annotations
 
+import threading
+import time
+from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeoutError
 from dataclasses import dataclass
 
 from .commands import PORTED_COMMANDS
 from .context import PortContext, build_port_context, render_context
 from .history import HistoryLog
-from .models import PermissionDenial, PortingModule
+from .models import PermissionDenial, PortingModule, UsageSummary
 from .query_engine import QueryEngineConfig, QueryEnginePort, TurnResult
 from .setup import SetupReport, WorkspaceSetup, run_setup
 from .system_init import build_system_init_message
```
```diff
@@ -151,21 +154,161 @@ class PortRuntime:
             persisted_session_path=persisted_session_path,
         )
 
-    def run_turn_loop(self, prompt: str, limit: int = 5, max_turns: int = 3, structured_output: bool = False) -> list[TurnResult]:
+    def run_turn_loop(
+        self,
+        prompt: str,
+        limit: int = 5,
+        max_turns: int = 3,
+        structured_output: bool = False,
+        timeout_seconds: float | None = None,
+        continuation_prompt: str | None = None,
+    ) -> list[TurnResult]:
+        """Run a multi-turn engine loop with optional wall-clock deadline.
+
+        Args:
+            prompt: The initial prompt to submit.
+            limit: Match routing limit.
+            max_turns: Maximum number of turns before stopping.
+            structured_output: Whether to request structured output.
+            timeout_seconds: Total wall-clock budget across all turns. When the
+                budget is exhausted mid-turn, a synthetic TurnResult with
+                ``stop_reason='timeout'`` is appended and the loop exits.
+                ``None`` (default) preserves legacy unbounded behaviour.
+            continuation_prompt: What to send on turns after the first. When
+                ``None`` (default, #163), the loop stops after turn 0 and the
+                caller decides how to continue. When set, the same text is
+                submitted for every turn after the first, giving claws a clean
+                hook for structured follow-ups (e.g. ``"Continue."``, a
+                routing-planner instruction, or a tool-output cue). Previously
+                the loop silently appended ``" [turn N]"`` to the original
+                prompt, polluting the transcript with harness-generated
+                annotation the model had no way to interpret.
+
+        Returns:
+            A list of TurnResult objects. The final entry's ``stop_reason``
+            distinguishes ``'completed'``, ``'max_turns_reached'``,
+            ``'max_budget_reached'``, or ``'timeout'``.
+
+        #161: prior to this change a hung ``engine.submit_message`` call would
+        block the loop indefinitely with no cancellation path, forcing claws to
+        rely on external watchdogs or OS-level kills. Callers can now enforce a
+        deadline and receive a typed timeout signal instead.
+
+        #163: the old ``f'{prompt} [turn {turn + 1}]'`` suffix was never
+        interpreted by the engine or any system prompt. It looked like a real
+        user turn in ``mutable_messages`` and the transcript, making replay and
+        analysis fragile. Removed entirely; callers supply ``continuation_prompt``
+        for meaningful follow-ups or let the loop stop after turn 0.
+        """
         engine = QueryEnginePort.from_workspace()
         engine.config = QueryEngineConfig(max_turns=max_turns, structured_output=structured_output)
         matches = self.route_prompt(prompt, limit=limit)
         command_names = tuple(match.name for match in matches if match.kind == 'command')
         tool_names = tuple(match.name for match in matches if match.kind == 'tool')
+        # #159: infer permission denials from the routed matches, not hardcoded empty tuple.
+        # Multi-turn sessions must have the same security posture as bootstrap_session.
+        denied_tools = tuple(self._infer_permission_denials(matches))
         results: list[TurnResult] = []
-        for turn in range(max_turns):
-            turn_prompt = prompt if turn == 0 else f'{prompt} [turn {turn + 1}]'
-            result = engine.submit_message(turn_prompt, command_names, tool_names, ())
-            results.append(result)
-            if result.stop_reason != 'completed':
-                break
```
deadline = time.monotonic() + timeout_seconds if timeout_seconds is not None else None
|
||||
# #164 Stage A: shared cancel_event signals cooperative cancellation
|
||||
# across turns. On timeout we set() it so any still-running
|
||||
# submit_message call (or the next one on the same engine) observes
|
||||
# the cancel at a safe checkpoint and returns stop_reason='cancelled'
|
||||
# without mutating state. This closes the window where a wedged
|
||||
# provider thread could commit a ghost turn after the caller saw
|
||||
# 'timeout'.
|
||||
cancel_event = threading.Event() if deadline is not None else None
|
||||
|
||||
# ThreadPoolExecutor is reused across turns so we cancel cleanly on exit.
|
||||
executor = ThreadPoolExecutor(max_workers=1) if deadline is not None else None
|
||||
try:
|
||||
for turn in range(max_turns):
|
||||
# #163: no more f'{prompt} [turn N]' suffix injection.
|
||||
# On turn 0 submit the original prompt.
|
||||
# On turn > 0, submit the caller-supplied continuation prompt;
|
||||
# if the caller did not supply one, stop the loop cleanly instead
|
||||
# of fabricating a fake user turn.
|
||||
if turn == 0:
|
||||
turn_prompt = prompt
|
||||
elif continuation_prompt is not None:
|
||||
turn_prompt = continuation_prompt
|
||||
else:
|
||||
break
|
||||
|
||||
if deadline is None:
|
||||
# Legacy path: unbounded call, preserves existing behaviour exactly.
|
||||
# #159: pass inferred denied_tools (no longer hardcoded empty tuple)
|
||||
# #164: cancel_event is None on this path; submit_message skips
|
||||
# cancellation checks entirely (legacy zero-overhead behaviour).
|
||||
result = engine.submit_message(turn_prompt, command_names, tool_names, denied_tools)
|
||||
else:
|
||||
remaining = deadline - time.monotonic()
|
||||
if remaining <= 0:
|
||||
# #164: signal cancel for any in-flight/future submit_message
|
||||
# calls that share this engine. Safe because nothing has been
|
||||
# submitted yet this turn.
|
||||
assert cancel_event is not None
|
||||
cancel_event.set()
|
||||
results.append(self._build_timeout_result(
|
||||
turn_prompt, command_names, tool_names,
|
||||
cancel_observed=cancel_event.is_set()
|
||||
))
|
||||
break
|
||||
assert executor is not None
|
||||
future = executor.submit(
|
||||
engine.submit_message, turn_prompt, command_names, tool_names,
|
||||
denied_tools, cancel_event,
|
||||
)
|
||||
try:
|
||||
result = future.result(timeout=remaining)
|
||||
except FuturesTimeoutError:
|
||||
# #164 Stage A: explicitly signal cancel to the still-running
|
||||
# submit_message thread. The next time it hits a checkpoint
|
||||
# (entry or post-budget), it returns 'cancelled' without
|
||||
# mutating state instead of committing a ghost turn. This
|
||||
# upgrades #161's best-effort future.cancel() (which only
|
||||
# cancels pre-start futures) to cooperative mid-flight cancel.
|
||||
assert cancel_event is not None
|
||||
cancel_event.set()
|
||||
future.cancel()
|
||||
results.append(self._build_timeout_result(
|
||||
turn_prompt, command_names, tool_names,
|
||||
cancel_observed=cancel_event.is_set()
|
||||
))
|
||||
break
|
||||
|
||||
results.append(result)
|
||||
if result.stop_reason != 'completed':
|
||||
break
|
||||
finally:
|
||||
if executor is not None:
|
||||
# wait=False: don't let a hung thread block loop exit indefinitely.
|
||||
# The thread will be reaped when the interpreter shuts down or when
|
||||
# the engine call eventually returns.
|
||||
executor.shutdown(wait=False)
|
||||
return results
|
||||
|
||||
@staticmethod
|
||||
def _build_timeout_result(
|
||||
prompt: str,
|
||||
command_names: tuple[str, ...],
|
||||
tool_names: tuple[str, ...],
|
||||
cancel_observed: bool = False,
|
||||
) -> TurnResult:
|
||||
"""Synthesize a TurnResult representing a wall-clock timeout (#161).
|
||||
#164 Stage B: cancel_observed signals cancellation event was set.
|
||||
"""
|
||||
return TurnResult(
|
||||
prompt=prompt,
|
||||
output='Wall-clock timeout exceeded before turn completed.',
|
||||
matched_commands=command_names,
|
||||
matched_tools=tool_names,
|
||||
permission_denials=(),
|
||||
usage=UsageSummary(),
|
||||
stop_reason='timeout',
|
||||
cancel_observed=cancel_observed,
|
||||
)
|
||||
|
||||
def _infer_permission_denials(self, matches: list[RoutedMatch]) -> list[PermissionDenial]:
|
||||
denials: list[PermissionDenial] = []
|
||||
for match in matches:
|
||||
|
||||
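Taken together, the new parameters give callers a typed loop contract. As a quick orientation, here is a minimal claw-side sketch (assuming `PortRuntime` is importable from `src.runtime`, as the tests later in this diff do) of driving the loop with a wall-clock budget and dispatching on the final `stop_reason`:

```python
# Minimal sketch; uses only the parameters and fields shown in the diff above.
from src.runtime import PortRuntime

runtime = PortRuntime()
results = runtime.run_turn_loop(
    'summarize the open roadmap items',
    max_turns=3,
    timeout_seconds=30.0,             # total budget across all turns (#161)
    continuation_prompt='Continue.',  # sent on every turn after the first (#163)
)

final = results[-1]
if final.stop_reason == 'timeout':
    # Typed timeout signal; cancel_observed reports whether the cooperative
    # cancel event was set before the loop exited (#164).
    print('budget exhausted, cancel_observed =', final.cancel_observed)
elif final.stop_reason == 'completed':
    print(final.output)
```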
@@ -26,10 +26,96 @@ def save_session(session: StoredSession, directory: Path | None = None) -> Path:

def load_session(session_id: str, directory: Path | None = None) -> StoredSession:
    target_dir = directory or DEFAULT_SESSION_DIR
    data = json.loads((target_dir / f'{session_id}.json').read_text())
    try:
        data = json.loads((target_dir / f'{session_id}.json').read_text())
    except FileNotFoundError:
        raise SessionNotFoundError(f'session {session_id!r} not found in {target_dir}') from None
    return StoredSession(
        session_id=data['session_id'],
        messages=tuple(data['messages']),
        input_tokens=data['input_tokens'],
        output_tokens=data['output_tokens'],
    )


class SessionNotFoundError(KeyError):
    """Raised when a session does not exist in the store."""
    pass

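With the typed exception in place, callers can branch on absence instead of pattern-matching a raw `FileNotFoundError`. A minimal sketch, assuming these names are importable from `src.session_store` (the module path itself is not shown in this hunk):

```python
# Sketch only; assumes the session-store module path is src.session_store.
from src.session_store import SessionNotFoundError, load_session, session_exists

def load_or_none(session_id: str):
    """Return the stored session, or None if it does not exist."""
    try:
        return load_session(session_id)
    except SessionNotFoundError:
        return None

# session_exists() is the non-raising probe for the same question,
# so these two views of the store always agree:
assert (load_or_none('missing-id') is None) == (not session_exists('missing-id'))
```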
def list_sessions(directory: Path | None = None) -> list[str]:
    """List all stored session IDs in the target directory.

    Args:
        directory: Target session directory. Defaults to DEFAULT_SESSION_DIR.

    Returns:
        Sorted list of session IDs (JSON filenames without .json extension).
    """
    target_dir = directory or DEFAULT_SESSION_DIR
    if not target_dir.exists():
        return []
    return sorted(p.stem for p in target_dir.glob('*.json'))


def session_exists(session_id: str, directory: Path | None = None) -> bool:
    """Check if a session exists without raising an error.

    Args:
        session_id: The session ID to check.
        directory: Target session directory. Defaults to DEFAULT_SESSION_DIR.

    Returns:
        True if the session file exists, False otherwise.
    """
    target_dir = directory or DEFAULT_SESSION_DIR
    return (target_dir / f'{session_id}.json').exists()


class SessionDeleteError(OSError):
    """Raised when a session file exists but cannot be removed (permission, IO error).

    Distinct from SessionNotFoundError: this means the session was present but
    deletion failed mid-operation. Callers can retry or escalate.
    """
    pass


def delete_session(session_id: str, directory: Path | None = None) -> bool:
    """Delete a session file from the store.

    Contract:
    - **Idempotent**: `delete_session(x)` followed by `delete_session(x)` is safe.
      The second call returns False (not found), does not raise.
    - **Race-safe**: attempts the unlink directly and treats FileNotFoundError
      as "already gone", avoiding the TOCTOU window between an exists-check
      and the unlink. Concurrent deletion by another process is treated as a
      no-op success (returns False for the losing caller).
    - **Partial-failure surfaced**: If the file exists but cannot be removed
      (permission denied, filesystem error, directory instead of file), raises
      `SessionDeleteError` wrapping the underlying OSError. The session store
      may be in an inconsistent state; the caller should retry or escalate.

    Args:
        session_id: The session ID to delete.
        directory: Target session directory. Defaults to DEFAULT_SESSION_DIR.

    Returns:
        True if this call deleted the session file.
        False if the session did not exist (either never existed or was already deleted).

    Raises:
        SessionDeleteError: if the session existed but deletion failed.
    """
    target_dir = directory or DEFAULT_SESSION_DIR
    path = target_dir / f'{session_id}.json'
    try:
        # EAFP: unlink with missing_ok=False and catch FileNotFoundError below,
        # so the return value reflects whether THIS call removed the file
        # (missing_ok=True would hide who won a concurrent-delete race).
        path.unlink(missing_ok=False)
        return True
    except FileNotFoundError:
        # Either never existed or was concurrently deleted — both are no-ops
        return False
    except OSError as exc:
        # PermissionError, IsADirectoryError, and friends are all OSError.
        raise SessionDeleteError(
            f'session {session_id!r} exists in {target_dir} but could not be deleted: {exc}'
        ) from exc

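The contract is easiest to see end to end. A short sketch of the idempotent and partial-failure paths, again assuming the `src.session_store` module path:

```python
# Sketch; assumes src.session_store exports these names.
from src.session_store import SessionDeleteError, delete_session

first = delete_session('abc123')   # True only if this call removed the file
second = delete_session('abc123')  # False: already gone, and no exception
assert second is False

try:
    delete_session('locked-session')
except SessionDeleteError:
    # The file was present but could not be removed (permissions, IO error);
    # the store may be inconsistent, so retry or escalate.
    ...
```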
199  tests/test_cancel_observed_field.py  Normal file
@@ -0,0 +1,199 @@
"""#164 Stage B — cancel_observed field coverage.

Validates that the TurnResult.cancel_observed field correctly signals
whether cancellation was observed during turn execution.

Test coverage:
1. Normal completion: cancel_observed=False (no timeout occurred)
2. Timeout with cancel signaled: cancel_observed=True
3. bootstrap JSON output exposes the field
4. turn-loop JSON output exposes cancel_observed per turn
5. Safe-to-reuse: after timeout with cancel_observed=True,
   engine can accept fresh messages without state corruption
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

import pytest

from src.query_engine import QueryEnginePort, TurnResult
from src.runtime import PortRuntime


CLI = [sys.executable, '-m', 'src.main']
REPO_ROOT = Path(__file__).resolve().parent.parent


class TestCancelObservedField:
    """TurnResult.cancel_observed correctly signals cancellation observation."""

    def test_default_value_is_false(self) -> None:
        """New TurnResult defaults to cancel_observed=False (backward compat)."""
        from src.models import UsageSummary
        result = TurnResult(
            prompt='test',
            output='ok',
            matched_commands=(),
            matched_tools=(),
            permission_denials=(),
            usage=UsageSummary(),
            stop_reason='completed',
        )
        assert result.cancel_observed is False

    def test_explicit_true_preserved(self) -> None:
        """cancel_observed=True is preserved through construction."""
        from src.models import UsageSummary
        result = TurnResult(
            prompt='test',
            output='timed out',
            matched_commands=(),
            matched_tools=(),
            permission_denials=(),
            usage=UsageSummary(),
            stop_reason='timeout',
            cancel_observed=True,
        )
        assert result.cancel_observed is True

    def test_normal_completion_cancel_observed_false(self) -> None:
        """Normal turn completion → cancel_observed=False."""
        runtime = PortRuntime()
        results = runtime.run_turn_loop('hello', max_turns=1)
        assert len(results) >= 1
        assert results[0].cancel_observed is False

    def test_bootstrap_json_includes_cancel_observed(self) -> None:
        """bootstrap JSON envelope includes cancel_observed in turn result."""
        result = subprocess.run(
            CLI + ['bootstrap', 'hello', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0
        envelope = json.loads(result.stdout)
        assert 'turn' in envelope
        assert 'cancel_observed' in envelope['turn'], (
            f"bootstrap turn must include cancel_observed (SCHEMAS.md contract). "
            f"Got keys: {list(envelope['turn'].keys())}"
        )
        # Normal completion → False
        assert envelope['turn']['cancel_observed'] is False

    def test_turn_loop_json_per_turn_cancel_observed(self) -> None:
        """turn-loop JSON envelope includes cancel_observed per turn (#164 Stage B closure)."""
        result = subprocess.run(
            CLI + ['turn-loop', 'hello', '--max-turns', '1', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0, f"stderr: {result.stderr}"
        envelope = json.loads(result.stdout)
        # Common fields from wrap_json_envelope
        assert envelope['command'] == 'turn-loop'
        assert envelope['schema_version'] == '1.0'
        # Turn-loop-specific fields
        assert 'turns' in envelope
        assert len(envelope['turns']) >= 1
        for idx, turn in enumerate(envelope['turns']):
            assert 'cancel_observed' in turn, (
                f"Turn {idx} missing cancel_observed: {list(turn.keys())}"
            )
        # final_cancel_observed convenience field
        assert 'final_cancel_observed' in envelope
        assert isinstance(envelope['final_cancel_observed'], bool)


class TestCancelObservedSafeReuseSemantics:
    """After timeout with cancel_observed=True, engine state is safe to reuse."""

    def test_timeout_result_cancel_observed_true_when_signaled(self) -> None:
        """#164 Stage B: timeout path passes cancel_event.is_set() to result."""
        # Force a timeout with max_turns=3 and timeout=0.0001 (instant)
        runtime = PortRuntime()
        results = runtime.run_turn_loop(
            'hello', max_turns=3, timeout_seconds=0.0001,
            continuation_prompt='keep going',
        )
        # Last result should be timeout (pre-start path since timeout is instant)
        assert results, 'timeout path should still produce a result'
        last = results[-1]
        assert last.stop_reason == 'timeout'
        # cancel_observed=True because the timeout path explicitly sets cancel_event
        assert last.cancel_observed is True, (
            f"timeout path must signal cancel_observed=True; got {last.cancel_observed}. "
            f"stop_reason={last.stop_reason}"
        )

    def test_engine_messages_not_corrupted_by_timeout(self) -> None:
        """After timeout with cancel_observed, engine.mutable_messages is consistent.

        #164 Stage B contract: safe-to-reuse means after a timeout-with-cancel,
        the engine has not committed a ghost turn and can accept fresh input.
        """
        engine = QueryEnginePort.from_workspace()
        # Track initial state
        initial_message_count = len(engine.mutable_messages)

        # Simulate a direct submit_message call with cancellation
        import threading
        cancel_event = threading.Event()
        cancel_event.set()  # Pre-set: first checkpoint fires
        result = engine.submit_message(
            'test', ('cmd1',), ('tool1',),
            denied_tools=(), cancel_event=cancel_event,
        )

        # Cancelled turn should not commit mutation
        assert result.stop_reason == 'cancelled', (
            f"expected cancelled; got {result.stop_reason}"
        )
        # mutable_messages should not have grown
        assert len(engine.mutable_messages) == initial_message_count, (
            f"engine.mutable_messages grew after cancelled turn "
            f"(was {initial_message_count}, now {len(engine.mutable_messages)})"
        )

        # Engine should accept a fresh message now
        fresh = engine.submit_message('fresh prompt', ('cmd1',), ('tool1',))
        assert fresh.stop_reason in ('completed', 'max_budget_reached'), (
            f"expected engine reusable; got {fresh.stop_reason}"
        )


class TestCancelObservedSchemaCompliance:
    """SCHEMAS.md contract for the cancel_observed field."""

    def test_cancel_observed_is_bool_not_nullable(self) -> None:
        """cancel_observed is always bool (never null/missing) per SCHEMAS.md."""
        result = subprocess.run(
            CLI + ['bootstrap', 'test', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        cancel_observed = envelope['turn']['cancel_observed']
        assert isinstance(cancel_observed, bool), (
            f"cancel_observed must be bool; got {type(cancel_observed)}"
        )

    def test_turn_loop_envelope_has_final_cancel_observed(self) -> None:
        """turn-loop JSON exposes final_cancel_observed convenience field."""
        result = subprocess.run(
            CLI + ['turn-loop', 'test', '--max-turns', '1', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0
        envelope = json.loads(result.stdout)
        assert 'final_cancel_observed' in envelope
        assert isinstance(envelope['final_cancel_observed'], bool)
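These per-field assertions exist so that a consumer never has to special-case commands. A hedged claw-side sketch of the single error handler the envelope contract enables, using only fields the tests in this diff assert on (`exit_code`, `error.kind`, `error.message`, `error.retryable`); the helper itself is illustrative, not part of the repo:

```python
# Claw-side sketch: one handler for every clawable command.
import json
import subprocess
import sys

class ClawError(RuntimeError):
    """Typed wrapper over the envelope's error block."""
    def __init__(self, kind: str, message: str, retryable: bool) -> None:
        super().__init__(f'{kind}: {message}')
        self.kind = kind
        self.retryable = retryable

def run_claw(*args: str) -> dict:
    """Run any clawable command, return its JSON envelope, raise on error."""
    proc = subprocess.run(
        [sys.executable, '-m', 'src.main', *args, '--output-format', 'json'],
        capture_output=True, text=True,
    )
    envelope = json.loads(proc.stdout)
    err = envelope.get('error')
    if err is not None:
        raise ClawError(err['kind'], err['message'], err.get('retryable', False))
    return envelope
```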
333  tests/test_cli_parity_audit.py  Normal file
@@ -0,0 +1,333 @@
"""Cross-surface CLI parity audit (ROADMAP #171).

Prevents future drift of the unified JSON envelope contract across
claw-code's CLI surface. Instead of requiring humans to notice when
a new command skips --output-format, this test introspects the parser
at runtime and verifies every command in the declared clawable-surface
list supports --output-format {text,json}.

When a new clawable-surface command is added:
1. Implement --output-format on the subparser (normal feature work).
2. Add the command name to CLAWABLE_SURFACES below.
3. This test passes automatically.

When a developer adds a new clawable-surface command but forgets
--output-format, the test fails with a concrete message pointing at
the missing flag. Claws no longer need to eyeball parity; the contract
is enforced at test time.

Three classes of commands:
- CLAWABLE_SURFACES: MUST accept --output-format (inspect/lifecycle/exec/diagnostic)
- OPT_OUT_SURFACES: explicitly exempt (simulation/mode commands, human-first diagnostic)
- Any command in the parser not listed in either: the test FAILS with a classification request

This is operationalised parity — a machine-first CLI enforced by a
machine-first test.
"""

from __future__ import annotations

import subprocess
import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.main import build_parser  # noqa: E402


# Commands that MUST accept --output-format {text,json}.
# These are the machine-first surfaces — session lifecycle, execution,
# inspect, diagnostic inventory.
CLAWABLE_SURFACES = frozenset({
    # Session lifecycle (#160, #165, #166)
    'list-sessions',
    'delete-session',
    'load-session',
    'flush-transcript',
    # Inspect (#167)
    'show-command',
    'show-tool',
    # Execution/work-verb (#168)
    'exec-command',
    'exec-tool',
    'route',
    'bootstrap',
    # Diagnostic inventory (#169, #170)
    'command-graph',
    'tool-pool',
    'bootstrap-graph',
    # Turn-loop with JSON output (#164 Stage B, #174)
    'turn-loop',
})

# Commands explicitly exempt from the --output-format requirement.
# Rationale must be explicit — either the command is human-first
# (rich Markdown docs/reports), simulation-only, or has a dedicated
# JSON mode flag under a different name.
OPT_OUT_SURFACES = frozenset({
    # Rich-Markdown report commands (planned future: JSON schema)
    'summary',       # full workspace summary (Markdown)
    'manifest',      # workspace manifest (Markdown)
    'parity-audit',  # TypeScript archive comparison (Markdown)
    'setup-report',  # startup/prefetch report (Markdown)
    # List commands with their own query/filter surface (not JSON yet)
    'subsystems',    # use --limit
    'commands',      # use --query / --limit / --no-plugin-commands
    'tools',         # use --query / --limit / --simple-mode
    # Simulation/debug surfaces (not claw-orchestrated)
    'remote-mode',
    'ssh-mode',
    'teleport-mode',
    'direct-connect-mode',
    'deep-link-mode',
})


def _discover_subcommands_and_flags() -> dict[str, frozenset[str]]:
    """Introspect the argparse tree to discover every subcommand and its flags.

    Returns:
        {subcommand_name: frozenset of option strings, including --output-format
        if registered}
    """
    parser = build_parser()
    subcommand_flags: dict[str, frozenset[str]] = {}
    for action in parser._actions:
        if not hasattr(action, 'choices') or not action.choices:
            continue
        if action.dest != 'command':
            continue
        for name, subp in action.choices.items():
            flags: set[str] = set()
            for a in subp._actions:
                if a.option_strings:
                    flags.update(a.option_strings)
            subcommand_flags[name] = frozenset(flags)
    return subcommand_flags


class TestClawableSurfaceParity:
    """Every clawable-surface command MUST accept --output-format {text,json}.

    This is the invariant that codifies 'claws can treat the CLI as a
    unified protocol without special-casing'.
    """

    def test_all_clawable_surfaces_accept_output_format(self) -> None:
        """All commands in CLAWABLE_SURFACES must have --output-format registered."""
        subcommand_flags = _discover_subcommands_and_flags()
        missing = []
        for cmd in CLAWABLE_SURFACES:
            if cmd not in subcommand_flags:
                missing.append(f'{cmd}: not registered in parser')
            elif '--output-format' not in subcommand_flags[cmd]:
                missing.append(f'{cmd}: missing --output-format flag')
        assert not missing, (
            'Clawable-surface parity violation. Every command in '
            'CLAWABLE_SURFACES must accept --output-format. Failures:\n'
            + '\n'.join(f'  - {m}' for m in missing)
        )

    @pytest.mark.parametrize('cmd_name', sorted(CLAWABLE_SURFACES))
    def test_clawable_surface_output_format_choices(self, cmd_name: str) -> None:
        """Every clawable surface must accept exactly {text, json} choices."""
        parser = build_parser()
        for action in parser._actions:
            if not hasattr(action, 'choices') or not action.choices:
                continue
            if action.dest != 'command':
                continue
            if cmd_name not in action.choices:
                continue
            subp = action.choices[cmd_name]
            for a in subp._actions:
                if '--output-format' in a.option_strings:
                    assert a.choices == ['text', 'json'], (
                        f'{cmd_name}: --output-format choices are {a.choices}, '
                        f'expected [text, json]'
                    )
                    assert a.default == 'text', (
                        f'{cmd_name}: --output-format default is {a.default!r}, '
                        f'expected \'text\' for backward compat'
                    )
                    return
        pytest.fail(f'{cmd_name}: no --output-format flag found')


class TestCommandClassificationCoverage:
    """Every registered subcommand must be classified as either CLAWABLE or OPT_OUT.

    If a new command is added to the parser but forgotten in both sets, this
    test fails loudly — forcing an explicit classification decision.
    """

    def test_every_registered_command_is_classified(self) -> None:
        subcommand_flags = _discover_subcommands_and_flags()
        all_classified = CLAWABLE_SURFACES | OPT_OUT_SURFACES
        unclassified = set(subcommand_flags.keys()) - all_classified
        assert not unclassified, (
            'Unclassified subcommands detected. Every new command must be '
            'explicitly added to either CLAWABLE_SURFACES (must accept '
            '--output-format) or OPT_OUT_SURFACES (explicitly exempt with '
            'rationale). Unclassified:\n'
            + '\n'.join(f'  - {cmd}' for cmd in sorted(unclassified))
        )

    def test_no_command_in_both_sets(self) -> None:
        """Sanity: a command cannot be both clawable AND opt-out."""
        overlap = CLAWABLE_SURFACES & OPT_OUT_SURFACES
        assert not overlap, (
            f'Classification conflict: commands appear in both sets: {overlap}'
        )

    def test_all_classified_commands_actually_exist(self) -> None:
        """No typos — every command in our sets must actually be registered."""
        subcommand_flags = _discover_subcommands_and_flags()
        ghosts = (CLAWABLE_SURFACES | OPT_OUT_SURFACES) - set(subcommand_flags.keys())
        assert not ghosts, (
            f'Phantom commands in classification sets (not in parser): {ghosts}. '
            'Update CLAWABLE_SURFACES / OPT_OUT_SURFACES if commands were removed.'
        )


class TestJsonOutputContractEndToEnd:
    """Verify the contract AT RUNTIME — not just parser-level, but actual execution.

    Each clawable command must, when invoked with --output-format json,
    produce parseable JSON on stdout (for success cases).
    """

    # Minimal invocation args for each clawable command (to hit the success path)
    RUNTIME_INVOCATIONS = {
        'list-sessions': [],
        # delete-session/load-session: skip (need state setup, covered by dedicated tests)
        'show-command': ['add-dir'],
        'show-tool': ['BashTool'],
        'exec-command': ['add-dir', 'hi'],
        'exec-tool': ['BashTool', '{}'],
        'route': ['review'],
        'bootstrap': ['hello'],
        'command-graph': [],
        'tool-pool': [],
        'bootstrap-graph': [],
        # flush-transcript: skip (creates files, covered by dedicated tests)
    }

    @pytest.mark.parametrize('cmd_name,cmd_args', sorted(RUNTIME_INVOCATIONS.items()))
    def test_command_emits_parseable_json(self, cmd_name: str, cmd_args: list[str]) -> None:
        """End-to-end: invoking with --output-format json yields valid JSON."""
        import json
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', cmd_name, *cmd_args, '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        # Accept exit 0 (success) or 1 (typed not-found) — both must still produce JSON
        assert result.returncode in (0, 1), (
            f'{cmd_name}: unexpected exit {result.returncode}\n'
            f'stderr: {result.stderr}\n'
            f'stdout: {result.stdout[:200]}'
        )
        try:
            json.loads(result.stdout)
        except json.JSONDecodeError as e:
            pytest.fail(
                f'{cmd_name} {cmd_args} --output-format json did not produce '
                f'parseable JSON: {e}\nOutput: {result.stdout[:200]}'
            )


class TestOptOutSurfaceRejection:
    """Cycle #30: OPT_OUT surfaces must REJECT --output-format, not silently accept it.

    OPT_OUT_AUDIT.md classifies 12 surfaces as intentionally exempt from the
    JSON envelope contract. This test LOCKS that rejection so accidental
    drift (e.g., a developer adds --output-format to summary without thinking)
    doesn't silently promote an OPT_OUT surface to CLAWABLE.

    Relationship to existing tests:
    - test_all_clawable_surfaces_accept_output_format: asserts CLAWABLE surfaces accept it
    - TestOptOutSurfaceRejection: asserts OPT_OUT surfaces REJECT it

    Together, these two test classes form a complete parity check:
    every surface is either IN or OUT, and both cases are explicitly tested.

    If an OPT_OUT surface is promoted to CLAWABLE intentionally:
    1. Move it from OPT_OUT_SURFACES to CLAWABLE_SURFACES
    2. Update OPT_OUT_AUDIT.md with the promotion rationale
    3. Remove it from this test's expected rejections
    4. Both sets of tests continue passing
    """

    @pytest.mark.parametrize('cmd_name', sorted(OPT_OUT_SURFACES))
    def test_opt_out_surface_rejects_output_format(self, cmd_name: str) -> None:
        """OPT_OUT surfaces must NOT accept the --output-format flag.

        Passing --output-format to an OPT_OUT surface should produce an
        'unrecognized arguments' error from argparse.
        """
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', cmd_name, '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        # Should fail — argparse exit 2 in text mode, exit 1 in JSON mode
        # (both modes normalize to an "unrecognized arguments" message)
        assert result.returncode != 0, (
            f'{cmd_name} unexpectedly accepted --output-format json. '
            f'If this is intentional (promotion to CLAWABLE), move it from '
            f'OPT_OUT_SURFACES to CLAWABLE_SURFACES and update OPT_OUT_AUDIT.md. '
            f'Output: {result.stdout[:200]}\nStderr: {result.stderr[:200]}'
        )
        # Verify the error is specifically about --output-format
        error_text = result.stdout + result.stderr
        assert '--output-format' in error_text or 'unrecognized' in error_text, (
            f'{cmd_name} failed, but the error is not about --output-format. '
            f'Something else is broken:\n'
            f'stdout: {result.stdout[:300]}\nstderr: {result.stderr[:300]}'
        )

    def test_opt_out_set_matches_audit_document(self) -> None:
        """The OPT_OUT_SURFACES constant must exactly match the OPT_OUT_AUDIT.md listing.

        This test reads OPT_OUT_AUDIT.md and verifies the constant doesn't
        drift from the documentation.
        """
        audit_path = Path(__file__).resolve().parent.parent / 'OPT_OUT_AUDIT.md'
        audit_text = audit_path.read_text()

        # Expected 12 surfaces per the audit doc
        expected_surfaces = {
            # Group A: Rich-Markdown Reports (4)
            'summary', 'manifest', 'parity-audit', 'setup-report',
            # Group B: List Commands (3)
            'subsystems', 'commands', 'tools',
            # Group C: Simulation/Debug (5)
            'remote-mode', 'ssh-mode', 'teleport-mode',
            'direct-connect-mode', 'deep-link-mode',
        }

        assert OPT_OUT_SURFACES == expected_surfaces, (
            f'OPT_OUT_SURFACES drift from the expected 12 surfaces per the audit:\n'
            f'  Expected: {sorted(expected_surfaces)}\n'
            f'  Actual:   {sorted(OPT_OUT_SURFACES)}'
        )

        # Each surface should be mentioned in the audit doc
        missing_from_audit = [s for s in OPT_OUT_SURFACES if s not in audit_text]
        assert not missing_from_audit, (
            f'OPT_OUT surfaces not mentioned in OPT_OUT_AUDIT.md: {missing_from_audit}'
        )

    def test_opt_out_count_matches_declared(self) -> None:
        """OPT_OUT_AUDIT.md declares '12 surfaces'. The constant must match."""
        assert len(OPT_OUT_SURFACES) == 12, (
            f'OPT_OUT_SURFACES has {len(OPT_OUT_SURFACES)} items, '
            f'but OPT_OUT_AUDIT.md declares 12 total surfaces. '
            f'Update either the audit doc or the constant.'
        )
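What these parity tests actually demand of a new subcommand is small. A sketch of the registration step; the command name and help text below are hypothetical, but the flag spec (`choices`, `default`) is exactly what `test_clawable_surface_output_format_choices` asserts:

```python
import argparse

def register_new_surface(subparsers) -> None:
    # 'my-new-command' is an illustrative name; the flag contract is the
    # real requirement enforced by the parity tests.
    p = subparsers.add_parser('my-new-command', help='illustrative clawable surface')
    p.add_argument(
        '--output-format',
        choices=['text', 'json'],  # exactly these two, as a list
        default='text',            # text default preserves backward compat
    )
```

After registering, adding `'my-new-command'` to CLAWABLE_SURFACES is the only other step; the classification and runtime tests then pass or fail it automatically.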
70  tests/test_command_graph_tool_pool_output_format.py  Normal file
@@ -0,0 +1,70 @@
"""Tests for --output-format on command-graph and tool-pool (ROADMAP #169).

Diagnostic inventory surfaces now speak the CLI family's JSON contract.
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))


def _run(args: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(
        [sys.executable, '-m', 'src.main', *args],
        cwd=Path(__file__).resolve().parent.parent,
        capture_output=True,
        text=True,
    )


class TestCommandGraphOutputFormat:
    def test_command_graph_json(self) -> None:
        result = _run(['command-graph', '--output-format', 'json'])
        assert result.returncode == 0, result.stderr

        envelope = json.loads(result.stdout)
        assert 'builtins_count' in envelope
        assert 'plugin_like_count' in envelope
        assert 'skill_like_count' in envelope
        assert 'total_count' in envelope
        assert envelope['total_count'] == (
            envelope['builtins_count'] + envelope['plugin_like_count'] + envelope['skill_like_count']
        )
        assert isinstance(envelope['builtins'], list)
        if envelope['builtins']:
            assert set(envelope['builtins'][0].keys()) == {'name', 'source_hint'}

    def test_command_graph_text_backward_compat(self) -> None:
        result = _run(['command-graph'])
        assert result.returncode == 0
        assert '# Command Graph' in result.stdout
        assert 'Builtins:' in result.stdout
        # Not JSON
        assert not result.stdout.strip().startswith('{')


class TestToolPoolOutputFormat:
    def test_tool_pool_json(self) -> None:
        result = _run(['tool-pool', '--output-format', 'json'])
        assert result.returncode == 0, result.stderr

        envelope = json.loads(result.stdout)
        assert 'simple_mode' in envelope
        assert 'include_mcp' in envelope
        assert 'tool_count' in envelope
        assert 'tools' in envelope
        assert envelope['tool_count'] == len(envelope['tools'])
        if envelope['tools']:
            assert set(envelope['tools'][0].keys()) == {'name', 'source_hint'}

    def test_tool_pool_text_backward_compat(self) -> None:
        result = _run(['tool-pool'])
        assert result.returncode == 0
        assert '# Tool Pool' in result.stdout
        assert 'Simple mode:' in result.stdout
        assert not result.stdout.strip().startswith('{')
242  tests/test_cross_channel_consistency.py  Normal file
@@ -0,0 +1,242 @@
"""Cycle #27 cross-channel consistency audit (post-#181).

After the #181 fix (envelope.exit_code must match the process exit), this test
class systematizes the three-layer protocol invariant framework:

1. Structural compliance: Does the envelope exist? (#178)
2. Quality compliance: Is stderr silent + the message truthful? (#179)
3. Cross-channel consistency: Do multiple channels agree? (#181 + this)

This file captures cycle #27's proactive invariant audit proving that
envelope fields match their corresponding reality channels:

- envelope.command ↔ argv dispatch
- envelope.output_format ↔ --output-format flag
- envelope.timestamp ↔ actual wall clock
- envelope.found/handled/deleted ↔ operational truth (no error-block mismatch)

All tests passing = no drift detected.
"""

from __future__ import annotations

import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

import pytest

import sys
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))


def _run(args: list[str]) -> subprocess.CompletedProcess:
    """Run a claw-code command and capture its output."""
    return subprocess.run(
        ['python3', '-m', 'src.main'] + args,
        cwd=Path(__file__).parent.parent,
        capture_output=True,
        text=True,
    )


class TestCrossChannelConsistency:
    """Cycle #27: envelope fields must match reality channels.

    These are distinct from structural/quality tests. A command can
    emit structurally valid JSON with clean stderr but still lie about
    its own output_format or exit code (as #181 proved).
    """

    def test_envelope_command_matches_dispatch(self) -> None:
        """envelope.command must equal the dispatched subcommand."""
        commands_to_test = [
            'show-command',
            'show-tool',
            'list-sessions',
            'exec-command',
            'exec-tool',
            'delete-session',
        ]
        failures = []
        for cmd in commands_to_test:
            # Dispatch varies by arity
            if cmd == 'show-command':
                args = [cmd, 'nonexistent', '--output-format', 'json']
            elif cmd == 'show-tool':
                args = [cmd, 'nonexistent', '--output-format', 'json']
            elif cmd == 'exec-command':
                args = [cmd, 'unknown', 'test', '--output-format', 'json']
            elif cmd == 'exec-tool':
                args = [cmd, 'unknown', '{}', '--output-format', 'json']
            else:
                args = [cmd, '--output-format', 'json']

            result = _run(args)
            try:
                envelope = json.loads(result.stdout)
            except json.JSONDecodeError:
                failures.append(f'{cmd}: JSON parse error')
                continue

            if envelope.get('command') != cmd:
                failures.append(
                    f'{cmd}: envelope.command={envelope.get("command")}, '
                    f'expected {cmd}'
                )
        assert not failures, (
            'envelope.command must match the dispatched subcommand:\n' +
            '\n'.join(failures)
        )

    def test_envelope_output_format_matches_flag(self) -> None:
        """envelope.output_format must match the --output-format flag."""
        result = _run(['list-sessions', '--output-format', 'json'])
        envelope = json.loads(result.stdout)
        assert envelope['output_format'] == 'json', (
            f'output_format mismatch: flag=json, envelope={envelope["output_format"]}'
        )

    def test_envelope_timestamp_is_recent(self) -> None:
        """envelope.timestamp must be recent (generated at call time)."""
        result = _run(['list-sessions', '--output-format', 'json'])
        envelope = json.loads(result.stdout)
        ts_str = envelope.get('timestamp')
        assert ts_str, 'no timestamp field'

        ts = datetime.fromisoformat(ts_str.replace('Z', '+00:00'))
        now = datetime.now(timezone.utc)
        delta = abs((now - ts).total_seconds())

        assert delta < 5, f'timestamp off by {delta}s (should be <5s)'

    def test_envelope_exit_code_matches_process_exit(self) -> None:
        """Cycle #26/#181: envelope.exit_code == process exit code.

        This is a critical invariant. Claws that trust the envelope
        field must get the truth, not a lie.
        """
        cases = [
            (['show-command', 'nonexistent', '--output-format', 'json'], 1),
            (['show-tool', 'nonexistent', '--output-format', 'json'], 1),
            (['list-sessions', '--output-format', 'json'], 0),
            (['delete-session', 'any-id', '--output-format', 'json'], 0),
        ]
        failures = []
        for args, expected_exit in cases:
            result = _run(args)
            if result.returncode != expected_exit:
                failures.append(
                    f'{args[0]}: process exit {result.returncode}, '
                    f'expected {expected_exit}'
                )
                continue

            envelope = json.loads(result.stdout)
            if envelope['exit_code'] != result.returncode:
                failures.append(
                    f'{args[0]}: process exit {result.returncode}, '
                    f'envelope.exit_code {envelope["exit_code"]}'
                )

        assert not failures, (
            'envelope.exit_code must match the process exit:\n' +
            '\n'.join(failures)
        )

    def test_envelope_boolean_fields_match_error_presence(self) -> None:
        """found/handled/deleted fields must correlate with the error block.

        - If the field is True, no error block should exist
        - If the field is False + operational error, an error block must exist
        - If the field is False + idempotent (delete nonexistent), no error block
        """
        cases = [
            # (args, bool_field, expected_value, expect_error_block)
            (['show-command', 'nonexistent', '--output-format', 'json'],
             'found', False, True),
            (['exec-command', 'unknown', 'test', '--output-format', 'json'],
             'handled', False, True),
            (['delete-session', 'any-id', '--output-format', 'json'],
             'deleted', False, False),  # idempotent, no error
        ]
        failures = []
        for args, field, expected_val, expect_error in cases:
            result = _run(args)
            envelope = json.loads(result.stdout)

            actual_val = envelope.get(field)
            has_error = 'error' in envelope

            if actual_val != expected_val:
                failures.append(
                    f'{args[0]}: {field}={actual_val}, expected {expected_val}'
                )
            if expect_error and not has_error:
                failures.append(
                    f'{args[0]}: expected an error block, but none present'
                )
            elif not expect_error and has_error:
                failures.append(
                    f'{args[0]}: unexpected error block present'
                )

        assert not failures, (
            'Boolean fields must correlate with the error block:\n' +
            '\n'.join(failures)
        )


class TestTextVsJsonModeDivergence:
    """Cycle #29: Document the known text-mode vs JSON-mode exit code divergence.

    ERROR_HANDLING.md specifies that the exit code contract applies ONLY when
    --output-format json is set. Text mode follows argparse defaults (e.g.,
    exit 2 for parse errors) while JSON mode normalizes to the contract
    (exit 1 for parse errors).

    This test class LOCKS the expected divergence so that:
    1. Documentation stays aligned with implementation
    2. Future changes to text-mode behavior are caught as intentional
    3. Claws consuming subprocess output can trust the docs
    """

    def test_unknown_command_text_mode_exits_2(self) -> None:
        """Text mode: argparse default exit 2 for an unknown subcommand."""
        result = _run(['nonexistent-cmd'])
        assert result.returncode == 2, (
            f'text mode should exit 2 (argparse default), got {result.returncode}'
        )

    def test_unknown_command_json_mode_exits_1(self) -> None:
        """JSON mode: normalized exit 1 for a parse error (#178)."""
        result = _run(['nonexistent-cmd', '--output-format', 'json'])
        assert result.returncode == 1, (
            f'JSON mode should exit 1 (protocol contract), got {result.returncode}'
        )
        envelope = json.loads(result.stdout)
        assert envelope['error']['kind'] == 'parse'

    def test_missing_required_arg_text_mode_exits_2(self) -> None:
        """Text mode: argparse default exit 2 for a missing required arg."""
        result = _run(['exec-command'])  # missing name + prompt
        assert result.returncode == 2, (
            f'text mode should exit 2, got {result.returncode}'
        )

    def test_missing_required_arg_json_mode_exits_1(self) -> None:
        """JSON mode: normalized exit 1 for a parse error."""
        result = _run(['exec-command', '--output-format', 'json'])
        assert result.returncode == 1, (
            f'JSON mode should exit 1, got {result.returncode}'
        )

    def test_success_path_identical_in_both_modes(self) -> None:
        """Success exit codes are identical in both modes."""
        text_result = _run(['list-sessions'])
        json_result = _run(['list-sessions', '--output-format', 'json'])
        assert text_result.returncode == json_result.returncode == 0, (
            f'success exit should be 0 in both modes: '
            f'text={text_result.returncode}, json={json_result.returncode}'
        )
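Taken together, these tests pin down what any envelope writer must do. The sketch below is hypothetical (the real `wrap_json_envelope` is referenced in the tests but not shown in this diff), yet it satisfies every cross-channel invariant asserted above:

```python
import json
from datetime import datetime, timezone

def emit_envelope(command: str, payload: dict, exit_code: int = 0) -> int:
    """Hypothetical envelope writer honouring the cross-channel invariants."""
    envelope = {
        'command': command,            # must equal the dispatched subcommand
        'output_format': 'json',       # must match the --output-format flag
        'timestamp': datetime.now(timezone.utc).isoformat(),  # stamped at call time
        'exit_code': exit_code,        # must equal the actual process exit code
        **payload,                     # found/handled/deleted plus any error block
    }
    print(json.dumps(envelope))
    return exit_code  # the caller passes this same value to sys.exit()
```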
306  tests/test_exec_route_bootstrap_output_format.py  Normal file
@@ -0,0 +1,306 @@
"""Tests for --output-format on exec-command/exec-tool/route/bootstrap (ROADMAP #168).

Closes the final JSON-parity gap across the CLI family. After #160/#165/
#166/#167, the session-lifecycle and inspect CLI commands all spoke JSON;
this batch extends that contract to the exec, route, and bootstrap
surfaces — the commands claws actually invoke to DO work, not just inspect
state.

Verifies:
- exec-command / exec-tool: JSON envelope with handled + source_hint on
  success; {name, handled:false, error:{kind,message,retryable}} on
  not-found
- route: JSON envelope with match_count + matches list
- bootstrap: JSON envelope with setup, routed_matches, turn, messages,
  persisted_session_path
- All 4 preserve legacy text mode byte-identically
- Exit codes unchanged (0 success, 1 exec-not-found)
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))


def _run(args: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(
        [sys.executable, '-m', 'src.main', *args],
        cwd=Path(__file__).resolve().parent.parent,
        capture_output=True,
        text=True,
    )


class TestExecCommandOutputFormat:
    def test_exec_command_found_json(self) -> None:
        result = _run(['exec-command', 'add-dir', 'hello', '--output-format', 'json'])
        assert result.returncode == 0, result.stderr

        envelope = json.loads(result.stdout)
        assert envelope['handled'] is True
        assert envelope['name'] == 'add-dir'
        assert envelope['prompt'] == 'hello'
        assert 'source_hint' in envelope
        assert 'message' in envelope
        assert 'error' not in envelope

    def test_exec_command_not_found_json(self) -> None:
        result = _run(['exec-command', 'nonexistent-cmd', 'hi', '--output-format', 'json'])
        assert result.returncode == 1

        envelope = json.loads(result.stdout)
        assert envelope['handled'] is False
        assert envelope['name'] == 'nonexistent-cmd'
        assert envelope['prompt'] == 'hi'
        assert envelope['error']['kind'] == 'command_not_found'
        assert envelope['error']['retryable'] is False
        assert 'source_hint' not in envelope

    def test_exec_command_text_backward_compat(self) -> None:
        result = _run(['exec-command', 'add-dir', 'hello'])
        assert result.returncode == 0
        # Single line of prose (unchanged from pre-#168)
        assert result.stdout.count('\n') == 1
        assert 'add-dir' in result.stdout


class TestExecToolOutputFormat:
    def test_exec_tool_found_json(self) -> None:
        result = _run(['exec-tool', 'BashTool', '{"cmd":"ls"}', '--output-format', 'json'])
        assert result.returncode == 0, result.stderr

        envelope = json.loads(result.stdout)
        assert envelope['handled'] is True
        assert envelope['name'] == 'BashTool'
        assert envelope['payload'] == '{"cmd":"ls"}'
        assert 'source_hint' in envelope
        assert 'error' not in envelope

    def test_exec_tool_not_found_json(self) -> None:
        result = _run(['exec-tool', 'NotATool', '{}', '--output-format', 'json'])
        assert result.returncode == 1

        envelope = json.loads(result.stdout)
        assert envelope['handled'] is False
        assert envelope['name'] == 'NotATool'
        assert envelope['error']['kind'] == 'tool_not_found'
        assert envelope['error']['retryable'] is False

    def test_exec_tool_text_backward_compat(self) -> None:
        result = _run(['exec-tool', 'BashTool', '{}'])
        assert result.returncode == 0
        assert result.stdout.count('\n') == 1


class TestRouteOutputFormat:
    def test_route_json_envelope(self) -> None:
        result = _run(['route', 'review mcp', '--limit', '3', '--output-format', 'json'])
        assert result.returncode == 0

        envelope = json.loads(result.stdout)
        assert envelope['prompt'] == 'review mcp'
        assert envelope['limit'] == 3
        assert 'match_count' in envelope
        assert 'matches' in envelope
        assert envelope['match_count'] == len(envelope['matches'])
        # Every match has the required keys
        for m in envelope['matches']:
            assert set(m.keys()) == {'kind', 'name', 'score', 'source_hint'}
            assert m['kind'] in ('command', 'tool')

    def test_route_json_no_matches(self) -> None:
        # A very unusual string should yield zero matches
        result = _run(['route', 'zzzzzzzzzqqqqq', '--output-format', 'json'])
        assert result.returncode == 0

        envelope = json.loads(result.stdout)
        assert envelope['match_count'] == 0
        assert envelope['matches'] == []

    def test_route_text_backward_compat(self) -> None:
        """Text mode tab-separated output unchanged from pre-#168."""
        result = _run(['route', 'review mcp', '--limit', '2'])
        assert result.returncode == 0
        # Each non-empty line has exactly 3 tabs (kind\tname\tscore\tsource_hint)
        for line in result.stdout.strip().split('\n'):
            if line:
                assert line.count('\t') == 3


class TestBootstrapOutputFormat:
    def test_bootstrap_json_envelope(self) -> None:
        result = _run(['bootstrap', 'review MCP', '--limit', '2', '--output-format', 'json'])
        assert result.returncode == 0, result.stderr

        envelope = json.loads(result.stdout)
        # Required top-level keys
        required = {
            'prompt', 'limit', 'setup', 'routed_matches',
            'command_execution_messages', 'tool_execution_messages',
            'turn', 'persisted_session_path',
        }
        assert required.issubset(envelope.keys())
        # Setup sub-envelope
        assert 'python_version' in envelope['setup']
        assert 'platform_name' in envelope['setup']
        # Turn sub-envelope
        assert 'stop_reason' in envelope['turn']
        assert 'prompt' in envelope['turn']

    def test_bootstrap_text_is_markdown(self) -> None:
        """Text mode produces Markdown (unchanged from pre-#168)."""
        result = _run(['bootstrap', 'hello', '--limit', '2'])
        assert result.returncode == 0
        # Markdown headers
        assert '# Runtime Session' in result.stdout
        assert '## Setup' in result.stdout
        assert '## Routed Matches' in result.stdout


class TestFamilyWideJsonParity:
    """After #167 and #168, ALL inspect/exec/route/lifecycle commands
    support --output-format. Verify the full family is now parity-complete."""

    FAMILY_SURFACES = [
        # (cmd_args, expected_to_parse_json)
        (['show-command', 'add-dir'], True),
        (['show-tool', 'BashTool'], True),
        (['exec-command', 'add-dir', 'hi'], True),
        (['exec-tool', 'BashTool', '{}'], True),
        (['route', 'review'], True),
        (['bootstrap', 'hello'], True),
    ]

    def test_all_family_commands_accept_output_format_json(self) -> None:
        """Every family command accepts --output-format json and emits parseable JSON."""
        failures = []
        for args_base, should_parse in self.FAMILY_SURFACES:
            result = _run([*args_base, '--output-format', 'json'])
            if result.returncode not in (0, 1):
                failures.append(f'{args_base}: exit {result.returncode} — {result.stderr}')
                continue
            try:
                json.loads(result.stdout)
            except json.JSONDecodeError as e:
                failures.append(f'{args_base}: not parseable JSON ({e}): {result.stdout[:100]}')
        assert not failures, (
            'CLI family JSON parity gap:\n' + '\n'.join(failures)
        )

    def test_all_family_commands_text_mode_unchanged(self) -> None:
        """Omitting --output-format defaults to text for every family command."""
        # Sanity: just verify each runs without error in text mode
        for args_base, _ in self.FAMILY_SURFACES:
            result = _run(args_base)
            assert result.returncode in (0, 1), (
                f'{args_base} failed in text mode: {result.stderr}'
            )
            # Output should not be JSON-shaped (no leading {)
            assert not result.stdout.strip().startswith('{')


class TestEnvelopeExitCodeMatchesProcessExit:
    """#181: The envelope exit_code field must match the actual process exit code.

    Regression test for the protocol violation where exec-command/exec-tool
    not-found cases returned exit code 1 from the process but emitted
    envelopes with exit_code: 0 (default wrap_json_envelope). Claws reading
    the envelope would misclassify failures as successes.

    Contract (from ERROR_HANDLING.md):
    - Exit code 0 = success
    - Exit code 1 = error/not-found
    - Envelope MUST reflect the process exit
    """

    def test_exec_command_not_found_envelope_exit_matches(self) -> None:
        """exec-command 'unknown-name' must have exit_code=1 in the envelope."""
        result = _run(['exec-command', 'nonexistent-cmd-name', 'test-prompt', '--output-format', 'json'])
        assert result.returncode == 1, f'process exit should be 1, got {result.returncode}'
        envelope = json.loads(result.stdout)
        assert envelope['exit_code'] == 1, (
            f'envelope.exit_code mismatch: process=1, envelope={envelope["exit_code"]}'
        )
        assert envelope['handled'] is False
        assert envelope['error']['kind'] == 'command_not_found'

    def test_exec_tool_not_found_envelope_exit_matches(self) -> None:
        """exec-tool 'unknown-tool' must have exit_code=1 in the envelope."""
        result = _run(['exec-tool', 'nonexistent-tool-name', '{}', '--output-format', 'json'])
        assert result.returncode == 1, f'process exit should be 1, got {result.returncode}'
        envelope = json.loads(result.stdout)
        assert envelope['exit_code'] == 1, (
            f'envelope.exit_code mismatch: process=1, envelope={envelope["exit_code"]}'
        )
        assert envelope['handled'] is False
        assert envelope['error']['kind'] == 'tool_not_found'

    def test_all_commands_exit_code_invariant(self) -> None:
        """Audit: for every clawable command, envelope.exit_code == process exit.

        This is a stronger invariant than 'emits JSON'. Claws dispatching on
        the envelope's exit_code field must get the truth, not a lie.
        """
        # Sample cases known to return non-zero
        cases = [
            # command, expected_exit, justification
            (['show-command', 'nonexistent-abc'], 1, 'not-found inventory lookup'),
            (['show-tool', 'nonexistent-xyz'], 1, 'not-found inventory lookup'),
            (['exec-command', 'nonexistent-1', 'test'], 1, 'not-found execution'),
            (['exec-tool', 'nonexistent-2', '{}'], 1, 'not-found execution'),
        ]
        mismatches = []
        for args, expected_exit, reason in cases:
            result = _run([*args, '--output-format', 'json'])
            if result.returncode != expected_exit:
                mismatches.append(
                    f'{args}: expected process exit {expected_exit} ({reason}), '
                    f'got {result.returncode}'
                )
                continue
            try:
                envelope = json.loads(result.stdout)
            except json.JSONDecodeError as e:
                mismatches.append(f'{args}: JSON parse failed: {e}')
                continue
            if envelope.get('exit_code') != result.returncode:
                mismatches.append(
                    f'{args}: envelope.exit_code={envelope.get("exit_code")} '
                    f'!= process exit={result.returncode} ({reason})'
                )
        assert not mismatches, (
            'Envelope exit_code must match process exit code:\n' +
            '\n'.join(mismatches)
        )


class TestMetadataFlags:
    """Cycle #28: --version flag implementation (#180 gap closure)."""

    def test_version_flag_returns_version_text(self) -> None:
        """--version returns the version string and exits successfully."""
        result = _run(['--version'])
        assert result.returncode == 0
        assert 'claw-code' in result.stdout
        assert '1.0.0' in result.stdout

    def test_help_flag_returns_help_text(self) -> None:
        """--help returns help text and exits successfully."""
        result = _run(['--help'])
        assert result.returncode == 0
        assert 'usage:' in result.stdout
        assert 'Python porting workspace' in result.stdout

    def test_help_still_works_after_version_added(self) -> None:
        """Verify -h and --help both work (no regression)."""
        result_short = _run(['-h'])
        result_long = _run(['--help'])
        assert result_short.returncode == 0
        assert result_long.returncode == 0
        assert 'usage:' in result_short.stdout
        assert 'usage:' in result_long.stdout
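The route envelope shape those tests lock ({kind, name, score, source_hint} per match) is already enough for a claw to pick a dispatch target. A minimal consumer sketch, reusing the real CLI invocation from the tests; the helper name is illustrative:

```python
import json
import subprocess
import sys

def best_command(prompt: str) -> str | None:
    """Return the highest-scoring command match for a prompt, or None."""
    proc = subprocess.run(
        [sys.executable, '-m', 'src.main', 'route', prompt, '--output-format', 'json'],
        capture_output=True, text=True,
    )
    envelope = json.loads(proc.stdout)
    # Filter to command matches; tools use the same shape with kind='tool'.
    commands = [m for m in envelope['matches'] if m['kind'] == 'command']
    if not commands:
        return None
    return max(commands, key=lambda m: m['score'])['name']
```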
206  tests/test_flush_transcript_cli.py  Normal file
@@ -0,0 +1,206 @@
"""Tests for flush-transcript CLI parity with the #160/#165 lifecycle triplet (ROADMAP #166).

Verifies that session *creation* now accepts the same flag family as session
management (list/delete/load):
- --directory DIR (alternate storage location)
- --output-format {text,json} (structured output)
- --session-id ID (deterministic IDs for claw checkpointing)

Also verifies backward compat: default text output unchanged byte-for-byte.
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))


_REPO_ROOT = Path(__file__).resolve().parent.parent


def _run_cli(*args: str) -> subprocess.CompletedProcess[str]:
    return subprocess.run(
        [sys.executable, '-m', 'src.main', *args],
        capture_output=True, text=True, cwd=str(_REPO_ROOT),
    )
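

# Illustrative helper (a sketch, not used by the tests below): the whole #166
# flag family on a single flush-transcript call, returning the parsed JSON
# envelope. The field names come from SCHEMAS.md; the session id is arbitrary.
def _example_flush(tmp: Path) -> dict:
    result = _run_cli(
        'flush-transcript', 'hello',
        '--directory', str(tmp),        # alternate storage location
        '--session-id', 'example-1',    # deterministic ID for claw checkpointing
        '--output-format', 'json',      # structured envelope instead of legacy text
    )
    return json.loads(result.stdout)    # e.g. {'session_id': 'example-1', 'flushed': True, ...}
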
class TestDirectoryFlag:
    def test_flush_transcript_writes_to_custom_directory(self, tmp_path: Path) -> None:
        result = _run_cli(
            'flush-transcript', 'hello world',
            '--directory', str(tmp_path),
        )
        assert result.returncode == 0, result.stderr
        # Exactly one session file should exist in the directory
        files = list(tmp_path.glob('*.json'))
        assert len(files) == 1
        # And the legacy text output points to that file
        assert str(files[0]) in result.stdout


class TestSessionIdFlag:
    def test_explicit_session_id_is_respected(self, tmp_path: Path) -> None:
        result = _run_cli(
            'flush-transcript', 'hello',
            '--directory', str(tmp_path),
            '--session-id', 'deterministic-id-42',
        )
        assert result.returncode == 0, result.stderr
        expected_path = tmp_path / 'deterministic-id-42.json'
        assert expected_path.exists(), (
            f'session file not created at deterministic path: {expected_path}'
        )
        # And it should contain the ID we asked for
        data = json.loads(expected_path.read_text())
        assert data['session_id'] == 'deterministic-id-42'

    def test_auto_session_id_when_flag_omitted(self, tmp_path: Path) -> None:
        """Without --session-id, engine still auto-generates a UUID (backward compat)."""
        result = _run_cli(
            'flush-transcript', 'hello',
            '--directory', str(tmp_path),
        )
        assert result.returncode == 0
        files = list(tmp_path.glob('*.json'))
        assert len(files) == 1
        # The filename (minus .json) should be a 32-char hex UUID
        stem = files[0].stem
        assert len(stem) == 32
        assert all(c in '0123456789abcdef' for c in stem)


class TestOutputFormatFlag:
    def test_json_mode_emits_structured_envelope(self, tmp_path: Path) -> None:
        result = _run_cli(
            'flush-transcript', 'hello',
            '--directory', str(tmp_path),
            '--session-id', 'beta',
            '--output-format', 'json',
        )
        assert result.returncode == 0
        data = json.loads(result.stdout)
        assert data['session_id'] == 'beta'
        assert data['flushed'] is True
        assert data['path'].endswith('beta.json')
        # messages_count and token counts should be present and typed
        assert isinstance(data['messages_count'], int)
        assert isinstance(data['input_tokens'], int)
        assert isinstance(data['output_tokens'], int)

    def test_text_mode_byte_identical_to_pre_166_output(self, tmp_path: Path) -> None:
        """Legacy text output must not change — claws may be parsing it."""
        result = _run_cli(
            'flush-transcript', 'hello',
            '--directory', str(tmp_path),
        )
        assert result.returncode == 0
        lines = result.stdout.strip().split('\n')
        # Line 1: path ending in .json
        assert lines[0].endswith('.json')
        # Line 2: exact legacy format
        assert lines[1] == 'flushed=True'

class TestBackwardCompat:
    def test_no_flags_default_behaviour(self, tmp_path: Path) -> None:
        """Running with no flags still works (default dir, text mode, auto UUID)."""
        import os
        env = os.environ.copy()
        env['PYTHONPATH'] = str(_REPO_ROOT)
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'flush-transcript', 'hello'],
            capture_output=True, text=True, cwd=str(tmp_path), env=env,
        )
        assert result.returncode == 0, result.stderr
        # Default dir is `.port_sessions` in CWD
        sessions_dir = tmp_path / '.port_sessions'
        assert sessions_dir.exists()
        assert len(list(sessions_dir.glob('*.json'))) == 1

class TestLifecycleIntegration:
    """#166's real value: the triplet + creation command are now a coherent family."""

    def test_create_then_list_then_load_then_delete_roundtrip(
        self, tmp_path: Path,
    ) -> None:
        """End-to-end: flush → list → load → delete, all via the same --directory."""
        # 1. Create
        create_result = _run_cli(
            'flush-transcript', 'roundtrip test',
            '--directory', str(tmp_path),
            '--session-id', 'rt-session',
            '--output-format', 'json',
        )
        assert create_result.returncode == 0
        assert json.loads(create_result.stdout)['session_id'] == 'rt-session'

        # 2. List
        list_result = _run_cli(
            'list-sessions',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert list_result.returncode == 0
        list_data = json.loads(list_result.stdout)
        assert 'rt-session' in list_data['sessions']

        # 3. Load
        load_result = _run_cli(
            'load-session', 'rt-session',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert load_result.returncode == 0
        assert json.loads(load_result.stdout)['loaded'] is True

        # 4. Delete
        delete_result = _run_cli(
            'delete-session', 'rt-session',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert delete_result.returncode == 0

        # 5. Verify gone
        verify_result = _run_cli(
            'load-session', 'rt-session',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert verify_result.returncode == 1
        assert json.loads(verify_result.stdout)['error']['kind'] == 'session_not_found'


class TestFullFamilyParity:
    """All four session-lifecycle CLI commands accept the same core flag pair.

    This is the #166 acceptance test: flush-transcript joins the family.
    """

    @pytest.mark.parametrize(
        'command',
        ['list-sessions', 'delete-session', 'load-session', 'flush-transcript'],
    )
    def test_all_four_accept_directory_flag(self, command: str) -> None:
        help_text = _run_cli(command, '--help').stdout
        assert '--directory' in help_text, (
            f'{command} missing --directory flag (#166 parity gap)'
        )

    @pytest.mark.parametrize(
        'command',
        ['list-sessions', 'delete-session', 'load-session', 'flush-transcript'],
    )
    def test_all_four_accept_output_format_flag(self, command: str) -> None:
        help_text = _run_cli(command, '--help').stdout
        assert '--output-format' in help_text, (
            f'{command} missing --output-format flag (#166 parity gap)'
        )

tests/test_json_envelope_field_consistency.py (new file, 213 lines)
@@ -0,0 +1,213 @@
"""JSON envelope field consistency validation (ROADMAP #173 prep).

This test suite validates that clawable-surface commands' JSON output
follows the contract defined in SCHEMAS.md. Currently, commands emit
command-specific envelopes without the canonical common fields
(timestamp, command, exit_code, output_format, schema_version).

This test documents the current gap and validates the consistency
of what IS there, providing a baseline for #173 (common field wrapping).

Phase 1 (this test): Validate consistency within each command's envelope.
Phase 2 (future #173): Wrap all 13 commands with canonical common fields.
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path
from typing import Any

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.main import build_parser  # noqa: E402


# Expected fields for each clawable command's JSON envelope.
# These are the command-specific fields (not including common fields yet).
# Entries map command_name -> (required_fields, optional_fields).
ENVELOPE_CONTRACTS = {
    'list-sessions': (
        {'count', 'sessions'},
        set(),
    ),
    'delete-session': (
        {'session_id', 'deleted', 'directory'},
        set(),
    ),
    'load-session': (
        {'session_id', 'loaded', 'directory', 'path'},
        set(),
    ),
    'flush-transcript': (
        {'session_id', 'path', 'flushed', 'messages_count', 'input_tokens', 'output_tokens'},
        set(),
    ),
    'show-command': (
        {'name', 'found', 'source_hint', 'responsibility'},
        set(),
    ),
    'show-tool': (
        {'name', 'found', 'source_hint'},
        set(),
    ),
    'exec-command': (
        {'name', 'prompt', 'handled', 'message', 'source_hint'},
        set(),
    ),
    'exec-tool': (
        {'name', 'payload', 'handled', 'message', 'source_hint'},
        set(),
    ),
    'route': (
        {'prompt', 'limit', 'match_count', 'matches'},
        set(),
    ),
    'bootstrap': (
        {'prompt', 'setup', 'routed_matches', 'turn', 'persisted_session_path'},
        set(),
    ),
    'command-graph': (
        {'builtins_count', 'plugin_like_count', 'skill_like_count', 'total_count', 'builtins', 'plugin_like', 'skill_like'},
        set(),
    ),
    'tool-pool': (
        {'simple_mode', 'include_mcp', 'tool_count', 'tools'},
        set(),
    ),
    'bootstrap-graph': (
        {'stages', 'note'},
        set(),
    ),
}
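

# Hedged sketch of the #173 wrapper this file anticipates: the name
# wrap_json_envelope comes from the stubs further down; its real signature
# in src/ may differ. Only the five common field names are contractual
# (SCHEMAS.md).
from datetime import datetime, timezone


def wrap_json_envelope_sketch(command: str, exit_code: int, body: dict) -> dict:
    common = {
        'timestamp': datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z'),
        'command': command,
        'exit_code': exit_code,
        'output_format': 'json',
        'schema_version': '1.0',
    }
    return {**common, **body}  # command-specific fields ride alongside the common ones
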
class TestJsonEnvelopeConsistency:
    """Validate current command envelopes match their declared contracts.

    This is a consistency check, not a conformance check. Once #173 adds
    common fields to all commands, these tests will auto-pass the common
    field assertions and verify command-specific fields stay consistent.
    """

    @pytest.mark.parametrize('cmd_name,contract', sorted(ENVELOPE_CONTRACTS.items()))
    def test_command_json_fields_present(self, cmd_name: str, contract: tuple[set[str], set[str]]) -> None:
        """Command's JSON envelope must include all required fields."""
        required, optional = contract
        # Get minimal invocation args for this command
        test_invocations = {
            'list-sessions': [],
            'show-command': ['add-dir'],
            'show-tool': ['BashTool'],
            'exec-command': ['add-dir', 'hi'],
            'exec-tool': ['BashTool', '{}'],
            'route': ['review'],
            'bootstrap': ['hello'],
            'command-graph': [],
            'tool-pool': [],
            'bootstrap-graph': [],
        }

        if cmd_name not in test_invocations:
            pytest.skip(f'{cmd_name} requires session setup; skipped')

        cmd_args = test_invocations[cmd_name]
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', cmd_name, *cmd_args, '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )

        if result.returncode not in (0, 1):
            pytest.fail(f'{cmd_name}: unexpected exit {result.returncode}\nstderr: {result.stderr}')

        try:
            envelope = json.loads(result.stdout)
        except json.JSONDecodeError as e:
            pytest.fail(f'{cmd_name}: invalid JSON: {e}\nOutput: {result.stdout[:200]}')

        # Check required fields (command-specific)
        missing = required - set(envelope.keys())
        if missing:
            pytest.fail(
                f'{cmd_name} envelope missing required fields: {missing}\n'
                f'Expected: {required}\nGot: {set(envelope.keys())}'
            )

        # Check that extra fields are accounted for (warn if unknown)
        known = required | optional
        extra = set(envelope.keys()) - known
        if extra:
            # Warn but don't fail — there may be new fields added.
            # (pytest.warns is for *catching* warnings; warnings.warn issues one.)
            import warnings
            warnings.warn(f'extra fields in {cmd_name}: {extra}', stacklevel=2)

    def test_envelope_field_value_types(self) -> None:
        """Smoke test: envelope fields have expected types (bool, int, str, list, dict, null)."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'list-sessions', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )

        envelope = json.loads(result.stdout)

        # Spot check a few fields
        assert isinstance(envelope.get('count'), int), 'count should be int'
        assert isinstance(envelope.get('sessions'), list), 'sessions should be list'

class TestJsonEnvelopeCommonFieldPrep:
    """Validation stubs for common fields (part of #173 implementation).

    These tests will activate once wrap_json_envelope() is applied to all
    13 clawable commands. Currently they document the expected contract.
    """

    def test_all_envelopes_include_timestamp(self) -> None:
        """Every clawable envelope must include an ISO 8601 UTC timestamp."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'command-graph', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        assert 'timestamp' in envelope, 'Missing timestamp field'
        # Verify ISO 8601 format (ends with Z for UTC)
        assert envelope['timestamp'].endswith('Z'), f'Timestamp not UTC: {envelope["timestamp"]}'

    def test_all_envelopes_include_command(self) -> None:
        """Every envelope must echo the command name."""
        test_cases = [
            ('list-sessions', []),
            ('command-graph', []),
            ('bootstrap', ['hello']),
        ]
        for cmd_name, cmd_args in test_cases:
            result = subprocess.run(
                [sys.executable, '-m', 'src.main', cmd_name, *cmd_args, '--output-format', 'json'],
                cwd=Path(__file__).resolve().parent.parent,
                capture_output=True,
                text=True,
            )
            envelope = json.loads(result.stdout)
            assert envelope.get('command') == cmd_name, f'{cmd_name} envelope.command mismatch'

    def test_all_envelopes_include_exit_code_and_schema_version(self) -> None:
        """Every envelope must include exit_code and schema_version."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'tool-pool', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        assert 'exit_code' in envelope, 'Missing exit_code'
        assert 'schema_version' in envelope, 'Missing schema_version'
        assert envelope['schema_version'] == '1.0', 'Wrong schema_version'

tests/test_load_session_cli.py (new file, 183 lines)
@@ -0,0 +1,183 @@
"""Tests for load-session CLI parity with list-sessions/delete-session (ROADMAP #165).

Verifies the session-lifecycle CLI triplet is now symmetric:
- --directory DIR accepted (alternate storage locations reachable)
- --output-format {text,json} accepted
- Not-found emits typed JSON error envelope, never a Python traceback
- Corrupted session file distinguished from not-found via 'kind'
- Legacy text-mode output unchanged (backward compat)
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.session_store import StoredSession, save_session  # noqa: E402


_REPO_ROOT = Path(__file__).resolve().parent.parent


def _run_cli(
    *args: str, cwd: Path | None = None,
) -> subprocess.CompletedProcess[str]:
    """Always invoke the CLI with cwd=repo-root so ``python -m src.main``
    can resolve the ``src`` package, regardless of where the test's
    tmp_path is.
    """
    return subprocess.run(
        [sys.executable, '-m', 'src.main', *args],
        capture_output=True,
        text=True,
        cwd=str(cwd) if cwd else str(_REPO_ROOT),
    )


def _make_session(session_id: str) -> StoredSession:
    return StoredSession(
        session_id=session_id, messages=('hi',), input_tokens=1, output_tokens=2,
    )

class TestDirectoryFlagParity:
    def test_load_session_accepts_directory_flag(self, tmp_path: Path) -> None:
        save_session(_make_session('alpha'), tmp_path)
        result = _run_cli('load-session', 'alpha', '--directory', str(tmp_path))
        assert result.returncode == 0, result.stderr
        assert 'alpha' in result.stdout

    def test_load_session_without_directory_uses_cwd_default(
        self, tmp_path: Path,
    ) -> None:
        """When --directory is omitted, fall back to .port_sessions in CWD.

        The subprocess runs with ``cwd=tmp_path``, so ``python -m src.main``
        can no longer find ``src/`` relative to its working directory; we
        point PYTHONPATH at the repo root via env instead.
        """
        sessions_dir = tmp_path / '.port_sessions'
        sessions_dir.mkdir()
        save_session(_make_session('beta'), sessions_dir)
        import os
        env = os.environ.copy()
        env['PYTHONPATH'] = str(_REPO_ROOT)
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'load-session', 'beta'],
            capture_output=True, text=True, cwd=str(tmp_path), env=env,
        )
        assert result.returncode == 0, result.stderr
        assert 'beta' in result.stdout


class TestOutputFormatFlagParity:
    def test_json_mode_on_success(self, tmp_path: Path) -> None:
        save_session(
            StoredSession(
                session_id='gamma', messages=('x', 'y'),
                input_tokens=5, output_tokens=7,
            ),
            tmp_path,
        )
        result = _run_cli(
            'load-session', 'gamma',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert result.returncode == 0
        data = json.loads(result.stdout)
        # Verify common envelope fields (SCHEMAS.md contract)
        assert 'timestamp' in data
        assert data['command'] == 'load-session'
        assert data['exit_code'] == 0
        assert data['schema_version'] == '1.0'
        # Verify command-specific fields
        assert data['session_id'] == 'gamma'
        assert data['loaded'] is True
        assert data['messages_count'] == 2
        assert data['input_tokens'] == 5
        assert data['output_tokens'] == 7

    def test_text_mode_unchanged_on_success(self, tmp_path: Path) -> None:
        """Legacy text output must be byte-identical for backward compat."""
        save_session(_make_session('delta'), tmp_path)
        result = _run_cli('load-session', 'delta', '--directory', str(tmp_path))
        assert result.returncode == 0
        lines = result.stdout.strip().split('\n')
        assert lines == ['delta', '1 messages', 'in=1 out=2']

class TestNotFoundTypedError:
    def test_not_found_json_envelope(self, tmp_path: Path) -> None:
        """Not-found emits structured JSON, never a Python traceback."""
        result = _run_cli(
            'load-session', 'missing',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert result.returncode == 1
        assert 'Traceback' not in result.stderr, (
            'regression #165: raw traceback leaked to stderr'
        )
        assert 'SessionNotFoundError' not in result.stdout, (
            'regression #165: internal class name leaked into CLI output'
        )
        data = json.loads(result.stdout)
        assert data['session_id'] == 'missing'
        assert data['loaded'] is False
        assert data['error']['kind'] == 'session_not_found'
        assert data['error']['retryable'] is False
        # directory field is populated so claws know where we looked
        assert 'directory' in data['error']

    def test_not_found_text_mode_no_traceback(self, tmp_path: Path) -> None:
        """Text mode on not-found must not dump a Python stack either."""
        result = _run_cli(
            'load-session', 'missing', '--directory', str(tmp_path),
        )
        assert result.returncode == 1
        assert 'Traceback' not in result.stderr
        assert result.stdout.startswith('error:')


class TestLoadFailedDistinctFromNotFound:
    def test_corrupted_session_file_surfaces_distinct_kind(
        self, tmp_path: Path,
    ) -> None:
        """A corrupted JSON file must emit kind='session_load_failed', not 'session_not_found'."""
        (tmp_path / 'broken.json').write_text('{ not valid json')
        result = _run_cli(
            'load-session', 'broken',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert result.returncode == 1
        data = json.loads(result.stdout)
        assert data['error']['kind'] == 'session_load_failed'
        assert data['error']['retryable'] is True, (
            'corrupted file is potentially retryable (fs glitch) unlike not-found'
        )
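

# What the kind/retryable split above enables (a hedged sketch, not part of
# the suite): a claw-side loader that retries transient load failures but
# gives up immediately on not-found. Only error.kind and error.retryable are
# contractual; the retry policy is illustrative.
def _load_with_retry(session_id: str, directory: Path, attempts: int = 3) -> dict:
    for _attempt in range(attempts):
        result = _run_cli(
            'load-session', session_id,
            '--directory', str(directory),
            '--output-format', 'json',
        )
        data = json.loads(result.stdout)
        if result.returncode == 0:
            return data
        if not data['error']['retryable']:  # e.g. session_not_found
            raise KeyError(session_id)
        # session_load_failed: possibly a transient fs glitch, so try again
    raise RuntimeError(f'gave up after {attempts} attempts')
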
class TestTripletParityConsistency:
    """All three #160 CLI commands should accept the same flag pair."""

    @pytest.mark.parametrize('command', ['list-sessions', 'delete-session', 'load-session'])
    def test_all_three_accept_directory_flag(self, command: str) -> None:
        help_text = _run_cli(command, '--help').stdout
        assert '--directory' in help_text, (
            f'{command} missing --directory flag (#165 parity gap)'
        )

    @pytest.mark.parametrize('command', ['list-sessions', 'delete-session', 'load-session'])
    def test_all_three_accept_output_format_flag(self, command: str) -> None:
        help_text = _run_cli(command, '--help').stdout
        assert '--output-format' in help_text, (
            f'{command} missing --output-format flag (#165 parity gap)'
        )

tests/test_parse_error_envelope.py (new file, 239 lines)
@@ -0,0 +1,239 @@
"""#178 — argparse-level errors emit JSON envelope when --output-format json is requested.

Before #178:
    $ claw nonexistent --output-format json
    usage: main.py [-h] {summary,manifest,...} ...
    main.py: error: argument command: invalid choice: 'nonexistent' (choose from ...)
    [exit 2, argparse dumps help to stderr, no JSON envelope]

After #178:
    $ claw nonexistent --output-format json
    {"timestamp": "...", "command": "nonexistent", "exit_code": 1, ...,
     "error": {"kind": "parse", "operation": "argparse", ...}}
    [exit 1, JSON envelope on stdout, matches SCHEMAS.md contract]

Contract:
- text mode: unchanged (argparse still dumps help to stderr, exit code 2)
- JSON mode: envelope matches SCHEMAS.md 'error' shape, exit code 1
- Parse errors use error.kind='parse' (distinct from runtime/session/etc.)
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

import pytest

CLI = [sys.executable, '-m', 'src.main']
REPO_ROOT = Path(__file__).resolve().parent.parent

class TestParseErrorJsonEnvelope:
    """Argparse errors emit JSON envelope when --output-format json is requested."""

    def test_unknown_command_json_mode_emits_envelope(self) -> None:
        """Unknown command + --output-format json → parse-error envelope."""
        result = subprocess.run(
            CLI + ['nonexistent-command', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 1, f"expected exit 1; got {result.returncode}"
        envelope = json.loads(result.stdout)
        # Common fields
        assert envelope['schema_version'] == '1.0'
        assert envelope['output_format'] == 'json'
        assert envelope['exit_code'] == 1
        # Error envelope shape
        assert envelope['error']['kind'] == 'parse'
        assert envelope['error']['operation'] == 'argparse'
        assert envelope['error']['retryable'] is False
        assert envelope['error']['target'] == 'nonexistent-command'
        assert 'hint' in envelope['error']

    def test_unknown_command_json_equals_syntax(self) -> None:
        """--output-format=json syntax also works."""
        result = subprocess.run(
            CLI + ['nonexistent-command', '--output-format=json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 1
        envelope = json.loads(result.stdout)
        assert envelope['error']['kind'] == 'parse'

    def test_unknown_command_text_mode_unchanged(self) -> None:
        """Text mode (default) preserves argparse behavior: help to stderr, exit 2."""
        result = subprocess.run(
            CLI + ['nonexistent-command'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 2, f"text mode must preserve argparse exit 2; got {result.returncode}"
        # stderr should have argparse error (help + error message)
        assert 'invalid choice' in result.stderr
        # stdout should be empty (no JSON leaked)
        assert result.stdout == ''

    def test_invalid_flag_json_mode_emits_envelope(self) -> None:
        """Invalid flag at top level + --output-format json → envelope."""
        result = subprocess.run(
            CLI + ['--invalid-top-level-flag', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        # argparse might reject before --output-format is parsed; still emit envelope
        assert result.returncode == 1, f"got {result.returncode}: {result.stderr}"
        envelope = json.loads(result.stdout)
        assert envelope['error']['kind'] == 'parse'

    def test_missing_command_no_json_flag_behaves_normally(self) -> None:
        """No --output-format flag + missing command → normal argparse behavior."""
        result = subprocess.run(
            CLI,
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        # argparse exits 2 when the required subcommand is missing
        assert result.returncode == 2
        assert 'required' in result.stderr.lower() or 'the following arguments are required' in result.stderr.lower()

    def test_valid_command_unaffected(self) -> None:
        """Valid commands still work normally (no regression)."""
        result = subprocess.run(
            CLI + ['list-sessions', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0
        envelope = json.loads(result.stdout)
        assert envelope['command'] == 'list-sessions'
        assert 'sessions' in envelope

    def test_parse_error_envelope_contains_common_fields(self) -> None:
        """Parse-error envelope must include all common fields per SCHEMAS.md."""
        result = subprocess.run(
            CLI + ['bogus', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        # All common fields required by SCHEMAS.md
        for field in ('timestamp', 'command', 'exit_code', 'output_format', 'schema_version'):
            assert field in envelope, f"common field '{field}' missing from parse-error envelope"

class TestParseErrorSchemaCompliance:
    """Parse-error envelope matches SCHEMAS.md error shape."""

    def test_error_kind_is_parse(self) -> None:
        """error.kind='parse' distinguishes argparse errors from runtime errors."""
        result = subprocess.run(
            CLI + ['unknown', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        assert envelope['error']['kind'] == 'parse'

    def test_error_retryable_false(self) -> None:
        """Parse errors are never retryable (typo won't magically fix itself)."""
        result = subprocess.run(
            CLI + ['unknown', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        assert envelope['error']['retryable'] is False

class TestParseErrorStderrHygiene:
    """#179: JSON mode must fully suppress argparse stderr output.

    Before #179: stderr leaked argparse usage + error text even when --output-format json.
    After #179: stderr is silent; envelope carries the real error message verbatim.
    """

    def test_json_mode_stderr_is_silent_on_unknown_command(self) -> None:
        """Unknown command in JSON mode: stderr empty."""
        result = subprocess.run(
            CLI + ['nonexistent-cmd', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.stderr == '', (
            f"JSON mode stderr must be empty; got:\n{result.stderr!r}"
        )

    def test_json_mode_stderr_is_silent_on_missing_arg(self) -> None:
        """Missing required arg in JSON mode: stderr empty (no argparse usage leak)."""
        result = subprocess.run(
            CLI + ['load-session', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.stderr == '', (
            f"JSON mode stderr must be empty on missing arg; got:\n{result.stderr!r}"
        )

    def test_json_mode_envelope_carries_real_argparse_message(self) -> None:
        """#179: envelope.error.message contains argparse's actual text, not generic rejection."""
        result = subprocess.run(
            CLI + ['load-session', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        # Real argparse message: 'the following arguments are required: session_id'
        msg = envelope['error']['message']
        assert 'session_id' in msg, (
            f"envelope.error.message must carry real argparse text mentioning missing arg; got: {msg!r}"
        )
        assert 'required' in msg.lower(), (
            f"envelope.error.message must indicate what is required; got: {msg!r}"
        )

    def test_json_mode_envelope_carries_invalid_choice_details(self) -> None:
        """#179: unknown command envelope includes valid-choice list from argparse."""
        result = subprocess.run(
            CLI + ['typo-command', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        msg = envelope['error']['message']
        assert 'invalid choice' in msg.lower(), (
            f"envelope must mention 'invalid choice'; got: {msg!r}"
        )
        # Should include at least one valid command name for discoverability
        assert 'bootstrap' in msg or 'summary' in msg, (
            f"envelope must include valid choices for discoverability; got: {msg!r}"
        )

    def test_text_mode_stderr_preserved_on_unknown_command(self) -> None:
        """Text mode: argparse stderr behavior unchanged (backward compat)."""
        result = subprocess.run(
            CLI + ['nonexistent-cmd'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        # Text mode still dumps argparse help to stderr
        assert 'invalid choice' in result.stderr
        assert result.returncode == 2
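

# One plausible shape for the #178/#179 behaviour verified above (a hedged
# sketch, not src.main's actual code): make argparse raise instead of
# printing, then emit the SCHEMAS.md parse-error envelope when JSON mode was
# requested. Only the envelope field names are contractual.
import argparse
from datetime import datetime, timezone


class _RaisingParser(argparse.ArgumentParser):
    def error(self, message: str) -> None:  # argparse funnels every parse failure here
        raise ValueError(message)           # carry the real argparse text verbatim


def _parse_or_emit_envelope(argv: list[str]) -> argparse.Namespace:
    # Crude JSON-mode sniff, done before parsing because parsing may fail first.
    wants_json = ('json' in argv) or ('--output-format=json' in argv)
    parser = _RaisingParser(prog='claw')
    sub = parser.add_subparsers(dest='command', required=True)
    sub.add_parser('list-sessions').add_argument('--output-format', default='text')
    try:
        return parser.parse_args(argv)
    except ValueError as exc:
        if not wants_json:
            parser.print_usage(sys.stderr)  # text mode: legacy behaviour preserved
            raise SystemExit(2)
        envelope = {
            'timestamp': datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z'),
            'command': next((a for a in argv if not a.startswith('-')), ''),
            'exit_code': 1,
            'output_format': 'json',
            'schema_version': '1.0',
            'error': {'kind': 'parse', 'operation': 'argparse',
                      'retryable': False, 'message': str(exc)},
        }
        print(json.dumps(envelope))         # envelope on stdout, stderr stays silent
        raise SystemExit(1)
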
@@ -173,6 +173,105 @@ class PortingWorkspaceTests(unittest.TestCase):
        self.assertIn(session_id, result.stdout)
        self.assertIn('messages', result.stdout)

    def test_list_sessions_cli_runs(self) -> None:
        """#160: list-sessions CLI enumerates stored sessions in text + json."""
        import json
        import tempfile
        from src.session_store import StoredSession, save_session

        with tempfile.TemporaryDirectory() as tmp:
            tmp_path = Path(tmp)
            for sid in ['alpha', 'bravo']:
                save_session(
                    StoredSession(session_id=sid, messages=('hi',), input_tokens=1, output_tokens=2),
                    tmp_path,
                )
            # text mode
            text_result = subprocess.run(
                [sys.executable, '-m', 'src.main', 'list-sessions', '--directory', str(tmp_path)],
                check=True, capture_output=True, text=True,
            )
            self.assertIn('alpha', text_result.stdout)
            self.assertIn('bravo', text_result.stdout)
            # json mode
            json_result = subprocess.run(
                [sys.executable, '-m', 'src.main', 'list-sessions',
                 '--directory', str(tmp_path), '--output-format', 'json'],
                check=True, capture_output=True, text=True,
            )
            data = json.loads(json_result.stdout)
            # Verify common envelope fields (SCHEMAS.md contract)
            self.assertIn('timestamp', data)
            self.assertEqual(data['command'], 'list-sessions')
            self.assertEqual(data['schema_version'], '1.0')
            # Verify command-specific fields
            self.assertEqual(data['sessions'], ['alpha', 'bravo'])
            self.assertEqual(data['count'], 2)

    def test_delete_session_cli_idempotent(self) -> None:
        """#160: delete-session CLI is idempotent (not-found is exit 0, status=not_found)."""
        import json
        import tempfile
        from src.session_store import StoredSession, save_session

        with tempfile.TemporaryDirectory() as tmp:
            tmp_path = Path(tmp)
            save_session(
                StoredSession(session_id='once', messages=('hi',), input_tokens=1, output_tokens=2),
                tmp_path,
            )
            # first delete: success
            first = subprocess.run(
                [sys.executable, '-m', 'src.main', 'delete-session', 'once',
                 '--directory', str(tmp_path), '--output-format', 'json'],
                capture_output=True, text=True,
            )
            self.assertEqual(first.returncode, 0)
            envelope_first = json.loads(first.stdout)
            # Verify common envelope fields (SCHEMAS.md contract)
            self.assertIn('timestamp', envelope_first)
            self.assertEqual(envelope_first['command'], 'delete-session')
            self.assertEqual(envelope_first['exit_code'], 0)
            self.assertEqual(envelope_first['schema_version'], '1.0')
            # Verify command-specific fields
            self.assertEqual(envelope_first['session_id'], 'once')
            self.assertEqual(envelope_first['deleted'], True)
            self.assertEqual(envelope_first['status'], 'deleted')
            # second delete: idempotent, still exit 0
            second = subprocess.run(
                [sys.executable, '-m', 'src.main', 'delete-session', 'once',
                 '--directory', str(tmp_path), '--output-format', 'json'],
                capture_output=True, text=True,
            )
            self.assertEqual(second.returncode, 0)
            envelope_second = json.loads(second.stdout)
            self.assertEqual(envelope_second['session_id'], 'once')
            self.assertEqual(envelope_second['deleted'], False)
            self.assertEqual(envelope_second['status'], 'not_found')

    def test_delete_session_cli_partial_failure_exit_1(self) -> None:
        """#160: partial-failure (permission error) surfaces as exit 1 + typed JSON error."""
        import json
        import tempfile

        with tempfile.TemporaryDirectory() as tmp:
            tmp_path = Path(tmp)
            bad = tmp_path / 'locked.json'
            bad.mkdir()
            try:
                result = subprocess.run(
                    [sys.executable, '-m', 'src.main', 'delete-session', 'locked',
                     '--directory', str(tmp_path), '--output-format', 'json'],
                    capture_output=True, text=True,
                )
                self.assertEqual(result.returncode, 1)
                data = json.loads(result.stdout)
                self.assertFalse(data['deleted'])
                self.assertEqual(data['error']['kind'], 'session_delete_failed')
                self.assertTrue(data['error']['retryable'])
            finally:
                bad.rmdir()

    def test_tool_permission_filtering_cli_runs(self) -> None:
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'tools', '--limit', '10', '--deny-prefix', 'mcp'],

tests/test_run_turn_loop_cancellation.py (new file, 156 lines)
@@ -0,0 +1,156 @@
"""Tests for run_turn_loop timeout triggering cooperative cancel (ROADMAP #164 Stage A).

End-to-end integration: when the wall-clock timeout fires in run_turn_loop,
the runtime must signal the cancel_event so any in-flight submit_message
thread sees it at its next safe checkpoint and returns without mutating
state.

This closes the gap filed in #164: #161's timeout bounded the caller's wait
but did not prevent ghost turns.
"""

from __future__ import annotations

import sys
import threading
import time
from pathlib import Path
from unittest.mock import patch

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.models import UsageSummary  # noqa: E402
from src.query_engine import TurnResult  # noqa: E402
from src.runtime import PortRuntime  # noqa: E402


def _completed(prompt: str) -> TurnResult:
    return TurnResult(
        prompt=prompt,
        output='ok',
        matched_commands=(),
        matched_tools=(),
        permission_denials=(),
        usage=UsageSummary(),
        stop_reason='completed',
    )
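

# The cooperative-cancellation pattern under test, as a hedged sketch (not
# src.runtime's actual code): provider-side work polls cancel_event at safe
# checkpoints and bails out with a 'cancelled' result instead of mutating
# session state after the deadline.
def _example_cooperative_call(prompt: str, cancel_event: threading.Event) -> str:
    for _chunk in range(100):            # pretend chunks of provider work
        if cancel_event.is_set():        # safe checkpoint
            return 'cancelled'           # no ghost turn gets recorded
        time.sleep(0.01)
    return 'completed'
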
class TestTimeoutPropagatesCancelEvent:
    def test_runtime_passes_cancel_event_to_submit_message(self) -> None:
        """submit_message receives a cancel_event when a deadline is in play."""
        runtime = PortRuntime()
        captured_event: list[threading.Event | None] = []

        def _capture(prompt, commands, tools, denials, cancel_event=None):
            captured_event.append(cancel_event)
            return _completed(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _capture

            runtime.run_turn_loop(
                'hello', max_turns=1, timeout_seconds=5.0,
            )

        # Runtime passed a real Event object, not None
        assert len(captured_event) == 1
        assert isinstance(captured_event[0], threading.Event)

    def test_legacy_no_timeout_does_not_pass_cancel_event(self) -> None:
        """Without timeout_seconds, the cancel_event is None (legacy behaviour)."""
        runtime = PortRuntime()
        captured_kwargs: list[dict] = []

        def _capture(prompt, commands, tools, denials):
            # Legacy call signature: no cancel_event kwarg
            captured_kwargs.append({'prompt': prompt})
            return _completed(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _capture

            runtime.run_turn_loop('hello', max_turns=1)

        # Legacy path didn't pass cancel_event at all
        assert len(captured_kwargs) == 1

    def test_timeout_sets_cancel_event_before_returning(self) -> None:
        """When timeout fires mid-call, the event is set and the still-running
        thread would see 'cancelled' if it checks before returning."""
        runtime = PortRuntime()
        observed_events_at_checkpoint: list[bool] = []
        release = threading.Event()  # test-side release so the thread doesn't leak forever

        def _slow_submit(prompt, commands, tools, denials, cancel_event=None):
            # Simulate provider work: block until either cancel or a test-side release.
            # If cancel fires, check if the event is observably set.
            start = time.monotonic()
            while time.monotonic() - start < 2.0:
                if cancel_event is not None and cancel_event.is_set():
                    observed_events_at_checkpoint.append(True)
                    return TurnResult(
                        prompt=prompt, output='',
                        matched_commands=(), matched_tools=(),
                        permission_denials=(), usage=UsageSummary(),
                        stop_reason='cancelled',
                    )
                if release.is_set():
                    break
                time.sleep(0.05)
            return _completed(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _slow_submit

            # Tight deadline: 0.2s, submit will be mid-loop when timeout fires
            start = time.monotonic()
            results = runtime.run_turn_loop(
                'hello', max_turns=1, timeout_seconds=0.2,
            )
            elapsed = time.monotonic() - start
            release.set()  # let the background thread exit cleanly

        # Runtime returned a timeout TurnResult to the caller
        assert results[-1].stop_reason == 'timeout'
        # And it happened within a reasonable window of the deadline
        assert elapsed < 1.5, f'runtime did not honour deadline: {elapsed:.2f}s'

        # Give the background thread a moment to observe the cancel.
        # We don't assert on it directly (thread-level observability is
        # timing-dependent), but the contract is: the event IS set, so any
        # cooperative checkpoint will see it.
        time.sleep(0.3)

class TestCancelEventSharedAcrossTurns:
    """Event is created once per run_turn_loop invocation and shared across turns."""

    def test_same_event_threaded_to_every_submit_message(self) -> None:
        runtime = PortRuntime()
        captured_events: list[threading.Event] = []

        def _capture(prompt, commands, tools, denials, cancel_event=None):
            if cancel_event is not None:
                captured_events.append(cancel_event)
            return _completed(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _capture

            runtime.run_turn_loop(
                'hello', max_turns=3, timeout_seconds=5.0,
                continuation_prompt='continue',
            )

        # All 3 turns received the same event object (same identity)
        assert len(captured_events) == 3
        assert all(e is captured_events[0] for e in captured_events), (
            'runtime must share one cancel_event across turns, not create '
            'a new one per turn \u2014 otherwise a late-arriving cancel on turn '
            'N-1 cannot affect turn N'
        )

tests/test_run_turn_loop_continuation.py (new file, 161 lines)
@@ -0,0 +1,161 @@
"""Tests for run_turn_loop continuation contract (ROADMAP #163).

The deprecated ``f'{prompt} [turn N]'`` suffix injection is gone. Verifies:
- No ``[turn N]`` string ever lands in a submitted prompt
- Default (``continuation_prompt=None``) stops the loop after turn 0
- Explicit ``continuation_prompt`` is submitted verbatim on subsequent turns
- The first turn always gets the original prompt, not the continuation
"""

from __future__ import annotations

import subprocess
import sys
from pathlib import Path
from unittest.mock import patch

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.models import UsageSummary  # noqa: E402
from src.query_engine import TurnResult  # noqa: E402
from src.runtime import PortRuntime  # noqa: E402


def _completed_result(prompt: str) -> TurnResult:
    return TurnResult(
        prompt=prompt,
        output='ok',
        matched_commands=(),
        matched_tools=(),
        permission_denials=(),
        usage=UsageSummary(),
        stop_reason='completed',
    )
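

# The continuation contract under test, as a minimal loop sketch (an assumed
# shape, not src.runtime verbatim): turn 0 always gets the original prompt;
# later turns get continuation_prompt verbatim, and None means stop after
# turn 0. No '[turn N]' suffix is ever synthesized.
def _prompts_for_loop(prompt: str, max_turns: int, continuation_prompt: str | None = None) -> list[str]:
    out: list[str] = []
    for turn in range(max_turns):
        if turn == 0:
            out.append(prompt)
        elif continuation_prompt is None:
            break
        else:
            out.append(continuation_prompt)  # verbatim, never decorated
    return out

# _prompts_for_loop('x', 3)               -> ['x']
# _prompts_for_loop('x', 3, 'Continue.')  -> ['x', 'Continue.', 'Continue.']
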
class TestNoTurnSuffixInjection:
    """Core acceptance: no prompt submitted to the engine ever contains '[turn N]'."""

    def test_default_path_submits_original_prompt_only(self) -> None:
        runtime = PortRuntime()
        submitted: list[str] = []

        def _capture(prompt, commands, tools, denials):
            submitted.append(prompt)
            return _completed_result(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _capture

            runtime.run_turn_loop('investigate this bug', max_turns=3)

        # Without continuation_prompt, only turn 0 should run
        assert submitted == ['investigate this bug']
        # And no '[turn N]' suffix anywhere
        for p in submitted:
            assert '[turn' not in p, f'found [turn suffix in submitted prompt: {p!r}'

    def test_with_continuation_prompt_no_turn_suffix(self) -> None:
        runtime = PortRuntime()
        submitted: list[str] = []

        def _capture(prompt, commands, tools, denials):
            submitted.append(prompt)
            return _completed_result(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _capture

            runtime.run_turn_loop(
                'investigate this bug',
                max_turns=3,
                continuation_prompt='Continue.',
            )

        # Turn 0 = original, turns 1-2 = continuation, verbatim
        assert submitted == ['investigate this bug', 'Continue.', 'Continue.']
        # No harness-injected suffix anywhere
        for p in submitted:
            assert '[turn' not in p
            assert not p.endswith(']')

class TestContinuationDefaultStopsAfterTurnZero:
    def test_default_continuation_returns_one_result(self) -> None:
        runtime = PortRuntime()
        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = lambda p, *_: _completed_result(p)

            results = runtime.run_turn_loop('x', max_turns=5)
        assert len(results) == 1
        assert results[0].prompt == 'x'

    def test_default_continuation_does_not_call_engine_twice(self) -> None:
        runtime = PortRuntime()
        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = lambda p, *_: _completed_result(p)

            runtime.run_turn_loop('x', max_turns=10)
        # Exactly one submit_message call despite max_turns=10
        assert engine.submit_message.call_count == 1

class TestExplicitContinuationBehaviour:
    def test_first_turn_always_uses_original_prompt(self) -> None:
        runtime = PortRuntime()
        captured: list[str] = []

        def _capture(prompt, *_):
            captured.append(prompt)
            return _completed_result(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _capture

            runtime.run_turn_loop(
                'original task', max_turns=2, continuation_prompt='keep going'
            )

        assert captured[0] == 'original task'
        assert captured[1] == 'keep going'

    def test_continuation_respects_max_turns(self) -> None:
        runtime = PortRuntime()
        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = lambda p, *_: _completed_result(p)

            runtime.run_turn_loop('x', max_turns=3, continuation_prompt='go')
        assert engine.submit_message.call_count == 3

class TestCLIContinuationFlag:
    def test_cli_default_runs_one_turn(self) -> None:
        """Without --continuation-prompt, CLI should emit exactly '## Turn 1'."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'turn-loop', 'review MCP tool',
             '--max-turns', '3', '--structured-output'],
            check=True, capture_output=True, text=True,
        )
        assert '## Turn 1' in result.stdout
        assert '## Turn 2' not in result.stdout
        assert '[turn' not in result.stdout

    def test_cli_with_continuation_runs_multiple_turns(self) -> None:
        """With --continuation-prompt, CLI should run up to max_turns."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'turn-loop', 'review MCP tool',
             '--max-turns', '2', '--structured-output',
             '--continuation-prompt', 'continue'],
            check=True, capture_output=True, text=True,
        )
        assert '## Turn 1' in result.stdout
        assert '## Turn 2' in result.stdout
        # The continuation text is visible (it's submitted as the turn prompt)
        # but no harness-injected [turn N] suffix
        assert '[turn' not in result.stdout

tests/test_run_turn_loop_permissions.py (new file, 95 lines)
@@ -0,0 +1,95 @@
"""Tests for run_turn_loop permission denials parity (ROADMAP #159).

Verifies that multi-turn sessions have the same security posture as
single-turn bootstrap_session: denied_tools are inferred from matches
and threaded through every turn, not hardcoded empty.
"""

from __future__ import annotations

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.runtime import PortRuntime  # noqa: E402

class TestPermissionDenialsInTurnLoop:
    """#159: permission denials must be non-empty in run_turn_loop,
    matching what bootstrap_session produces for the same prompt.
    """

    def test_turn_loop_surfaces_permission_denials_like_bootstrap(self) -> None:
        """Symmetry check: turn_loop and bootstrap_session infer the same denials."""
        runtime = PortRuntime()
        prompt = 'run bash ls'

        # Single-turn via bootstrap
        bootstrap_result = runtime.bootstrap_session(prompt)
        bootstrap_denials = bootstrap_result.turn_result.permission_denials

        # Multi-turn via run_turn_loop (single turn, no continuation)
        loop_results = runtime.run_turn_loop(prompt, max_turns=1)
        loop_denials = loop_results[0].permission_denials

        # Both should infer denials for bash-family tools
        assert len(bootstrap_denials) > 0, (
            'bootstrap_session should deny bash-family tools'
        )
        assert len(loop_denials) > 0, (
            f'#159 regression: run_turn_loop returned empty denials; '
            f'expected {len(bootstrap_denials)} like bootstrap_session'
        )

        # The denial kinds should match (both deny the same tools)
        bootstrap_denied_names = {d.tool_name for d in bootstrap_denials}
        loop_denied_names = {d.tool_name for d in loop_denials}
        assert bootstrap_denied_names == loop_denied_names, (
            f'asymmetric denials: bootstrap denied {bootstrap_denied_names}, '
            f'loop denied {loop_denied_names}'
        )

    def test_turn_loop_with_continuation_preserves_denials(self) -> None:
        """Denials are inferred once at loop start, then passed to every turn."""
        runtime = PortRuntime()
        from unittest.mock import patch

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            from src.models import UsageSummary
            from src.query_engine import TurnResult

            engine = mock_factory.return_value
            submitted_denials: list[tuple] = []

            def _capture(prompt, commands, tools, denials):
                submitted_denials.append(denials)
                return TurnResult(
                    prompt=prompt,
                    output='ok',
                    matched_commands=(),
                    matched_tools=(),
                    permission_denials=denials,  # echo back the denials
                    usage=UsageSummary(),
                    stop_reason='completed',
                )

            engine.submit_message.side_effect = _capture

            loop_results = runtime.run_turn_loop(
                'run bash rm', max_turns=2, continuation_prompt='continue'
            )

        # Both turn 0 and turn 1 should have received the same denials
        assert len(submitted_denials) == 2
        assert submitted_denials[0] == submitted_denials[1], (
            'denials should be consistent across all turns'
        )
        # And they should be non-empty (bash is destructive)
        assert len(submitted_denials[0]) > 0, (
            'turn-loop denials were empty — #159 regression'
        )

        # Turn results should reflect the denials that were passed
        for result in loop_results:
            assert len(result.permission_denials) > 0

tests/test_run_turn_loop_timeout.py (new file, 179 lines)
@@ -0,0 +1,179 @@
"""Tests for run_turn_loop wall-clock timeout (ROADMAP #161).

Covers:
- timeout_seconds=None preserves legacy unbounded behaviour
- timeout_seconds=X aborts a hung turn and emits stop_reason='timeout'
- Timeout budget is total wall-clock across all turns, not per-turn
- Already-exhausted budget short-circuits before the first turn runs
- Legacy path still runs without a ThreadPoolExecutor in the way
"""

from __future__ import annotations

import sys
import time
from pathlib import Path
from unittest.mock import patch

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.models import UsageSummary  # noqa: E402
from src.query_engine import TurnResult  # noqa: E402
from src.runtime import PortRuntime  # noqa: E402


def _completed_result(prompt: str) -> TurnResult:
    return TurnResult(
        prompt=prompt,
        output='ok',
        matched_commands=(),
        matched_tools=(),
        permission_denials=(),
        usage=UsageSummary(),
        stop_reason='completed',
    )
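

# Sketch of the cumulative-budget mechanics pinned down below (an assumed
# shape; src.runtime's real loop may differ): one deadline for the whole
# loop, each turn gets only the remaining slice, and a hung worker is
# abandoned rather than waited on.
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as _FutureTimeout


def _run_with_deadline(submit, prompts, timeout_seconds: float) -> list[str]:
    deadline = time.monotonic() + timeout_seconds
    pool = ThreadPoolExecutor(max_workers=1)
    results: list[str] = []
    for prompt in prompts:
        remaining = deadline - time.monotonic()
        if remaining <= 0:          # budget already exhausted: short-circuit
            results.append('timeout')
            break
        try:
            results.append(pool.submit(submit, prompt).result(timeout=remaining))
        except _FutureTimeout:
            results.append('timeout')
            break
    pool.shutdown(wait=False)       # don't block on a still-hung worker thread
    return results
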
class TestLegacyUnboundedBehaviour:
    def test_no_timeout_preserves_existing_behaviour(self) -> None:
        """timeout_seconds=None must not change the legacy path at all."""
        results = PortRuntime().run_turn_loop('review MCP tool', max_turns=2)
        assert len(results) >= 1
        for r in results:
            assert r.stop_reason in {'completed', 'max_turns_reached', 'max_budget_reached'}
            assert r.stop_reason != 'timeout'


class TestTimeoutAbortsHungTurn:
    def test_hung_submit_message_times_out(self) -> None:
        """A stalled submit_message must be aborted and emit stop_reason='timeout'."""
        runtime = PortRuntime()

        # #164 Stage A: runtime now passes cancel_event as a 5th positional
        # arg on the timeout path, so mocks must accept it (even if they ignore it).
        def _hang(prompt, commands, tools, denials, cancel_event=None):
            time.sleep(5.0)  # would block the loop
            return _completed_result(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.config = None  # attribute-assigned in run_turn_loop
            engine.submit_message.side_effect = _hang

            start = time.monotonic()
            results = runtime.run_turn_loop(
                'review MCP tool', max_turns=3, timeout_seconds=0.3
            )
            elapsed = time.monotonic() - start

        # Must exit well under the 5s hang
        assert elapsed < 1.5, f'run_turn_loop did not honor timeout: {elapsed:.2f}s'
        assert len(results) == 1
        assert results[-1].stop_reason == 'timeout'

class TestTimeoutBudgetIsTotal:
|
||||
def test_budget_is_cumulative_across_turns(self) -> None:
|
||||
"""timeout_seconds is total wall-clock across all turns, not per-turn.
|
||||
|
||||
#163 interaction: multi-turn behaviour now requires an explicit
|
||||
``continuation_prompt``; otherwise the loop stops after turn 0 and
|
||||
the cumulative-budget contract is trivially satisfied. We supply one
|
||||
here so the test actually exercises the cross-turn deadline.
|
||||
"""
|
||||
runtime = PortRuntime()
|
||||
call_count = {'n': 0}
|
||||
|
||||
def _slow(prompt, commands, tools, denials, cancel_event=None):
|
||||
call_count['n'] += 1
|
||||
time.sleep(0.4) # each turn burns 0.4s
|
||||
return _completed_result(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = _slow
|
||||
|
||||
start = time.monotonic()
|
||||
# 0.6s budget, 0.4s per turn. First turn completes (~0.4s),
|
||||
# second turn times out before finishing.
|
||||
results = runtime.run_turn_loop(
|
||||
'review MCP tool',
|
||||
max_turns=5,
|
||||
timeout_seconds=0.6,
|
||||
continuation_prompt='continue',
|
||||
)
|
||||
elapsed = time.monotonic() - start
|
||||
|
||||
# Should exit at around 0.6s, not 2.0s (5 turns * 0.4s)
|
||||
assert elapsed < 1.5, f'cumulative budget not honored: {elapsed:.2f}s'
|
||||
# Last result should be the timeout
|
||||
assert results[-1].stop_reason == 'timeout'
|
||||
|
||||
|
||||
class TestExhaustedBudget:
|
||||
def test_zero_timeout_short_circuits_first_turn(self) -> None:
|
||||
"""timeout_seconds=0 emits timeout before the first submit_message call."""
|
||||
runtime = PortRuntime()
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
# submit_message should never be called when budget is already 0
|
||||
engine.submit_message.side_effect = AssertionError(
|
||||
'submit_message should not run when budget is exhausted'
|
||||
)
|
||||
|
||||
results = runtime.run_turn_loop(
|
||||
'review MCP tool', max_turns=3, timeout_seconds=0.0
|
||||
)
|
||||
|
||||
assert len(results) == 1
|
||||
assert results[0].stop_reason == 'timeout'
|
||||
|
||||
|
||||
class TestTimeoutResultShape:
|
||||
def test_timeout_result_has_correct_prompt_and_matches(self) -> None:
|
||||
"""Synthetic TurnResult on timeout must carry the turn's prompt + routed matches."""
|
||||
runtime = PortRuntime()
|
||||
|
||||
def _hang(prompt, commands, tools, denials, cancel_event=None):
|
||||
time.sleep(5.0)
|
||||
return _completed_result(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = _hang
|
||||
|
||||
results = runtime.run_turn_loop(
|
||||
'review MCP tool', max_turns=2, timeout_seconds=0.2
|
||||
)
|
||||
|
||||
timeout_result = results[-1]
|
||||
assert timeout_result.stop_reason == 'timeout'
|
||||
assert timeout_result.prompt == 'review MCP tool'
|
||||
# matched_commands / matched_tools should still be populated from routing,
|
||||
# so downstream transcripts don't lose the routing context.
|
||||
# These may be empty tuples depending on routing; they must be tuples.
|
||||
assert isinstance(timeout_result.matched_commands, tuple)
|
||||
assert isinstance(timeout_result.matched_tools, tuple)
|
||||
assert isinstance(timeout_result.usage, UsageSummary)
|
||||
|
||||
|
||||
class TestNegativeTimeoutTreatedAsExhausted:
|
||||
def test_negative_timeout_short_circuits(self) -> None:
|
||||
"""A negative budget should behave identically to exhausted."""
|
||||
runtime = PortRuntime()
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = AssertionError(
|
||||
'submit_message should not run when budget is negative'
|
||||
)
|
||||
|
||||
results = runtime.run_turn_loop(
|
||||
'review MCP tool', max_turns=3, timeout_seconds=-1.0
|
||||
)
|
||||
|
||||
assert len(results) == 1
|
||||
assert results[0].stop_reason == 'timeout'
|
||||
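These tests constrain behaviour rather than implementation, but one shape that satisfies all five is a single deadline shared across turns, with each submit_message call pushed onto a worker thread. A rough sketch (the real code is in src/runtime.py; everything below is an assumption apart from the stop_reason and budget semantics the tests assert):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_turns_with_deadline(turn_fns, timeout_seconds=None):
    """Sketch: one wall-clock budget across all turns, not per turn."""
    if timeout_seconds is None:
        return [fn() for fn in turn_fns]  # legacy path: no executor involved

    deadline = time.monotonic() + timeout_seconds
    results = []
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        for fn in turn_fns:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                # Covers timeout_seconds=0 and negative budgets: short-circuit
                # before submitting anything. The real code appends a synthetic
                # TurnResult with stop_reason='timeout' here.
                results.append('timeout')
                break
            future = pool.submit(fn)
            try:
                results.append(future.result(timeout=remaining))
            except FutureTimeout:
                results.append('timeout')
                break
    finally:
        # wait=False: a wedged provider thread cannot be killed, only
        # abandoned, which is exactly the gap #164's cancel_event closes.
        pool.shutdown(wait=False, cancel_futures=True)
    return results
```

With timeout_seconds=0.0 the `remaining <= 0` branch fires before the first submit, matching TestExhaustedBudget and the negative-budget case.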
173
tests/test_session_store.py
Normal file
@@ -0,0 +1,173 @@
"""Tests for session_store CRUD surface (ROADMAP #160).

Covers:
- list_sessions enumeration
- session_exists boolean check
- delete_session idempotency + race-safety + partial-failure contract
- SessionNotFoundError typing (KeyError subclass)
- SessionDeleteError typing (OSError subclass)
"""

from __future__ import annotations

import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent / 'src'))

from session_store import (  # noqa: E402
    StoredSession,
    SessionDeleteError,
    SessionNotFoundError,
    delete_session,
    list_sessions,
    load_session,
    save_session,
    session_exists,
)


def _make_session(session_id: str) -> StoredSession:
    return StoredSession(
        session_id=session_id,
        messages=('hello',),
        input_tokens=1,
        output_tokens=2,
    )


class TestListSessions:
    def test_empty_directory_returns_empty_list(self, tmp_path: Path) -> None:
        assert list_sessions(tmp_path) == []

    def test_nonexistent_directory_returns_empty_list(self, tmp_path: Path) -> None:
        missing = tmp_path / 'never-created'
        assert list_sessions(missing) == []

    def test_lists_saved_sessions_sorted(self, tmp_path: Path) -> None:
        save_session(_make_session('charlie'), tmp_path)
        save_session(_make_session('alpha'), tmp_path)
        save_session(_make_session('bravo'), tmp_path)
        assert list_sessions(tmp_path) == ['alpha', 'bravo', 'charlie']

    def test_ignores_non_json_files(self, tmp_path: Path) -> None:
        save_session(_make_session('real'), tmp_path)
        (tmp_path / 'notes.txt').write_text('ignore me')
        (tmp_path / 'data.yaml').write_text('ignore me too')
        assert list_sessions(tmp_path) == ['real']


class TestSessionExists:
    def test_returns_true_for_saved_session(self, tmp_path: Path) -> None:
        save_session(_make_session('present'), tmp_path)
        assert session_exists('present', tmp_path) is True

    def test_returns_false_for_missing_session(self, tmp_path: Path) -> None:
        assert session_exists('absent', tmp_path) is False

    def test_returns_false_for_nonexistent_directory(self, tmp_path: Path) -> None:
        missing = tmp_path / 'never-created'
        assert session_exists('anything', missing) is False


class TestLoadSession:
    def test_raises_typed_error_on_missing(self, tmp_path: Path) -> None:
        with pytest.raises(SessionNotFoundError) as exc_info:
            load_session('nonexistent', tmp_path)
        assert 'nonexistent' in str(exc_info.value)

    def test_not_found_error_is_keyerror_subclass(self, tmp_path: Path) -> None:
        """Orchestrators catching KeyError should still work."""
        with pytest.raises(KeyError):
            load_session('nonexistent', tmp_path)

    def test_not_found_error_is_not_filenotfounderror(self, tmp_path: Path) -> None:
        """Callers can distinguish 'not found' from IO errors."""
        with pytest.raises(SessionNotFoundError):
            load_session('nonexistent', tmp_path)
        # Specifically, it should NOT match bare FileNotFoundError alone
        # (SessionNotFoundError inherits from KeyError, not FileNotFoundError)
        assert not issubclass(SessionNotFoundError, FileNotFoundError)
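The typing assertions above fully determine the exception hierarchy, so this part can be shown with little hedging; only the message format is an assumption (the shipped definitions live in src/session_store.py):

```python
class SessionNotFoundError(KeyError):
    """load_session: the session id has no stored file.

    A KeyError subclass on purpose: orchestrators doing lookup-style error
    handling keep working, and it can never be caught by accident as the
    FileNotFoundError an IO layer raises for unrelated reasons.
    """

    def __init__(self, session_id: str) -> None:
        super().__init__(f'session not found: {session_id}')  # message format assumed


class SessionDeleteError(OSError):
    """delete_session: the file exists but could not be removed.

    An OSError subclass so generic retry loops that catch OSError still see
    it; the original failure is chained via ``raise ... from exc``.
    """
```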
class TestDeleteSessionIdempotency:
    """Contract: delete_session(x) followed by delete_session(x) must be safe."""

    def test_first_delete_returns_true(self, tmp_path: Path) -> None:
        save_session(_make_session('to-delete'), tmp_path)
        assert delete_session('to-delete', tmp_path) is True

    def test_second_delete_returns_false_no_raise(self, tmp_path: Path) -> None:
        """Idempotency: deleting an already-deleted session is a no-op."""
        save_session(_make_session('once'), tmp_path)
        delete_session('once', tmp_path)
        # Second call must not raise
        assert delete_session('once', tmp_path) is False

    def test_delete_nonexistent_returns_false_no_raise(self, tmp_path: Path) -> None:
        """Never-existed session is treated identically to already-deleted."""
        assert delete_session('never-existed', tmp_path) is False

    def test_delete_removes_only_target(self, tmp_path: Path) -> None:
        save_session(_make_session('keep'), tmp_path)
        save_session(_make_session('remove'), tmp_path)
        delete_session('remove', tmp_path)
        assert list_sessions(tmp_path) == ['keep']


class TestDeleteSessionPartialFailure:
    """Contract: file exists but cannot be removed -> SessionDeleteError."""

    def test_partial_failure_raises_session_delete_error(self, tmp_path: Path) -> None:
        """If a directory exists where a session file should be, unlink fails."""
        bad_path = tmp_path / 'locked.json'
        bad_path.mkdir()
        try:
            with pytest.raises(SessionDeleteError) as exc_info:
                delete_session('locked', tmp_path)
            # Underlying cause should be wrapped
            assert exc_info.value.__cause__ is not None
            assert isinstance(exc_info.value.__cause__, OSError)
        finally:
            bad_path.rmdir()

    def test_delete_error_is_oserror_subclass(self, tmp_path: Path) -> None:
        """Callers catching OSError should still work for retries."""
        bad_path = tmp_path / 'locked.json'
        bad_path.mkdir()
        try:
            with pytest.raises(OSError):
                delete_session('locked', tmp_path)
        finally:
            bad_path.rmdir()


class TestRaceSafety:
    """Contract: delete_session must be race-safe between exists-check and unlink."""

    def test_concurrent_deletion_returns_false_not_raises(
        self, tmp_path: Path, monkeypatch
    ) -> None:
        """If another process deletes between exists-check and unlink, return False."""
        save_session(_make_session('racy'), tmp_path)
        # Simulate: file disappears right before unlink (concurrent deletion)
        path = tmp_path / 'racy.json'
        path.unlink()
        # Now delete_session should return False, not raise
        assert delete_session('racy', tmp_path) is False


class TestRoundtrip:
    def test_save_list_load_delete_cycle(self, tmp_path: Path) -> None:
        session = _make_session('lifecycle')
        save_session(session, tmp_path)
        assert 'lifecycle' in list_sessions(tmp_path)
        assert session_exists('lifecycle', tmp_path)
        loaded = load_session('lifecycle', tmp_path)
        assert loaded.session_id == 'lifecycle'
        assert loaded.messages == ('hello',)
        assert delete_session('lifecycle', tmp_path) is True
        assert not session_exists('lifecycle', tmp_path)
        assert list_sessions(tmp_path) == []
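A delete_session satisfying the idempotency, race-safety, and partial-failure contracts above can be sketched like this, assuming one `<session_id>.json` file per session (the shipped implementation is in src/session_store.py):

```python
from pathlib import Path

from session_store import SessionDeleteError  # the typed error sketched above


def delete_session(session_id: str, directory: Path) -> bool:
    """Sketch: True if a file was removed, False if nothing was there."""
    path = directory / f'{session_id}.json'  # assumed storage layout
    try:
        # A single unlink with no exists() pre-check: the TOCTOU window
        # between "check" and "remove" is exactly what TestRaceSafety targets.
        path.unlink()
    except FileNotFoundError:
        return False  # already deleted or never existed: idempotent no-op
    except OSError as exc:
        # Present but unremovable (e.g. a directory squatting on the name):
        # wrap in the typed error, keeping the cause chained for callers.
        raise SessionDeleteError(f'could not delete session {session_id!r}') from exc
    return True
```

The except-clause order matters: FileNotFoundError is itself an OSError subclass, so it must be caught first to keep the idempotent path from being misreported as a partial failure.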
203
tests/test_show_command_tool_output_format.py
Normal file
@@ -0,0 +1,203 @@
"""Tests for --output-format flag on show-command and show-tool (ROADMAP #167).

Verifies parity with session-lifecycle CLI family (#160/#165/#166):
- show-command and show-tool now accept --output-format {text,json}
- Found case returns success with JSON envelope: {name, found: true, source_hint, responsibility}
- Not-found case returns typed error envelope: {name, found: false, error: {kind, message, retryable}}
- Legacy text output (default) unchanged for backward compat
- Exit code 0 on success, 1 on not-found (matching load-session contract)
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))


class TestShowCommandOutputFormat:
    """show-command --output-format {text,json} parity with session-lifecycle family."""

    def test_show_command_found_json(self) -> None:
        """show-command with found entry returns JSON envelope."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0, f'Expected exit 0, got {result.returncode}: {result.stderr}'

        envelope = json.loads(result.stdout)
        assert envelope['found'] is True
        assert envelope['name'] == 'add-dir'
        assert 'source_hint' in envelope
        assert 'responsibility' in envelope
        # No error field when found
        assert 'error' not in envelope

    def test_show_command_not_found_json(self) -> None:
        """show-command with missing entry returns typed error envelope."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'nonexistent-cmd', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 1, f'Expected exit 1 on not-found, got {result.returncode}'

        envelope = json.loads(result.stdout)
        assert envelope['found'] is False
        assert envelope['name'] == 'nonexistent-cmd'
        assert envelope['error']['kind'] == 'command_not_found'
        assert envelope['error']['retryable'] is False
        # No source_hint/responsibility when not found
        assert 'source_hint' not in envelope or envelope.get('source_hint') is None
        assert 'responsibility' not in envelope or envelope.get('responsibility') is None

    def test_show_command_text_mode_backward_compat(self) -> None:
        """show-command text mode (default) is unchanged from pre-#167."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0

        # Text output is newline-separated (name, source_hint, responsibility)
        lines = result.stdout.strip().split('\n')
        assert len(lines) == 3
        assert lines[0] == 'add-dir'
        assert 'commands/add-dir/add-dir.tsx' in lines[1]

    def test_show_command_text_mode_not_found(self) -> None:
        """show-command text mode on not-found returns prose error."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'missing'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 1
        assert 'not found' in result.stdout.lower()
        assert 'missing' in result.stdout

    def test_show_command_default_is_text(self) -> None:
        """Omitting --output-format defaults to text."""
        result_implicit = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        result_explicit = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'text'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result_implicit.stdout == result_explicit.stdout


class TestShowToolOutputFormat:
    """show-tool --output-format {text,json} parity with session-lifecycle family."""

    def test_show_tool_found_json(self) -> None:
        """show-tool with found entry returns JSON envelope."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-tool', 'BashTool', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0, f'Expected exit 0, got {result.returncode}: {result.stderr}'

        envelope = json.loads(result.stdout)
        assert envelope['found'] is True
        assert envelope['name'] == 'BashTool'
        assert 'source_hint' in envelope
        assert 'responsibility' in envelope
        assert 'error' not in envelope

    def test_show_tool_not_found_json(self) -> None:
        """show-tool with missing entry returns typed error envelope."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-tool', 'NotARealTool', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 1, f'Expected exit 1 on not-found, got {result.returncode}'

        envelope = json.loads(result.stdout)
        assert envelope['found'] is False
        assert envelope['name'] == 'NotARealTool'
        assert envelope['error']['kind'] == 'tool_not_found'
        assert envelope['error']['retryable'] is False

    def test_show_tool_text_mode_backward_compat(self) -> None:
        """show-tool text mode (default) is unchanged from pre-#167."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-tool', 'BashTool'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0

        lines = result.stdout.strip().split('\n')
        assert len(lines) == 3
        assert lines[0] == 'BashTool'
        assert 'tools/BashTool/BashTool.tsx' in lines[1]


class TestShowCommandToolFormatParity:
    """Verify symmetry between show-command and show-tool formats."""

    def test_both_accept_output_format_flag(self) -> None:
        """Both commands accept the same --output-format choices."""
        # Just ensure both fail with invalid choice (they accept text/json)
        result_cmd = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'invalid'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        result_tool = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-tool', 'BashTool', '--output-format', 'invalid'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        # Both should fail with argument parser error
        assert result_cmd.returncode != 0
        assert result_tool.returncode != 0
        assert 'invalid choice' in result_cmd.stderr
        assert 'invalid choice' in result_tool.stderr

    def test_json_envelope_shape_consistency(self) -> None:
        """Both commands return consistent JSON envelope shape."""
        cmd_result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        tool_result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-tool', 'BashTool', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )

        cmd_envelope = json.loads(cmd_result.stdout)
        tool_envelope = json.loads(tool_result.stdout)

        # Same top-level keys for found=true case
        assert set(cmd_envelope.keys()) == set(tool_envelope.keys())
        assert cmd_envelope['found'] is True
        assert tool_envelope['found'] is True
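Both commands assert an identical envelope, which suggests one shared emitter behind both subcommands. A sketch of what that emitter might look like; the function name, entry shape, and wiring are assumptions, while the envelope fields and exit codes are fixed by the tests:

```python
import json


def emit_lookup(name, entry, kind_on_miss, output_format='text'):
    """Sketch: shared by show-command ('command_not_found') and
    show-tool ('tool_not_found'); returns the process exit code."""
    if output_format == 'json':
        if entry is not None:
            envelope = {
                'name': name,
                'found': True,
                'source_hint': entry['source_hint'],
                'responsibility': entry['responsibility'],
            }
        else:
            envelope = {
                'name': name,
                'found': False,
                'error': {
                    'kind': kind_on_miss,
                    'message': f'{name} not found',
                    'retryable': False,  # retrying the same lookup cannot succeed
                },
            }
        print(json.dumps(envelope))
    elif entry is not None:
        # Legacy text mode: three newline-separated fields, unchanged pre-#167
        print(name)
        print(entry['source_hint'])
        print(entry['responsibility'])
    else:
        print(f'{name} not found')
    return 0 if entry is not None else 1
```

main() would then `sys.exit(emit_lookup(...))`, giving the 0-on-found / 1-on-miss contract these tests assert for both subcommands.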
167
tests/test_submit_message_budget.py
Normal file
@@ -0,0 +1,167 @@
"""Tests for submit_message budget-overflow atomicity (ROADMAP #162).

Covers:
- Budget overflow returns stop_reason='max_budget_reached' without mutating session
- mutable_messages, transcript_store, permission_denials, total_usage all unchanged
- Session persisted after overflow does not contain the overflow turn
- Engine remains usable after overflow: subsequent in-budget call succeeds
- Normal (non-overflow) path still commits state as before
"""

from __future__ import annotations

import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.models import PermissionDenial, UsageSummary  # noqa: E402
from src.port_manifest import build_port_manifest  # noqa: E402
from src.query_engine import QueryEngineConfig, QueryEnginePort  # noqa: E402
from src.session_store import StoredSession, load_session, save_session  # noqa: E402


def _make_engine(max_budget_tokens: int = 10) -> QueryEnginePort:
    engine = QueryEnginePort(manifest=build_port_manifest())
    engine.config = QueryEngineConfig(max_budget_tokens=max_budget_tokens)
    return engine


class TestBudgetOverflowDoesNotMutate:
    """The core #162 contract: overflow must leave session state untouched."""

    def test_mutable_messages_unchanged_on_overflow(self) -> None:
        engine = _make_engine(max_budget_tokens=10)
        pre_count = len(engine.mutable_messages)
        overflow_prompt = ' '.join(['word'] * 50)
        result = engine.submit_message(overflow_prompt)
        assert result.stop_reason == 'max_budget_reached'
        assert len(engine.mutable_messages) == pre_count

    def test_transcript_unchanged_on_overflow(self) -> None:
        engine = _make_engine(max_budget_tokens=10)
        pre_count = len(engine.transcript_store.entries)
        overflow_prompt = ' '.join(['word'] * 50)
        result = engine.submit_message(overflow_prompt)
        assert result.stop_reason == 'max_budget_reached'
        assert len(engine.transcript_store.entries) == pre_count

    def test_permission_denials_unchanged_on_overflow(self) -> None:
        engine = _make_engine(max_budget_tokens=10)
        pre_count = len(engine.permission_denials)
        denials = (PermissionDenial(tool_name='bash', reason='gated in test'),)
        overflow_prompt = ' '.join(['word'] * 50)
        result = engine.submit_message(overflow_prompt, denied_tools=denials)
        assert result.stop_reason == 'max_budget_reached'
        assert len(engine.permission_denials) == pre_count

    def test_total_usage_unchanged_on_overflow(self) -> None:
        engine = _make_engine(max_budget_tokens=10)
        pre_usage = engine.total_usage
        overflow_prompt = ' '.join(['word'] * 50)
        result = engine.submit_message(overflow_prompt)
        assert result.stop_reason == 'max_budget_reached'
        assert engine.total_usage == pre_usage

    def test_turn_result_reports_pre_mutation_usage(self) -> None:
        """The TurnResult.usage must reflect session state as if the overflow never happened."""
        engine = _make_engine(max_budget_tokens=10)
        pre_usage = engine.total_usage
        overflow_prompt = ' '.join(['word'] * 50)
        result = engine.submit_message(overflow_prompt)
        assert result.stop_reason == 'max_budget_reached'
        assert result.usage == pre_usage


class TestOverflowPersistence:
    """Session persisted after overflow must not contain the overflow turn."""

    def test_persisted_session_empty_when_first_turn_overflows(
        self, tmp_path: Path, monkeypatch
    ) -> None:
        """When the very first call overflows, the persisted session has zero messages."""
        monkeypatch.chdir(tmp_path)
        engine = _make_engine(max_budget_tokens=10)
        overflow_prompt = ' '.join(['word'] * 50)
        result = engine.submit_message(overflow_prompt)
        assert result.stop_reason == 'max_budget_reached'

        path_str = engine.persist_session()
        path = Path(path_str)
        assert path.exists()
        loaded = load_session(path.stem, path.parent)
        assert loaded.messages == (), (
            f'overflow turn poisoned session: {loaded.messages!r}'
        )

    def test_persisted_session_retains_only_successful_turns(
        self, tmp_path: Path, monkeypatch
    ) -> None:
        """A successful turn followed by an overflow persists only the successful turn."""
        monkeypatch.chdir(tmp_path)
        # Budget large enough for one short turn but not a second big one.
        # Token counting is whitespace-split (see UsageSummary.add_turn),
        # so overflow prompts must contain many whitespace-separated words.
        engine = QueryEnginePort(manifest=build_port_manifest())
        engine.config = QueryEngineConfig(max_budget_tokens=50)

        ok = engine.submit_message('short')
        assert ok.stop_reason == 'completed'
        assert 'short' in engine.mutable_messages

        # 500 whitespace-separated tokens — definitely over a 50-token budget
        overflow_prompt = ' '.join(['word'] * 500)
        overflow = engine.submit_message(overflow_prompt)
        assert overflow.stop_reason == 'max_budget_reached'

        path = Path(engine.persist_session())
        loaded = load_session(path.stem, path.parent)
        assert loaded.messages == ('short',), (
            f'expected only the successful turn, got {loaded.messages!r}'
        )


class TestEngineUsableAfterOverflow:
    """After overflow, the engine must still be usable — overflow is rejection, not corruption."""

    def test_subsequent_in_budget_call_succeeds(self) -> None:
        """After an overflow rejection, raising the budget and retrying works."""
        engine = _make_engine(max_budget_tokens=10)
        overflow_prompt = ' '.join(['word'] * 100)
        overflow = engine.submit_message(overflow_prompt)
        assert overflow.stop_reason == 'max_budget_reached'

        # Raise the budget and retry — the engine should be in a clean state
        engine.config = QueryEngineConfig(max_budget_tokens=10_000)
        ok = engine.submit_message('short retry')
        assert ok.stop_reason == 'completed'
        assert 'short retry' in engine.mutable_messages
        # The overflow prompt should never have been recorded
        assert overflow_prompt not in engine.mutable_messages

    def test_multiple_overflow_calls_remain_idempotent(self) -> None:
        """Repeated overflow calls must not accumulate hidden state."""
        engine = _make_engine(max_budget_tokens=10)
        overflow_prompt = ' '.join(['word'] * 50)
        for _ in range(5):
            result = engine.submit_message(overflow_prompt)
            assert result.stop_reason == 'max_budget_reached'
        assert len(engine.mutable_messages) == 0
        assert len(engine.transcript_store.entries) == 0
        assert engine.total_usage == UsageSummary()


class TestNormalPathStillCommits:
    """Regression guard: the non-overflow path must still mutate state as before."""

    def test_in_budget_turn_commits_all_state(self) -> None:
        engine = QueryEnginePort(manifest=build_port_manifest())
        engine.config = QueryEngineConfig(max_budget_tokens=10_000)
        result = engine.submit_message('review MCP tool')
        assert result.stop_reason == 'completed'
        assert len(engine.mutable_messages) == 1
        assert len(engine.transcript_store.entries) == 1
        assert engine.total_usage.input_tokens > 0
        assert engine.total_usage.output_tokens > 0
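The mutation tests all reduce to one ordering rule: project the new usage, check it against the budget, and only then commit. A sketch of that discipline; `_project_usage` and the commit calls are hypothetical names, while the TurnResult fields and stop_reason values come straight from the tests (the real logic is in src/query_engine.py):

```python
from src.query_engine import TurnResult


def submit_message_sketch(engine, prompt):
    """Sketch of #162: the budget check happens before any state mutation."""
    projected = _project_usage(engine.total_usage, prompt)  # hypothetical helper
    if projected.total_tokens > engine.config.max_budget_tokens:
        # Atomic rejection: no message, no transcript entry, no usage bump.
        # usage reports the pre-overflow state, per
        # test_turn_result_reports_pre_mutation_usage.
        return TurnResult(
            prompt=prompt, output='', matched_commands=(), matched_tools=(),
            permission_denials=(), usage=engine.total_usage,
            stop_reason='max_budget_reached',
        )

    output = engine._format_output(prompt)  # compute everything pre-commit

    # Commit point: all mutations happen together, after every check passed.
    engine.mutable_messages.append(prompt)
    engine.transcript_store.append(prompt, output)  # method name assumed
    engine.total_usage = projected
    return TurnResult(
        prompt=prompt, output=output, matched_commands=(), matched_tools=(),
        permission_denials=(), usage=engine.total_usage,
        stop_reason='completed',
    )
```

Because rejection happens before any append, repeated overflow calls are naturally idempotent, which is what TestEngineUsableAfterOverflow verifies.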
220
tests/test_submit_message_cancellation.py
Normal file
@@ -0,0 +1,220 @@
"""Tests for cooperative cancellation in submit_message (ROADMAP #164 Stage A).

Verifies that cancel_event enables safe early termination:
- Event set before call => immediate return with stop_reason='cancelled'
- Event set between budget check and commit => still 'cancelled', no mutation
- Event set after commit => not observable (honest cooperative limit)
- Legacy callers (cancel_event=None) see zero behaviour change
- State is untouched on cancellation: mutable_messages, transcript_store,
  permission_denials, total_usage all preserved

This closes the #161 follow-up gap filed as #164: wedged provider threads
can no longer silently commit ghost turns after the caller observed a
timeout.
"""

from __future__ import annotations

import sys
import threading
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.models import PermissionDenial  # noqa: E402
from src.port_manifest import build_port_manifest  # noqa: E402
from src.query_engine import QueryEngineConfig, QueryEnginePort, TurnResult  # noqa: E402


def _fresh_engine(**config_overrides) -> QueryEnginePort:
    config = QueryEngineConfig(**config_overrides) if config_overrides else QueryEngineConfig()
    return QueryEnginePort(manifest=build_port_manifest(), config=config)


class TestCancellationBeforeCall:
    """Event set before submit_message is invoked => immediate 'cancelled'."""

    def test_pre_set_event_returns_cancelled_immediately(self) -> None:
        engine = _fresh_engine()
        event = threading.Event()
        event.set()

        result = engine.submit_message('hello', cancel_event=event)

        assert result.stop_reason == 'cancelled'
        assert result.prompt == 'hello'
        # Output is empty on pre-budget cancel (no synthesis)
        assert result.output == ''

    def test_pre_set_event_preserves_mutable_messages(self) -> None:
        engine = _fresh_engine()
        event = threading.Event()
        event.set()

        engine.submit_message('ghost turn', cancel_event=event)

        assert engine.mutable_messages == [], (
            'cancelled turn must not appear in mutable_messages'
        )

    def test_pre_set_event_preserves_transcript_store(self) -> None:
        engine = _fresh_engine()
        event = threading.Event()
        event.set()

        engine.submit_message('ghost turn', cancel_event=event)

        assert engine.transcript_store.entries == [], (
            'cancelled turn must not appear in transcript_store'
        )

    def test_pre_set_event_preserves_usage_counters(self) -> None:
        engine = _fresh_engine()
        initial_usage = engine.total_usage
        event = threading.Event()
        event.set()

        engine.submit_message('expensive prompt ' * 100, cancel_event=event)

        assert engine.total_usage == initial_usage, (
            'cancelled turn must not increment token counters'
        )

    def test_pre_set_event_preserves_permission_denials(self) -> None:
        engine = _fresh_engine()
        event = threading.Event()
        event.set()

        denials = (PermissionDenial(tool_name='BashTool', reason='destructive'),)
        engine.submit_message('run bash ls', denied_tools=denials, cancel_event=event)

        assert engine.permission_denials == [], (
            'cancelled turn must not extend permission_denials'
        )


class TestCancellationAfterBudgetCheck:
    """Event set between budget projection and commit => 'cancelled', state intact.

    This simulates the realistic racy case: engine starts computing output,
    caller hits deadline, sets event. Engine observes at post-budget checkpoint
    and returns cleanly.
    """

    def test_post_budget_cancel_returns_cancelled(self) -> None:
        engine = _fresh_engine()
        event = threading.Event()

        # Patch: set the event after projection but before mutation. We do this
        # by wrapping _format_output (called mid-submit) to set the event.
        original_format = engine._format_output

        def _set_then_format(*args, **kwargs):
            result = original_format(*args, **kwargs)
            event.set()  # trigger cancel right after output is built
            return result

        engine._format_output = _set_then_format  # type: ignore[method-assign]

        result = engine.submit_message('hello', cancel_event=event)

        assert result.stop_reason == 'cancelled'
        # Output IS built here (we're past the pre-budget checkpoint), so it's
        # not empty. The contract is about *state*, not output synthesis.
        assert result.output != ''
        # Critical: state still unchanged
        assert engine.mutable_messages == []
        assert engine.transcript_store.entries == []


class TestCancellationAfterCommit:
    """Event set after commit is not observable — honest cooperative limit."""

    def test_post_commit_cancel_is_not_observable(self) -> None:
        engine = _fresh_engine()
        event = threading.Event()

        # Event only set *after* submit_message returns. The first call has
        # already committed before the event is set.
        result = engine.submit_message('hello', cancel_event=event)
        event.set()  # too late

        assert result.stop_reason == 'completed', (
            'cancel set after commit must not retroactively invalidate the turn'
        )
        assert engine.mutable_messages == ['hello']

    def test_next_call_observes_cancel(self) -> None:
        """The cancel_event persists — the next call on the same engine sees it."""
        engine = _fresh_engine()
        event = threading.Event()

        engine.submit_message('first', cancel_event=event)
        assert engine.mutable_messages == ['first']

        event.set()
        # Next call observes the cancel at entry
        result = engine.submit_message('second', cancel_event=event)

        assert result.stop_reason == 'cancelled'
        # 'second' must NOT have been committed
        assert engine.mutable_messages == ['first']


class TestLegacyCallersUnchanged:
    """cancel_event=None (default) => zero behaviour change from pre-#164."""

    def test_no_event_submits_normally(self) -> None:
        engine = _fresh_engine()
        result = engine.submit_message('hello')

        assert result.stop_reason == 'completed'
        assert engine.mutable_messages == ['hello']

    def test_no_event_with_budget_overflow_still_rejects_atomically(self) -> None:
        """#162 atomicity contract survives when cancel_event is absent."""
        engine = _fresh_engine(max_budget_tokens=1)
        words = ' '.join(['word'] * 100)

        result = engine.submit_message(words)  # no cancel_event

        assert result.stop_reason == 'max_budget_reached'
        assert engine.mutable_messages == []

    def test_no_event_respects_max_turns(self) -> None:
        """max_turns_reached contract survives when cancel_event is absent."""
        engine = _fresh_engine(max_turns=1)
        engine.submit_message('first')
        result = engine.submit_message('second')  # no cancel_event

        assert result.stop_reason == 'max_turns_reached'
        assert engine.mutable_messages == ['first']


class TestCancellationVsOtherStopReasons:
    """cancel_event has a defined precedence relative to budget/turns."""

    def test_cancel_precedes_max_turns_check(self) -> None:
        """If cancel is set when capacity is also full, cancel wins (clearer signal)."""
        engine = _fresh_engine(max_turns=0)  # immediately full
        event = threading.Event()
        event.set()

        result = engine.submit_message('hello', cancel_event=event)

        # cancel_event check is the very first thing in submit_message,
        # so it fires before the max_turns check even sees capacity
        assert result.stop_reason == 'cancelled'

    def test_cancel_does_not_override_commit(self) -> None:
        """Completed turn with late cancel still reports 'completed' — the
        turn already succeeded; we don't lie about it."""
        engine = _fresh_engine()
        event = threading.Event()

        # Event gets set after the mutation is done — submit_message doesn't
        # re-check after commit
        result = engine.submit_message('hello', cancel_event=event)
        event.set()

        assert result.stop_reason == 'completed'
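Laid over the #162 sketch, Stage A is just two event checkpoints around the same commit point. A final sketch under the same hedges (helper names like `make_turn_result` are assumptions; only the checkpoint placement and stop_reason semantics are fixed by the tests above):

```python
import threading


def submit_message_with_cancel(engine, prompt,
                               cancel_event: threading.Event | None = None):
    """Sketch of #164 Stage A: check at entry, check again just before commit."""

    def _cancelled(output: str):  # hypothetical synthetic-result builder
        return make_turn_result(prompt, output, stop_reason='cancelled')

    if cancel_event is not None and cancel_event.is_set():
        # Checkpoint 1: the very first statement; it runs before the
        # max_turns/budget checks, so 'cancelled' wins when both conditions
        # hold (TestCancellationVsOtherStopReasons).
        return _cancelled(output='')

    output = engine._format_output(prompt)  # the potentially slow part

    if cancel_event is not None and cancel_event.is_set():
        # Checkpoint 2: output exists but state is untouched, so a caller
        # that already observed a timeout can never receive a ghost turn.
        return _cancelled(output=output)

    # Commit point. Past here a late cancel is deliberately invisible:
    # the turn really happened, and the result says so.
    engine.mutable_messages.append(prompt)
    engine.transcript_store.append(prompt, output)  # method name assumed
    return make_turn_result(prompt, output, stop_reason='completed')
```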
||||
Reference in New Issue
Block a user