Mirror of https://github.com/instructkr/claw-code.git — synced 2026-04-27 00:34:59 +08:00

Compare commits: `claw-code-...` → `feat/jobdo` (54 commits)
Commits:

d9fbe1ef83 · cb8839e050 · 41b0006eea · 762e9bb212 · 5e29430d4f · 0d8adceb67 · 9eba71da81 · ef5aae3ddd · f05bc037de · 2fcb85ce4e · f1103332d0 · 186d42f979 · 5f8d1b92a6 · 84466bbb6c · fbcbe9d8d5 · dd0993c157 · b903e1605f · de368a2615 · af306d489e · fef249d9e7 · 7724bf98fd · 70b2f6a66f · 1d155e4304 · 0b5dffb9da · 932710a626 · 3262cb3a87 · 8247d7d2eb · 517d7e224e · c73423871b · 373dd9b848 · 11f9e8a5a2 · 97c4b130dc · 290ab7e41f · ded0c5bbc1 · 40c17d8f2a · b048de8899 · 5a18e3aa1a · 7fb95e95f6 · 60925fa9f7 · 01dca90e95 · 524edb2b2e · 455bdec06c · 85de7f9814 · 178c8fac28 · d453eedae6 · 79a9f0e6f6 · 4813a2b351 · 3f4d46d7b4 · 6a76cc7c08 · 527c0f971c · 504d238af1 · 41a6091355 · bc94870a54 · ee3aa29a5e
.gitignore (vendored) — 3 changes
@@ -8,5 +8,8 @@ archive/
# Claw Code local artifacts
.claw/settings.local.json
.claw/sessions/
# #160/#166: default session storage directory (flush-transcript output,
# dogfood runs, etc.). Claws specifying --directory elsewhere are fine.
.port_sessions/
.clawhip/
status-help.txt
CLAUDE.md — 204 changes
@@ -1,21 +1,195 @@
# CLAUDE.md
# CLAUDE.md — Python Reference Implementation

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
**This file guides work on `src/` and `tests/` — the Python reference harness for the claw-code protocol.**

## Detected stack
- Languages: Rust.
- Frameworks: none detected from the supported starter markers.
The production CLI lives in `rust/`; this directory (`src/`, `tests/`, `.py` files) is a **protocol validation and dogfood surface**.

## Verification
- Run Rust verification from `rust/`: `cargo fmt`, `cargo clippy --workspace --all-targets -- -D warnings`, `cargo test --workspace`
- `src/` and `tests/` are both present; update both surfaces together when behavior changes.

## What this Python harness does

**Machine-first orchestration layer** — proves that the claw-code JSON protocol is:
- Deterministic and recoverable (every output is reproducible)
- Self-describing (SCHEMAS.md documents every field)
- Clawable (external agents can build ONE error handler for all commands)

## Stack
- **Language:** Python 3.13+
- **Dependencies:** minimal (no frameworks; pure stdlib plus attrs/dataclasses)
- **Test runner:** pytest
- **Protocol contract:** SCHEMAS.md (machine-readable JSON envelope)

## Quick start

```bash
# 1. Set up a virtualenv (if not already in one)
python3 -m venv .venv && source .venv/bin/activate
# (dependencies are minimal; mostly standard library)

# 2. Run tests
python3 -m pytest tests/ -q

# 3. Try a command
python3 -m src.main bootstrap "hello" --output-format json | python3 -m json.tool
```

## Verification workflow

```bash
# Unit tests (fast)
python3 -m pytest tests/ -q 2>&1 | tail -3

# Type checking (optional but recommended)
python3 -m mypy src/ --ignore-missing-imports 2>&1 | tail -5
```

## Repository shape
- `rust/` contains the Rust workspace and active CLI/runtime implementation.
- `src/` contains source files that should stay consistent with generated guidance and tests.
- `tests/` contains validation surfaces that should be reviewed alongside code changes.

## Working agreement
- Prefer small, reviewable changes and keep generated bootstrap files aligned with actual repo workflows.
- Keep shared defaults in `.claude.json`; reserve `.claude/settings.local.json` for machine-local overrides.
- Do not overwrite existing `CLAUDE.md` content automatically; update it intentionally when repo workflows change.

- **`src/`** — Python reference harness implementing the SCHEMAS.md protocol
  - `main.py` — CLI entry point; all 14 clawable commands
  - `query_engine.py` — core TurnResult / QueryEngineConfig
  - `runtime.py` — PortRuntime; turn loop + cancellation (#164 Stage A/B)
  - `session_store.py` — session persistence
  - `transcript.py` — turn transcript assembly
  - `commands.py`, `tools.py` — simulated command/tool trees
  - `models.py` — PermissionDenial, UsageSummary, etc.

- **`tests/`** — comprehensive protocol validation (22 baseline → 192 passing as of 2026-04-22)
  - `test_cli_parity_audit.py` — proves all 14 clawable commands accept --output-format
  - `test_json_envelope_field_consistency.py` — validates the SCHEMAS.md contract
  - `test_cancel_observed_field.py` — #164 Stage B: cancellation observability + safe-to-reuse semantics
  - `test_run_turn_loop_*.py` — turn-loop behavior (timeout, cancellation, continuation, permissions)
  - `test_submit_message_*.py` — budget and cancellation contracts
  - `test_*_cli.py` — command-specific JSON output validation

- **`SCHEMAS.md`** — canonical JSON contract
  - Common fields (all envelopes): timestamp, command, exit_code, output_format, schema_version
  - Error envelope shape
  - Not-found envelope shape
  - Per-command success schemas (14 commands documented)
  - Turn Result fields (including cancel_observed as of #164 Stage B)

- **`.gitignore`** — excludes `.port_sessions/` (dogfood-run state)

## Key concepts

### Clawable surface (14 commands)

Every clawable command **must**:
1. Accept `--output-format {text,json}`
2. Return JSON envelopes matching SCHEMAS.md
3. Use common fields (timestamp, command, exit_code, output_format, schema_version)
4. Exit 0 on success, 1 on error/not-found, 2 on timeout

**Commands:** list-sessions, delete-session, load-session, flush-transcript, show-command, show-tool, exec-command, exec-tool, route, bootstrap, command-graph, tool-pool, bootstrap-graph, turn-loop

**Validation:** `test_cli_parity_audit.py` auto-tests all 14 for --output-format acceptance.

### OPT_OUT surfaces (12 commands)

Explicitly exempt from the --output-format requirement (for now):
- Rich-Markdown reports: summary, manifest, parity-audit, setup-report
- List commands with query filters: subsystems, commands, tools
- Simulation/debug: remote-mode, ssh-mode, teleport-mode, direct-connect-mode, deep-link-mode

**Future work:** audit OPT_OUT surfaces for JSON promotion (post-#164).

### Protocol layers

- **Coverage (#167–#170):** All clawable commands emit JSON
- **Enforcement (#171):** Parity CI prevents new commands from skipping JSON
- **Documentation (#172):** SCHEMAS.md locks the field contract
- **Alignment (#173):** Test framework validates docs ↔ code match
- **Field evolution (#164 Stage B):** cancel_observed proves protocol extensibility

## Testing & coverage

### Run full suite
```bash
python3 -m pytest tests/ -q
```

### Run one test file
```bash
python3 -m pytest tests/test_cancel_observed_field.py -v
```

### Run one test
```bash
python3 -m pytest tests/test_cancel_observed_field.py::TestCancelObservedField::test_default_value_is_false -v
```

### Check coverage (optional)
```bash
python3 -m pip install coverage  # if not already installed
python3 -m coverage run -m pytest tests/
python3 -m coverage report --skip-covered
```

Target: >90% line coverage for `src/` (currently ~85%).

## Common workflows

### Add a new clawable command

1. Add a parser in `main.py` (argparse)
2. Add the `--output-format` flag
3. Emit a JSON envelope using `wrap_json_envelope(data, command_name)` (see the sketch below)
4. Add the command to CLAWABLE_SURFACES in `test_cli_parity_audit.py`
5. Document it in SCHEMAS.md (schema + example)
6. Write a test in `tests/test_*_cli.py` or `tests/test_json_envelope_field_consistency.py`
7. Run the full suite to confirm parity
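The `wrap_json_envelope` helper referenced in step 3 lives in `main.py` but is not shown here; a minimal sketch of what such a helper could look like, assuming only the common fields documented in SCHEMAS.md (the actual signature and behavior may differ):

```python
import json
from datetime import datetime, timezone

SCHEMA_VERSION = "1.0"

def wrap_json_envelope(data: dict, command_name: str, exit_code: int = 0) -> str:
    """Wrap command-specific fields in the common envelope (hypothetical sketch)."""
    envelope = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "command": command_name,
        "exit_code": exit_code,
        "output_format": "json",
        "schema_version": SCHEMA_VERSION,
        **data,  # per-command success/error fields
    }
    return json.dumps(envelope, indent=2)
```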
### Modify TurnResult or protocol fields

1. Update the dataclass in `query_engine.py` (see the sketch below)
2. Update SCHEMAS.md with the new field + rationale
3. Write a test in `tests/test_json_envelope_field_consistency.py` that validates field presence
4. Update all places that construct TurnResult (grep for `TurnResult(`)
5. Update the bootstrap/turn-loop JSON builders in `main.py`
6. Run `tests/` to ensure no regressions
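New fields should take defaults so the change stays additive (see "Protocol governance" below). A hedged sketch, assuming a simplified `TurnResult`; the real dataclass in `query_engine.py` carries more fields:

```python
from dataclasses import dataclass

@dataclass
class TurnResult:
    # Illustrative subset; the real class in query_engine.py has more fields.
    prompt: str
    output: str
    stop_reason: str = "completed"
    # New protocol fields get defaults so every existing TurnResult(...)
    # call site keeps working and the addition stays at schema_version "1.0".
    cancel_observed: bool = False
```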
### Promote an OPT_OUT surface to CLAWABLE

**Prerequisite:** A real demand signal logged in `OPT_OUT_DEMAND_LOG.md` (threshold: 2+ independent signals per surface). Speculative promotions are not allowed.

Once demand is evidenced:
1. Add the --output-format flag to argparse
2. Emit wrap_json_envelope() output on the JSON path
3. Move the command from OPT_OUT_SURFACES to CLAWABLE_SURFACES
4. Document it in SCHEMAS.md
5. Write a test for the JSON output
6. Run the parity audit to confirm no regressions
7. Update `OPT_OUT_DEMAND_LOG.md` to mark the signal as resolved

### File a demand signal (when a claw actually needs JSON from an OPT_OUT surface)

1. Open `OPT_OUT_DEMAND_LOG.md`
2. Find the surface's entry under Group A/B/C
3. Append a dated entry with Source, Use Case, and a Markdown-alternative-checked explanation
4. If this is the 2nd signal for the same surface, file a promotion pinpoint in ROADMAP.md

## Dogfood principles

The Python harness is continuously dogfood-tested:
- Every cycle ships to `main` with detailed commit messages
- New tests are written before/alongside implementation
- The test suite must pass before pushing (zero-regression principle)
- Commits are grouped by pinpoint (#159, #160, ..., #174)
- Failure modes are classified per exit code: 0=success, 1=error, 2=timeout

## Protocol governance

- **SCHEMAS.md is the source of truth** — any implementation must match it field-for-field
- **Tests enforce the contract** — drift is caught by the test suite
- **Field additions are forward-compatible** — new fields get defaults, old clients ignore them
- **Exit codes are signals** — claws use them for conditional logic (0→continue, 1→escalate, 2→timeout; see the sketch below)
- **Timestamps are audit trails** — every envelope includes ISO 8601 UTC time for chronological ordering
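A minimal sketch of the exit-code dispatch described in the governance list above; the policy strings and function name are illustrative, not part of the protocol:

```python
import subprocess

def dispatch_on_exit_code(command: list[str]) -> str:
    """Illustrative claw-side policy: 0→continue, 1→escalate, 2→timeout."""
    result = subprocess.run(command, capture_output=True, text=True)
    match result.returncode:
        case 0:
            return "continue"   # success: keep orchestrating
        case 1:
            return "escalate"   # error/not-found: needs intervention
        case 2:
            return "timeout"    # engine timeout: retry or re-session
        case other:
            raise RuntimeError(f"unexpected exit code {other}")
```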
## Related docs

- **`ERROR_HANDLING.md`** — Unified error-handling pattern for claws (one handler for all 14 clawable commands)
- **`SCHEMAS.md`** — JSON protocol specification (read before implementing)
- **`OPT_OUT_AUDIT.md`** — Governance for the 12 non-clawable surfaces
- **`OPT_OUT_DEMAND_LOG.md`** — Active survey recording real demand signals (evidence base for decisions)
- **`ROADMAP.md`** — macro roadmap and macro pain points
- **`PHILOSOPHY.md`** — system design intent
- **`PARITY.md`** — status of Python ↔ Rust protocol equivalence

ERROR_HANDLING.md (new file, 489 lines)
@@ -0,0 +1,489 @@
# Error Handling for Claw Code Claws

**Purpose:** Build a unified error handler for orchestration code using claw-code as a library or subprocess.

After cycles #178–#179 (parser-front-door hole closure), claw-code's error interface is deterministic, machine-readable, and clawable: **one error handler for all 14 clawable commands.**

---

## Quick Reference: Exit Codes and Envelopes

Every clawable command returns JSON on stdout when `--output-format json` is requested.

**IMPORTANT:** The exit code contract below applies **only when `--output-format json` is explicitly set**. Text mode follows argparse conventions and may return different exit codes (e.g., `2` for argparse parse errors). Claws consuming claw-code as a subprocess MUST always pass `--output-format json` to get the documented contract.

| Exit Code | Meaning | Response Format | Example |
|---|---|---|---|
| **0** | Success | `{success fields}` | `{"session_id": "...", "loaded": true}` |
| **1** | Error / Not Found | `{error: {kind, message, ...}}` | `{"error": {"kind": "session_not_found", ...}}` |
| **2** | Timeout | `{final_stop_reason: "timeout", final_cancel_observed: ...}` | `{"final_stop_reason": "timeout", ...}` |

### Text mode vs JSON mode exit codes

| Scenario | Text mode exit | JSON mode exit | Why |
|---|---|---|---|
| Unknown subcommand | 2 (argparse default) | 1 (parse error envelope) | argparse defaults to 2; JSON mode normalizes to the contract |
| Missing required arg | 2 (argparse default) | 1 (parse error envelope) | Same reason |
| Session not found | 1 | 1 | Application-level error, same in both |
| Command executed OK | 0 | 0 | Success path, identical |
| Turn-loop timeout | 2 | 2 | Identical (#161 implementation) |

**Practical rule for claws:** always pass `--output-format json`. This eliminates text-mode surprises and gives you the documented exit-code contract for every error path.
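A tiny helper enforcing that rule before dispatch; the function name is illustrative:

```python
def with_json_format(command: list[str]) -> list[str]:
    """Append --output-format json unless the caller already set it (illustrative)."""
    if "--output-format" not in command:
        return [*command, "--output-format", "json"]
    return command

# e.g. with_json_format(["claw", "load-session", "sess_abc123"])
```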
---

## One-Handler Pattern

Build a single error-recovery function that works for all 14 clawable commands:

```python
import subprocess
import json
import sys
from typing import Any

def run_claw_command(command: list[str], timeout_seconds: float = 30.0) -> dict[str, Any]:
    """
    Run a clawable claw-code command and handle errors uniformly.

    Args:
        command: Full command list, e.g. ["claw", "load-session", "id", "--output-format", "json"]
        timeout_seconds: Wall-clock timeout

    Returns:
        Parsed JSON result from stdout

    Raises:
        ClawError: Classified by error.kind (parse, session_not_found, runtime, timeout, etc.)
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        raise ClawError(
            kind='subprocess_timeout',
            message=f'Command exceeded {timeout_seconds}s wall-clock timeout',
            retryable=True,  # Caller's decision; subprocess timeout != engine timeout
        )

    # Parse JSON (valid for all success/error/timeout paths in claw-code)
    try:
        envelope = json.loads(result.stdout)
    except json.JSONDecodeError as err:
        raise ClawError(
            kind='parse_failure',
            message=f'Command output is not JSON: {err}',
            hint='Check that --output-format json is being passed',
            retryable=False,
        )

    # Classify by exit code and error.kind
    match (result.returncode, envelope.get('error', {}).get('kind')):
        case (0, _):
            # Success
            return envelope

        case (1, 'parse'):
            # #179: argparse error — typically a typo or missing required argument
            raise ClawError(
                kind='parse',
                message=envelope['error']['message'],
                hint=envelope['error'].get('hint'),
                retryable=False,  # Typos don't fix themselves
            )

        case (1, 'session_not_found'):
            # Common: load-session on a nonexistent ID
            raise ClawError(
                kind='session_not_found',
                message=envelope['error']['message'],
                session_id=envelope.get('session_id'),
                retryable=False,  # Session won't appear on retry
            )

        case (1, 'filesystem'):
            # Directory missing, permission denied, disk full
            raise ClawError(
                kind='filesystem',
                message=envelope['error']['message'],
                retryable=True,  # Might be transient (disk space, NFS flake)
            )

        case (1, 'runtime'):
            # Generic engine error (unexpected exception, malformed input, etc.)
            raise ClawError(
                kind='runtime',
                message=envelope['error']['message'],
                retryable=envelope['error'].get('retryable', False),
            )

        case (1, _):
            # Catch-all for any new error.kind values
            raise ClawError(
                kind=envelope['error']['kind'],
                message=envelope['error']['message'],
                retryable=envelope['error'].get('retryable', False),
            )

        case (2, _):
            # Timeout (engine was asked to cancel and had a fair chance to observe)
            cancel_observed = envelope.get('final_cancel_observed', False)
            raise ClawError(
                kind='timeout',
                message=f'Turn exceeded timeout (cancel_observed={cancel_observed})',
                cancel_observed=cancel_observed,
                retryable=True,  # Caller can retry with a fresh session
                safe_to_reuse_session=(cancel_observed is True),
            )

        case (exit_code, _):
            # Unexpected exit code
            raise ClawError(
                kind='unexpected_exit_code',
                message=f'Unexpected exit code {exit_code}',
                retryable=False,
            )


class ClawError(Exception):
    """Unified error type for claw-code commands."""

    def __init__(
        self,
        kind: str,
        message: str,
        hint: str | None = None,
        retryable: bool = False,
        cancel_observed: bool = False,
        safe_to_reuse_session: bool = False,
        session_id: str | None = None,
    ):
        self.kind = kind
        self.message = message
        self.hint = hint
        self.retryable = retryable
        self.cancel_observed = cancel_observed
        self.safe_to_reuse_session = safe_to_reuse_session
        self.session_id = session_id
        super().__init__(self.message)

    def __str__(self) -> str:
        parts = [f"{self.kind}: {self.message}"]
        if self.hint:
            parts.append(f"Hint: {self.hint}")
        if self.retryable:
            parts.append("(retryable)")
        if self.cancel_observed:
            parts.append(f"(safe_to_reuse_session={self.safe_to_reuse_session})")
        return "\n".join(parts)
```
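A short usage sketch of the handler above; the command and session id are illustrative:

```python
# Illustrative call site: one try/except covers every clawable command.
try:
    envelope = run_claw_command(
        ["claw", "load-session", "sess_abc123", "--output-format", "json"]
    )
    print("loaded:", envelope.get("session_id"))
except ClawError as err:
    if err.retryable:
        print("transient failure, will retry:", err)
    else:
        print("permanent failure, escalating:", err)
```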
---

## Practical Recovery Patterns

### Pattern 1: Retry on transient errors

```python
from time import sleep

def run_with_retry(
    command: list[str],
    max_attempts: int = 3,
    backoff_seconds: float = 0.5,
) -> dict:
    """Retry on transient errors (filesystem, timeout)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_claw_command(command)
        except ClawError as err:
            if not err.retryable:
                raise  # Non-transient; fail fast

            if attempt == max_attempts:
                raise  # Last attempt; propagate

            print(f"Attempt {attempt} failed ({err.kind}); retrying in {backoff_seconds}s...", file=sys.stderr)
            sleep(backoff_seconds)
            backoff_seconds *= 1.5  # exponential backoff

    raise RuntimeError("Unreachable")
```

### Pattern 2: Reuse session after timeout (if safe)

```python
def run_with_timeout_recovery(
    command: list[str],
    timeout_seconds: float = 30.0,
    fallback_timeout: float = 60.0,
) -> dict:
    """
    On timeout, check cancel_observed. If True, the session is safe for retry.
    If False, the session is potentially wedged; use a fresh one.
    """
    try:
        return run_claw_command(command, timeout_seconds=timeout_seconds)
    except ClawError as err:
        if err.kind != 'timeout':
            raise

        if err.safe_to_reuse_session:
            # Engine saw the cancel signal; safe to reuse this session with a larger timeout
            print(f"Timeout observed (cancel_observed=true); retrying with {fallback_timeout}s...", file=sys.stderr)
            return run_claw_command(command, timeout_seconds=fallback_timeout)
        else:
            # Engine didn't see the cancel signal; session may be wedged
            print("Timeout not observed (cancel_observed=false); session is potentially wedged", file=sys.stderr)
            raise  # Caller should allocate a fresh session
```

### Pattern 3: Detect parse errors (typos in command-line construction)

```python
def validate_command_before_dispatch(command: list[str]) -> None:
    """
    Dry-run with --help to detect obvious syntax errors before dispatching work.

    This is cheap (no API call) and catches typos like:
    - Unknown subcommand: `claw typo-command`
    - Unknown flag: `claw bootstrap --invalid-flag`
    - Missing required argument: `claw load-session` (no session_id)
    """
    help_cmd = command + ['--help']
    try:
        result = subprocess.run(help_cmd, capture_output=True, timeout=2.0)
        if result.returncode != 0:
            print(f"Warning: {' '.join(help_cmd)} returned {result.returncode}", file=sys.stderr)
            print("(This doesn't prove the command is invalid, just that --help failed)", file=sys.stderr)
    except subprocess.TimeoutExpired:
        pass  # --help shouldn't hang, but don't block on it
```

### Pattern 4: Log and forward errors to observability

```python
import logging

logger = logging.getLogger(__name__)

def run_claw_with_logging(command: list[str]) -> dict:
    """Run a command and log errors for observability."""
    try:
        result = run_claw_command(command)
        logger.info(f"Claw command succeeded: {' '.join(command)}")
        return result
    except ClawError as err:
        logger.error(
            "Claw command failed",
            extra={
                'command': ' '.join(command),
                'error_kind': err.kind,
                'error_message': err.message,
                'retryable': err.retryable,
                'cancel_observed': err.cancel_observed,
            },
        )
        raise
```

---

## Error Kinds (Enumeration)

After cycles #178–#179, the complete set of `error.kind` values is:

| Kind | Exit Code | Meaning | Retryable | Notes |
|---|---|---|---|---|
| **parse** | 1 | Argparse error (unknown command, missing arg, invalid flag) | No | Real error message included (#179); valid-choices list for discoverability |
| **session_not_found** | 1 | load-session target doesn't exist | No | session_id and directory included in envelope |
| **filesystem** | 1 | Directory missing, permission denied, disk full | Yes | Transient issues (disk space, NFS flake) can be retried |
| **runtime** | 1 | Engine error (unexpected exception, malformed input) | Depends | The envelope's `error.retryable` field specifies |
| **timeout** | 2 | Engine timeout with cooperative cancellation | Yes* | `cancel_observed` field signals session safety (#164) |

*Retry safety depends on `cancel_observed`:
- `cancel_observed=true` → session is safe to reuse
- `cancel_observed=false` → session may be wedged; allocate a fresh one
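The enumeration above can be carried into code as a small policy map; a hedged sketch (the recommended actions paraphrase this document, not an API):

```python
# Illustrative policy map derived from the table above:
# kind -> (default_retryable, recommended_action)
ERROR_KIND_POLICY: dict[str, tuple[bool, str]] = {
    "parse":             (False, "fix the command line; do not retry"),
    "session_not_found": (False, "list-sessions, then pick a valid id"),
    "filesystem":        (True,  "retry with backoff; may be transient"),
    "runtime":           (False, "trust the envelope's error.retryable field"),
    "timeout":           (True,  "check cancel_observed before reusing the session"),
}
```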
---

## What We Did to Make This Work

### Cycle #178: Parse-Error Envelope

**Problem:** `claw nonexistent --output-format json` returned argparse help text on stderr instead of an envelope.
**Solution:** Catch argparse `SystemExit` in JSON mode and emit a structured error envelope.
**Benefit:** Claws no longer need to parse human help text to understand parse errors.

### Cycle #179: Stderr Hygiene + Real Error Message

**Problem:** Even after #178, argparse usage was leaking to stderr AND the envelope message was generic ("invalid command or argument").
**Solution:** Monkey-patch `parser.error()` in JSON mode to raise an internal exception, preserving argparse's real message verbatim. Suppress stderr entirely in JSON mode.
**Benefit:** Claws see one stream (stdout), one envelope, and real error context (e.g., "invalid choice: typo (choose from ...)") for discoverability.
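A minimal sketch of the #179 monkey-patch, assuming a plain argparse parser; the exception and function names are illustrative and the real implementation in `main.py` may differ:

```python
import argparse
import json
import sys

class _JsonParseError(Exception):
    """Carries argparse's real error message to the JSON path (illustrative)."""

def parse_args_json_mode(parser: argparse.ArgumentParser, argv: list[str]) -> argparse.Namespace:
    def error_as_exception(message: str) -> None:
        # argparse calls self.error(message) on any parse failure; raising here
        # preserves the message verbatim and keeps stderr silent.
        raise _JsonParseError(message)

    parser.error = error_as_exception  # shadow the bound method on this instance
    try:
        return parser.parse_args(argv)
    except _JsonParseError as err:
        print(json.dumps({"error": {"kind": "parse", "message": str(err)}}))
        sys.exit(1)  # JSON mode normalizes parse errors to exit code 1
```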
### Contract: #164 Stage B (`cancel_observed` field)

**Problem:** Timeout results didn't signal whether the engine actually observed the cancellation request.
**Solution:** Add a `cancel_observed: bool` field to the timeout TurnResult; set it true iff the engine had a fair chance to observe the cancel event.
**Benefit:** Claws can decide "retry with fresh session" vs "reuse this session with a larger timeout" based on a single boolean.

---

## Common Mistakes to Avoid

❌ **Don't parse the exit code alone**
```python
# BAD: Exit code 1 could mean parse error, not-found, filesystem, or runtime
if result.returncode == 1:
    # What should I do? Unclear.
    pass
```

✅ **Do parse error.kind**
```python
# GOOD: error.kind tells you exactly how to recover
match envelope['error']['kind']:
    case 'parse': ...
    case 'session_not_found': ...
    case 'filesystem': ...
```

---

❌ **Don't capture both stdout and stderr and assume they're separate concerns**
```python
# BAD (pre-#179): Capture stdout + stderr, then parse stdout as JSON
# But stderr might contain argparse noise that you have to string-match
result = subprocess.run(..., capture_output=True, text=True)
if "invalid choice" in result.stderr:
    ...  # custom error handling
```

✅ **Do rely on stderr being silent in JSON mode**
```python
# GOOD (post-#179): In JSON mode, stderr is guaranteed silent
# The envelope on stdout is your single source of truth
result = subprocess.run(..., capture_output=True, text=True)
envelope = json.loads(result.stdout)  # Always valid in JSON mode
```

---

❌ **Don't retry on parse errors**
```python
# BAD: Typos don't fix themselves
error_kind = envelope['error']['kind']
if error_kind == 'parse':
    retry()  # Will fail again
```

✅ **Do check retryable before retrying**
```python
# GOOD: Let the error tell you
error = envelope['error']
if error.get('retryable', False):
    retry()
else:
    raise
```

---

❌ **Don't reuse a session after timeout without checking cancel_observed**
```python
# BAD: Reusing the session risks a wedge
result = run_claw_command(...)  # times out
# ... later, reuse the same session
result = run_claw_command(...)  # might be stuck in the previous turn
```

✅ **Do allocate a fresh session if cancel_observed=false**
```python
# GOOD: Allocate a fresh session if a wedge is suspected
try:
    result = run_claw_command(...)
except ClawError as err:
    if err.cancel_observed:
        # Safe to reuse
        result = run_claw_command(...)
    else:
        # Allocate a fresh session
        fresh_session = create_session()
        result = run_claw_command_in_session(fresh_session, ...)
```

---

## Testing Your Error Handler

```python
def test_error_handler_parse_error():
    """Verify parse errors are caught and classified."""
    try:
        run_claw_command(['claw', 'nonexistent', '--output-format', 'json'])
        assert False, "Should have raised ClawError"
    except ClawError as err:
        assert err.kind == 'parse'
        assert 'invalid choice' in err.message.lower()
        assert err.retryable is False

def test_error_handler_timeout_safe():
    """Verify timeout with cancel_observed=true marks the session as safe."""
    # Requires a live claw-code server; mock this test
    try:
        run_claw_command(
            ['claw', 'turn-loop', '"x"', '--timeout-seconds', '0.0001'],
            timeout_seconds=2.0,
        )
        assert False, "Should have raised ClawError"
    except ClawError as err:
        assert err.kind == 'timeout'
        assert err.safe_to_reuse_session is True  # cancel_observed=true

def test_error_handler_not_found():
    """Verify session_not_found is clearly classified."""
    try:
        run_claw_command(['claw', 'load-session', 'nonexistent', '--output-format', 'json'])
        assert False, "Should have raised ClawError"
    except ClawError as err:
        assert err.kind == 'session_not_found'
        assert err.retryable is False
```

---

## Appendix: SCHEMAS.md Error Shape

For reference, the canonical JSON error envelope shape (SCHEMAS.md):

```json
{
  "timestamp": "2026-04-22T11:40:00Z",
  "command": "load-session",
  "exit_code": 1,
  "output_format": "json",
  "schema_version": "1.0",
  "error": {
    "kind": "session_not_found",
    "operation": "session_store.load_session",
    "target": "nonexistent",
    "retryable": false,
    "message": "session 'nonexistent' not found in .port_sessions",
    "hint": "use 'list-sessions' to see available sessions"
  }
}
```

All commands that emit errors follow this shape (with error.kind varying). See `SCHEMAS.md` for the complete contract.

---

## Summary

After cycles #178–#179, **one error handler works for all 14 clawable commands.** No more string-matching, no more stderr parsing, no more exit-code ambiguity. Just parse the JSON, check `error.kind`, and decide: retry, escalate, or reuse the session (if safe).

The handler itself is ~80 lines of Python; the patterns are reusable across any language that can speak JSON.
OPT_OUT_AUDIT.md (new file, 151 lines)
@@ -0,0 +1,151 @@
# OPT_OUT Surface Audit Roadmap

**Status:** Pre-audit (decision table ready, survey pending)

This document governs the audit and potential promotion of 12 OPT_OUT surfaces (commands that currently do **not** support `--output-format json`).

## OPT_OUT Classification Rationale

A surface is classified as OPT_OUT when:
1. **Human-first by nature:** Rich Markdown prose / diagrams / structured text where JSON would be information loss
2. **Query-filtered alternative exists:** Commands with internal `--query` / `--limit` don't need JSON (users already have an escape hatch)
3. **Simulation/debug only:** Not meant for production orchestration (e.g., mode simulators)
4. **Future JSON work is planned:** Documented in ROADMAP with a clear upgrade path

---

## OPT_OUT Surfaces (12 Total)

### Group A: Rich-Markdown Reports (4 commands)

**Rationale:** These emit structured narrative prose. JSON would require lossy serialization.

| Command | Output | Current use | JSON case |
|---|---|---|---|
| `summary` | Multi-section workspace summary (Markdown) | Human readability | Not applicable; Markdown is the output |
| `manifest` | Workspace manifest with project tree (Markdown) | Human readability | Not applicable; Markdown is the output |
| `parity-audit` | TypeScript/Python port comparison report (Markdown) | Human readability | Not applicable; Markdown is the output |
| `setup-report` | Preflight + startup diagnostics (Markdown) | Human readability | Not applicable; Markdown is the output |

**Audit decision:** These likely remain OPT_OUT long-term (Markdown-as-output is intentional). If a JSON version is needed in future, it would be a separate `--output-format json` path generating structured data (project summary object, manifest array, audit deltas, setup checklist) — but that's a **new contract**, not an addition to the existing Markdown surfaces.

**Pinpoint:** #175 (deferred) — audit whether `summary`/`manifest` should emit structured JSON versions *in parallel* with Markdown, or whether Markdown-only is the right UX.

---

### Group B: List Commands with Query Filters (3 commands)

**Rationale:** These already support `--query` and `--limit` for filtering. JSON output would be redundant; users can pipe to `jq`.

| Command | Filtering | Current output | JSON case |
|---|---|---|---|
| `subsystems` | `--limit` | Human-readable list | Use `--query` to filter; users can parse if needed |
| `commands` | `--query`, `--limit`, `--no-plugin-commands`, `--no-skill-commands` | Human-readable list | Use `--query` to filter; users can parse if needed |
| `tools` | `--query`, `--limit`, `--simple-mode` | Human-readable list | Use `--query` to filter; users can parse if needed |

**Audit decision:** `--query` / `--limit` are already the machine-friendly escape hatch. These commands are **intentionally** list-filter-based (not orchestration-primary). Promoting to CLAWABLE would require:
1. Formalizing what the structured output *is* (command array? tool array?)
2. Versioning the schema per command
3. Updating tests to validate per-command schemas

**Cost-benefit:** Low. Users who need structured data can already use `--query` to narrow results, then parse. Effort to promote > value.

**Pinpoint:** #176 (backlog) — audit `--query` UX; consider whether a `--query-json` escape hatch (output JSON of matching items) is worth the schema tax.

---

### Group C: Simulation / Debug Surfaces (5 commands)

**Rationale:** These are intentionally **not production-orchestrated**. They simulate behavior, test modes, or debug scenarios. JSON output doesn't add value.

| Command | Purpose | Output | Use case |
|---|---|---|---|
| `remote-mode` | Simulate remote execution | Text (mock session) | Testing harness behavior under remote constraints |
| `ssh-mode` | Simulate SSH execution | Text (mock SSH session) | Testing harness behavior over SSH-like transport |
| `teleport-mode` | Simulate teleport hop | Text (mock hop session) | Testing harness behavior with teleport bouncing |
| `direct-connect-mode` | Simulate direct network | Text (mock session) | Testing harness behavior with direct connectivity |
| `deep-link-mode` | Simulate deep-link invocation | Text (mock deep-link) | Testing harness behavior from URL/deeplink |

**Audit decision:** These are **intentionally simulation-only**. Promoting to CLAWABLE means:
1. Declaring "this simulated mode is now a valid orchestration surface"
2. Defining what JSON output *means* (mock session state? simulation log?)
3. Adding versioning + test coverage

**Cost-benefit:** Very low. These are debugging tools, not orchestration endpoints. Effort to promote >> value.

**Pinpoint:** #177 (backlog) — decide if mode simulators should ever be CLAWABLE (probably no).

---

## Audit Workflow (Future Cycles)

### For each surface:
1. **Survey:** Check if any external claw actually uses --output-format with this surface
2. **Cost estimate:** How much schema work + testing?
3. **Value estimate:** How much demand for a JSON version?
4. **Decision:** CLAWABLE, remain OPT_OUT, or new pinpoint?

### Promotion criteria (if promoting to CLAWABLE):

A surface moves from OPT_OUT → CLAWABLE **only if**:
- ✅ Clear use case for JSON (not just "hypothetically could be JSON")
- ✅ Schema is simple and stable (not 20+ fields)
- ✅ At least one external claw has requested it
- ✅ Tests can be added without a major refactor
- ✅ Maintainability burden is worth the value

### Demote criteria (if staying OPT_OUT):

A surface stays OPT_OUT **if**:
- ✅ JSON would be information loss (Markdown reports)
- ✅ Equivalent filtering already exists (`--query` / `--limit`)
- ✅ Use case is simulation/debug, not production
- ✅ Promotion effort > value to users

---

## Post-Audit Outcomes

### Likely scenario (high confidence)

**Group A (Markdown reports):** Remain OPT_OUT
- `summary`, `manifest`, `parity-audit`, `setup-report` are **intentionally** human-first
- If JSON-like structure is needed in future, it would be separate `*-json` commands or a distinct `--output-format`, not added to the Markdown surfaces

**Group B (List filters):** Remain OPT_OUT
- `subsystems`, `commands`, `tools` have `--query` / `--limit` as the query layer
- Users who need structured data already have an escape hatch

**Group C (Mode simulators):** Remain OPT_OUT
- `remote-mode`, `ssh-mode`, etc. are debug tools, not orchestration endpoints
- No demand for a JSON version; promotion would be forced, not driven

**Result:** The OPT_OUT audit concludes that 12/12 surfaces should **remain OPT_OUT** (no promotions).

### If demand emerges

If external claws report needing JSON from any OPT_OUT surface:
1. File a pinpoint with use case + rationale
2. Estimate cost + value
3. If value > cost, promote to CLAWABLE with full test coverage
4. Update SCHEMAS.md
5. Update CLAUDE.md

---

## Timeline

- **Post-#174 (now):** OPT_OUT audit documented (this file)
- **Cycles #19–#21 (deferred):** Survey period — collect data on external demand
- **Cycle #22 (deferred):** Final audit decision + any promotions
- **Post-audit:** Move to protocol maintenance mode (new commands/fields/surfaces)

---

## Related

- **OPT_OUT_DEMAND_LOG.md** — Active survey recording real demand signals (evidentiary base for any promotion decision)
- **SCHEMAS.md** — Clawable surface contracts
- **CLAUDE.md** — Development guidance
- **test_cli_parity_audit.py** — Parametrized tests for CLAWABLE_SURFACES enforcement
- **ROADMAP.md** — Macro phases (this audit is Phase 3 before Phase 2 closure)
OPT_OUT_DEMAND_LOG.md (new file, 167 lines)
@@ -0,0 +1,167 @@
# OPT_OUT Demand Log

**Purpose:** Record real demand signals for promoting OPT_OUT surfaces to CLAWABLE. Without this log, the audit criteria in `OPT_OUT_AUDIT.md` have no evidentiary base.

**Status:** Active survey window (post-#178/#179, cycles #21+)

## How to file a demand signal

When any external claw, operator, or downstream consumer actually needs JSON output from one of the 12 OPT_OUT surfaces, add an entry below. **Speculation, "could be useful someday," and internal hypotheticals do NOT count.**

A valid signal requires:
- **Source:** Who/what asked (human, automation, agent session, external tool)
- **Surface:** Which OPT_OUT command (from the 12)
- **Use case:** The concrete orchestration problem they're trying to solve
- **Markdown-alternative checked?** Why the existing OPT_OUT output is insufficient
- **Date:** When the signal was received

## Promotion thresholds

Per `OPT_OUT_AUDIT.md` criteria:
- **2+ independent signals** for the same surface within a survey window → file a promotion pinpoint
- **1 signal + existing stable schema** → file a pinpoint for discussion
- **0 signals** → surface stays OPT_OUT (documented rationale in the audit file)

The threshold is intentionally high. Single-use hacks can be served via one-off Markdown parsing; schema promotion is expensive (docs, tests, maintenance).

---

## Demand Signals Received

### Group A: Rich-Markdown Reports

#### `summary`
**Signals received: 0**

Notes: No demand recorded. Markdown output is intentional and useful for human review.

#### `manifest`
**Signals received: 0**

Notes: No demand recorded.

#### `parity-audit`
**Signals received: 0**

Notes: No demand recorded. Report consumers are humans reviewing porting progress, not automation.

#### `setup-report`
**Signals received: 0**

Notes: No demand recorded.

---

### Group B: List Commands with Query Filters

#### `subsystems`
**Signals received: 0**

Notes: `--limit` already provides filtering. No claws requesting JSON.

#### `commands`
**Signals received: 0**

Notes: `--query`, `--limit`, `--no-plugin-commands`, `--no-skill-commands` already allow filtering. No demand recorded.

#### `tools`
**Signals received: 0**

Notes: `--query`, `--limit`, `--simple-mode` provide filtering. No demand recorded.

---

### Group C: Simulation / Debug Surfaces

#### `remote-mode`
**Signals received: 0**

Notes: Simulation-only. No production orchestration need.

#### `ssh-mode`
**Signals received: 0**

Notes: Simulation-only.

#### `teleport-mode`
**Signals received: 0**

Notes: Simulation-only.

#### `direct-connect-mode`
**Signals received: 0**

Notes: Simulation-only.

#### `deep-link-mode`
**Signals received: 0**

Notes: Simulation-only.

---

## Survey Window Status

| Cycle | Date | New Signals | Running Total | Action |
|---|---|---|---|---|
| #21 | 2026-04-22 | 0 | 0 | Survey opened; log established |

**Current assessment:** Zero demand for any OPT_OUT surface promotion. This is consistent with the `OPT_OUT_AUDIT.md` prediction that all 12 likely stay OPT_OUT long-term.

---

## Signal Entry Template

```
### <surface-name>
**Signals received: [N]**

Entry N (YYYY-MM-DD):
- Source: <who/what>
- Use case: <concrete orchestration problem>
- Markdown-alternative-checked: <yes/no + why insufficient>
- Follow-up: <filed pinpoint / discussion thread / closed>
```

---

## Decision Framework

At cycle #22 (or whenever the survey window closes):

### If 0 signals total (likely):
- Move all 12 surfaces to `PERMANENTLY_OPT_OUT` or similar
- Remove `OPT_OUT_SURFACES` from `test_cli_parity_audit.py` (everything is explicitly a non-goal)
- Update `CLAUDE.md` to reflect maintainership mode
- Close `OPT_OUT_AUDIT.md` with "audit complete, no promotions"

### If 1–2 signals on isolated surfaces:
- File individual promotion pinpoints per surface with demand evidence
- Each goes through the standard #171/#172/#173 loop (parity audit, SCHEMAS.md, consistency test)

### If high demand (3+ signals):
- Reopen the audit: is the OPT_OUT classification actually correct?
- Review whether protocol expansion is warranted

---

## Related Files

- **`OPT_OUT_AUDIT.md`** — Audit criteria, decision table, rationale by group
- **`SCHEMAS.md`** — JSON contract for the 14 CLAWABLE surfaces
- **`tests/test_cli_parity_audit.py`** — Machine enforcement of the CLAWABLE/OPT_OUT classification
- **`CLAUDE.md`** — Development posture (maintainership mode)

---

## Philosophy

**Prevent speculative expansion.** The discipline of requiring real signals before promotion protects the protocol from schema bloat. Every new CLAWABLE surface adds:
- A SCHEMAS.md section (maintenance burden)
- Test coverage (test suite tax)
- Documentation (cognitive load for new developers)
- Version compatibility (schema_version bump risk)

If a claw can't articulate *why* it needs JSON for `summary` beyond "it would be nice," then JSON for `summary` is not needed. The Markdown output is a feature, not a gap.

The audit log closes the loop on "governed non-goals": OPT_OUT surfaces are intentionally not clawable until proven otherwise by evidence.
@@ -5,6 +5,8 @@
·
<a href="./USAGE.md">Usage</a>
·
<a href="./ERROR_HANDLING.md">Error Handling</a>
·
<a href="./rust/README.md">Rust workspace</a>
·
<a href="./PARITY.md">Parity</a>

@@ -40,9 +42,11 @@ The canonical implementation lives in [`rust/`](./rust), and the current source

- **`rust/`** — canonical Rust workspace and the `claw` CLI binary
- **`USAGE.md`** — task-oriented usage guide for the current product surface
- **`ERROR_HANDLING.md`** — unified error-handling pattern for orchestration code
- **`PARITY.md`** — Rust-port parity status and migration notes
- **`ROADMAP.md`** — active roadmap and cleanup backlog
- **`PHILOSOPHY.md`** — project intent and system-design framing
- **`SCHEMAS.md`** — JSON protocol contract (Python harness reference)
- **`src/` + `tests/`** — companion Python/reference workspace and audit helpers; not the primary runtime surface

## Quick start
ROADMAP.md — 1377 changes (diff suppressed because it is too large)
SCHEMAS.md (new file, 454 lines)
@@ -0,0 +1,454 @@
# JSON Envelope Schemas — Clawable CLI Contract

This document locks the field-level contract for all clawable-surface commands. Every command accepting `--output-format json` must conform to the envelope shapes below.

**Target audience:** Claws building orchestrators, automation, or monitoring against claw-code's JSON output.

---

## Common Fields (All Envelopes)

Every command response, success or error, carries:

```json
{
  "timestamp": "2026-04-22T10:10:00Z",
  "command": "list-sessions",
  "exit_code": 0,
  "output_format": "json",
  "schema_version": "1.0"
}
```

| Field | Type | Required | Notes |
|---|---|---|---|
| `timestamp` | ISO 8601 UTC | Yes | Time the command completed |
| `command` | string | Yes | argv[1] (e.g. "list-sessions") |
| `exit_code` | int (0/1/2) | Yes | 0=success, 1=error/not-found, 2=timeout |
| `output_format` | string | Yes | Always "json" (for symmetry with text mode) |
| `schema_version` | string | Yes | "1.0" (bump for breaking changes) |
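A downstream claw could enforce this table with a few lines; a hedged sketch (names illustrative):

```python
REQUIRED_COMMON_FIELDS = {
    "timestamp": str,
    "command": str,
    "exit_code": int,
    "output_format": str,
    "schema_version": str,
}

def validate_common_fields(envelope: dict) -> list[str]:
    """Return a list of contract violations (empty = envelope conforms)."""
    problems = []
    for name, expected_type in REQUIRED_COMMON_FIELDS.items():
        if name not in envelope:
            problems.append(f"missing field: {name}")
        elif not isinstance(envelope[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    if envelope.get("exit_code") not in (0, 1, 2):
        problems.append("exit_code must be 0, 1, or 2")
    return problems
```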
---

## Turn Result Fields (Multi-Turn Sessions)

When a command's response includes a `turn` object (e.g., in `bootstrap` or `turn-loop`), it carries:

| Field | Type | Required | Notes |
|---|---|---|---|
| `prompt` | string | Yes | User input for this turn |
| `output` | string | Yes | Assistant response |
| `stop_reason` | enum | Yes | One of: `completed`, `timeout`, `cancelled`, `max_budget_reached`, `max_turns_reached` |
| `cancel_observed` | bool | Yes | #164 Stage B: cancellation was signaled and observed (#161/#164) |
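A hedged sketch of consuming the `turn` object per the table above; the function is illustrative:

```python
def summarize_turn(turn: dict) -> str:
    """Classify a turn object by its stop_reason (illustrative consumer)."""
    stop_reason = turn["stop_reason"]
    if stop_reason == "completed":
        return f"ok: {len(turn['output'])} chars of output"
    if stop_reason == "timeout":
        # cancel_observed tells the claw whether the session is safe to reuse
        return f"timeout (cancel_observed={turn['cancel_observed']})"
    return f"stopped early: {stop_reason}"  # cancelled / max_budget_reached / max_turns_reached
```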
---
|
||||
|
||||
## Error Envelope
|
||||
|
||||
When a command fails (exit code 1), responses carry:
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "exec-command",
|
||||
"exit_code": 1,
|
||||
"error": {
|
||||
"kind": "filesystem",
|
||||
"operation": "write",
|
||||
"target": "/tmp/nonexistent/out.md",
|
||||
"retryable": true,
|
||||
"message": "No such file or directory",
|
||||
"hint": "intermediate directory does not exist; try mkdir -p /tmp/nonexistent"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Type | Required | Notes |
|
||||
|---|---|---|---|
|
||||
| `error.kind` | enum | Yes | One of: `filesystem`, `auth`, `session`, `parse`, `runtime`, `mcp`, `delivery`, `usage`, `policy`, `unknown` |
|
||||
| `error.operation` | string | Yes | Syscall/method that failed (e.g. "write", "open", "resolve_session") |
|
||||
| `error.target` | string | Yes | Resource that failed (path, session-id, server-name, etc.) |
|
||||
| `error.retryable` | bool | Yes | Whether caller can safely retry without intervention |
|
||||
| `error.message` | string | Yes | Platform error message (e.g. errno text) |
|
||||
| `error.hint` | string | No | Optional actionable next step |
|
||||
|
||||
---
|
||||
|
||||
## Not-Found Envelope
|
||||
|
||||
When an entity does not exist (exit code 1, but not a failure):
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "load-session",
|
||||
"exit_code": 1,
|
||||
"name": "does-not-exist",
|
||||
"found": false,
|
||||
"error": {
|
||||
"kind": "session_not_found",
|
||||
"message": "session 'does-not-exist' not found in .claw/sessions/",
|
||||
"retryable": false
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Type | Required | Notes |
|
||||
|---|---|---|---|
|
||||
| `name` | string | Yes | Entity name/id that was looked up |
|
||||
| `found` | bool | Yes | Always `false` for not-found |
|
||||
| `error.kind` | enum | Yes | One of: `command_not_found`, `tool_not_found`, `session_not_found` |
|
||||
| `error.message` | string | Yes | User-visible explanation |
|
||||
| `error.retryable` | bool | Yes | Usually `false` (entity will not magically appear) |
|
||||
|
||||
---
|
||||
|
||||
## Per-Command Success Schemas
|
||||
|
||||
### `list-sessions`
|
||||
|
||||
**Status**: ✅ Implemented (closed #251 cycle #45, 2026-04-23).
|
||||
|
||||
**Actual binary envelope** (as of #251 fix):
|
||||
```json
|
||||
{
|
||||
"command": "list-sessions",
|
||||
"sessions": [
|
||||
{
|
||||
"id": "session-1775777421902-1",
|
||||
"path": "/path/to/.claw/sessions/session-1775777421902-1.jsonl",
|
||||
"updated_at_ms": 1775777421902,
|
||||
"message_count": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Aspirational (future) shape**:
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "list-sessions",
|
||||
"exit_code": 0,
|
||||
"output_format": "json",
|
||||
"schema_version": "1.0",
|
||||
"directory": ".claw/sessions",
|
||||
"sessions_count": 2,
|
||||
"sessions": [
|
||||
{
|
||||
"session_id": "sess_abc123",
|
||||
"created_at": "2026-04-21T15:30:00Z",
|
||||
"last_modified": "2026-04-22T09:45:00Z",
|
||||
"prompt_count": 5,
|
||||
"stopped": false
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Gap**: Current impl lacks `timestamp`, `exit_code`, `output_format`, `schema_version`, `directory`, `sessions_count` (derivable), and the session object uses `id`/`updated_at_ms`/`message_count` instead of `session_id`/`last_modified`/`prompt_count`. Follow-up #250 Option B to align field names and add common-envelope fields.
|
||||
|
||||
### `delete-session`
|
||||
|
||||
**Status**: ⚠️ Stub only (closed #251 dispatch-order fix; full impl deferred).
|
||||
|
||||
**Actual binary envelope** (as of #251 fix):
|
||||
```json
|
||||
{
|
||||
"type": "error",
|
||||
"command": "delete-session",
|
||||
"error": "not_yet_implemented",
|
||||
"kind": "not_yet_implemented"
|
||||
}
|
||||
```
|
||||
|
||||
Exit code: 1. No credentials required. The stub ensures the verb does NOT fall through to Prompt/auth (the #251 fix), but the actual delete operation is not yet wired.
|
||||
|
||||
**Aspirational (future) shape**:
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "delete-session",
|
||||
"exit_code": 0,
|
||||
"session_id": "sess_abc123",
|
||||
"deleted": true,
|
||||
"directory": ".claw/sessions"
|
||||
}
|
||||
```
|
||||
|
||||
### `load-session`
|
||||
|
||||
**Status**: ✅ Implemented (closed #251 cycle #45, 2026-04-23).
|
||||
|
||||
**Actual binary envelope** (as of #251 fix):
|
||||
```json
|
||||
{
|
||||
"command": "load-session",
|
||||
"session": {
|
||||
"id": "session-abc123",
|
||||
"path": "/path/to/.claw/sessions/session-abc123.jsonl",
|
||||
"messages": 5
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
For nonexistent sessions, emits a local `session_not_found` error (NOT `missing_credentials`):
|
||||
```json
|
||||
{
|
||||
"error": "session not found: nonexistent",
|
||||
"kind": "session_not_found",
|
||||
"type": "error",
|
||||
"hint": "Hint: managed sessions live in .claw/sessions/<hash>/ ..."
|
||||
}
|
||||
```
|
||||
|
||||
**Aspirational (future) shape**:
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "load-session",
|
||||
"exit_code": 0,
|
||||
"session_id": "sess_abc123",
|
||||
"loaded": true,
|
||||
"directory": ".claw/sessions",
|
||||
"path": ".claw/sessions/sess_abc123.jsonl"
|
||||
}
|
||||
```
|
||||
|
||||
**Gap**: Current impl uses nested `session: {...}` instead of flat fields, and omits common-envelope fields. Follow-up #250 Option B to align.
|
||||
|
||||
### `flush-transcript`
|
||||
|
||||
**Status**: ⚠️ Stub only (closed #251 dispatch-order fix; full impl deferred).
|
||||
|
||||
**Actual binary envelope** (as of #251 fix):
|
||||
```json
|
||||
{
|
||||
"type": "error",
|
||||
"command": "flush-transcript",
|
||||
"error": "not_yet_implemented",
|
||||
"kind": "not_yet_implemented"
|
||||
}
|
||||
```
|
||||
|
||||
Exit code: 1. No credentials required. Like `delete-session`, this stub resolves the #251 dispatch-order bug but the actual flush operation is not yet wired.
|
||||
|
||||
**Aspirational (future) shape**:
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "flush-transcript",
|
||||
"exit_code": 0,
|
||||
"session_id": "sess_abc123",
|
||||
"path": ".claw/sessions/sess_abc123.jsonl",
|
||||
"flushed": true,
|
||||
"messages_count": 12,
|
||||
"input_tokens": 4500,
|
||||
"output_tokens": 1200
|
||||
}
|
||||
```
|
||||
|
||||
### `show-command`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "show-command",
|
||||
"exit_code": 0,
|
||||
"name": "add-dir",
|
||||
"found": true,
|
||||
"source_hint": "commands/add-dir/add-dir.tsx",
|
||||
"responsibility": "creates a new directory in the worktree"
|
||||
}
|
||||
```
|
||||
|
||||
### `show-tool`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "show-tool",
|
||||
"exit_code": 0,
|
||||
"name": "BashTool",
|
||||
"found": true,
|
||||
"source_hint": "tools/BashTool/BashTool.tsx"
|
||||
}
|
||||
```
|
||||
|
||||
### `exec-command`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "exec-command",
|
||||
"exit_code": 0,
|
||||
"name": "add-dir",
|
||||
"prompt": "create src/util/",
|
||||
"handled": true,
|
||||
"message": "created directory",
|
||||
"source_hint": "commands/add-dir/add-dir.tsx"
|
||||
}
|
||||
```
|
||||
|
||||
### `exec-tool`
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2026-04-22T10:10:00Z",
|
||||
"command": "exec-tool",
|
||||
"exit_code": 0,
|
||||
"name": "BashTool",
|
||||
"payload": "cargo build",
|
||||
"handled": true,
|
||||
"message": "exit code 0",
|
||||
"source_hint": "tools/BashTool/BashTool.tsx"
|
||||
}
|
||||
```
|
||||
|
||||
### `route`

```json
{
  "timestamp": "2026-04-22T10:10:00Z",
  "command": "route",
  "exit_code": 0,
  "prompt": "add a test",
  "limit": 10,
  "match_count": 3,
  "matches": [
    {
      "kind": "command",
      "name": "add-file",
      "score": 0.92,
      "source_hint": "commands/add-file/add-file.tsx"
    }
  ]
}
```
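A routing client usually keys off `match_count` and the per-match `score`. A minimal sketch (the 0.5 threshold is an illustrative choice, not part of the contract):

```python
def pick_route(envelope: dict, threshold: float = 0.5) -> dict | None:
    """Return the best-scoring match above `threshold`, else None."""
    if envelope.get("exit_code", 1) != 0 or not envelope.get("matches"):
        return None
    best = max(envelope["matches"], key=lambda m: m["score"])
    return best if best["score"] >= threshold else None
```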
### `bootstrap`

```json
{
  "timestamp": "2026-04-22T10:10:00Z",
  "command": "bootstrap",
  "exit_code": 0,
  "prompt": "hello",
  "setup": {
    "python_version": "3.13.12",
    "implementation": "CPython",
    "platform_name": "darwin",
    "test_command": "pytest"
  },
  "routed_matches": [
    {"kind": "command", "name": "init", "score": 0.85, "source_hint": "..."}
  ],
  "turn": {
    "prompt": "hello",
    "output": "...",
    "stop_reason": "completed"
  },
  "persisted_session_path": ".claw/sessions/sess_abc.jsonl"
}
```
### `command-graph`

```json
{
  "timestamp": "2026-04-22T10:10:00Z",
  "command": "command-graph",
  "exit_code": 0,
  "builtins_count": 185,
  "plugin_like_count": 20,
  "skill_like_count": 2,
  "total_count": 207,
  "builtins": [
    {"name": "add-dir", "source_hint": "commands/add-dir/add-dir.tsx"}
  ],
  "plugin_like": [],
  "skill_like": []
}
```
### `tool-pool`

```json
{
  "timestamp": "2026-04-22T10:10:00Z",
  "command": "tool-pool",
  "exit_code": 0,
  "simple_mode": false,
  "include_mcp": true,
  "tool_count": 184,
  "tools": [
    {"name": "BashTool", "source_hint": "tools/BashTool/BashTool.tsx"}
  ]
}
```
### `bootstrap-graph`

```json
{
  "timestamp": "2026-04-22T10:10:00Z",
  "command": "bootstrap-graph",
  "exit_code": 0,
  "stages": ["stage 1", "stage 2", "..."],
  "note": "bootstrap-graph is markdown-only in this version"
}
```

---
## Versioning & Compatibility

- **schema_version = "1.0":** Current as of 2026-04-22. Covers all 13 clawable commands.
- **Breaking changes** (e.g. renaming a field) bump schema_version to "2.0".
- **Additive changes** (e.g. a new optional field) stay at "1.0" and are backward compatible.
- Downstream claws **must** check `schema_version` before relying on field presence.
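In practice that check is one guard at the edge of the consumer. A minimal sketch (the accepted-version set is the consumer's policy, not something the schema prescribes):

```python
def require_schema_version(envelope: dict,
                           accepted: frozenset[str] = frozenset({"1.0"})) -> None:
    """Fail fast on envelopes this consumer was not written against."""
    version = envelope.get("schema_version")
    if version not in accepted:
        raise RuntimeError(
            f"unsupported schema_version {version!r}; "
            f"this consumer understands {sorted(accepted)}"
        )
```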
---
## Regression Testing

Each command is covered by:

1. **Fixture file** (golden JSON snapshot under `tests/fixtures/json/<command>.json`)
2. **Parametrised test** in `test_cli_parity_audit.py::TestJsonOutputContractEndToEnd`
3. **Field consistency test** (new, tracked as ROADMAP #172)

To update a fixture after an intentional schema change:

```bash
claw <command> --output-format json <args> > tests/fixtures/json/<command>.json
# Review the diff, then commit
git add tests/fixtures/json/<command>.json
```
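The parametrised test reduces to a golden-file comparison. Roughly this shape (a sketch, not the actual test body; the command names and arguments are illustrative, and volatile fields such as `timestamp` must be masked before comparing):

```python
import json
import subprocess
from pathlib import Path

import pytest

VOLATILE = {"timestamp"}  # fields that legitimately differ run to run

@pytest.mark.parametrize(
    ("command", "args"),
    [("show-command", ["add-dir"]), ("show-tool", ["BashTool"])],
)
def test_envelope_matches_fixture(command: str, args: list[str]) -> None:
    fixture = json.loads(Path(f"tests/fixtures/json/{command}.json").read_text())
    proc = subprocess.run(
        ["claw", command, *args, "--output-format", "json"],
        capture_output=True, text=True, check=True,
    )
    actual = json.loads(proc.stdout)
    for field in VOLATILE:
        fixture.pop(field, None)
        actual.pop(field, None)
    assert actual == fixture
```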

To verify no regressions:

```bash
cargo test --release test_json_envelope_field_consistency
```

---
## Design Notes

**Why common fields on every response?**

- Downstream claws can build one error handler that works for all commands
- Timestamp + command + exit_code give context without scraping argv or timestamps from command output
- `schema_version` signals compatibility for future upgrades

**Why both "found" and "error" on not-found?**

- Exit code 1 covers both "entity missing" and "operation failed"
- `found=false` distinguishes not-found from error without string matching
- `error.kind` and `error.retryable` let automation decide: retry a temporary miss vs escalate a permanent refusal

**Why "operation" and "target" in error?**

- Claws can aggregate failures by operation type (e.g. "how many `write` ops failed?")
- Claws can implement per-target retry policy (e.g. "skip missing files, retry networking")
- Pure text errors ("No such file") do not provide enough structure for pattern matching

**Why "handled" vs "found"?**

- `show-command` reports `found: bool` (inventory signal: "does this exist?")
- `exec-command` reports `handled: bool` (operational signal: "was this work performed?")
- The names matter: a command can be found but not handled (e.g. too large for the context window), or handled silently (no output message)
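Taken together, those fields are what make a single dispatcher possible. A minimal sketch of the idea (the retry/backoff policy is illustrative, not prescribed by the schema; note that errors arrive either as a nested `error` object or as flat `kind`/`error` fields, as shown throughout this document):

```python
import json
import subprocess
import time

def run_claw(args: list[str], retries: int = 2) -> dict:
    """One handler for every clawable command, classified by error kind."""
    envelope: dict = {}
    for attempt in range(retries + 1):
        proc = subprocess.run(
            ["claw", "--output-format", "json", *args],
            capture_output=True, text=True,
        )
        envelope = json.loads(proc.stdout or proc.stderr)
        if proc.returncode == 0:
            return envelope
        error = envelope.get("error")
        error = error if isinstance(error, dict) else {"kind": envelope.get("kind")}
        if error.get("retryable") and attempt < retries:
            time.sleep(2 ** attempt)  # illustrative backoff
            continue
        break
    return envelope  # permanent failure: surface to the caller
```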
**USAGE.md** — 9 changed lines
````diff
@@ -2,6 +2,9 @@
 
 This guide covers the current Rust workspace under `rust/` and the `claw` CLI binary. If you are brand new, make the doctor health check your first run: start `claw`, then run `/doctor`.
 
+> [!TIP]
+> **Building orchestration code that calls `claw` as a subprocess?** See [`ERROR_HANDLING.md`](./ERROR_HANDLING.md) for the unified error-handling pattern (one handler for all 14 clawable commands, exit codes, JSON envelope contract, and recovery strategies).
+
 ## Quick-start health check
 
 Run this before prompts, sessions, or automation:
@@ -95,11 +98,17 @@ cd rust
 
 ### JSON output for scripting
 
 All clawable commands support `--output-format json` for machine-readable output. Every invocation returns a consistent JSON envelope with `exit_code`, `command`, `timestamp`, and either `{success fields}` or `{error: {kind, message, ...}}`.
 
 ```bash
 cd rust
 ./target/debug/claw --output-format json prompt "status"
 ./target/debug/claw --output-format json load-session my-session-id
 ./target/debug/claw --output-format json turn-loop "analyze logs" --max-turns 1
 ```
 
 **Building a dispatcher or orchestration script?** See [`ERROR_HANDLING.md`](./ERROR_HANDLING.md) for the unified error-handling pattern. One code example works for all 14 clawable commands: parse the exit code, classify by `error.kind`, apply recovery strategies (retry, timeout recovery, validation, logging). Use that pattern instead of reimplementing error handling per command.
 
 ### Inspect worker state
 
 The `claw state` command reads `.claw/worker-state.json`, which is written by the interactive REPL or a one-shot prompt when a worker executes a task. This file contains the worker ID, session reference, model, and permission mode.
````
```diff
@@ -213,7 +213,16 @@ fn main() {
     // #77: classify error by prefix so downstream claws can route without
     // regex-scraping the prose. Split short-reason from hint-runbook.
     let kind = classify_error_kind(&message);
-    let (short_reason, hint) = split_error_hint(&message);
+    let (short_reason, mut hint) = split_error_hint(&message);
+    // #247: JSON envelope was losing the `Run claw --help for usage.`
+    // trailer that text-mode stderr includes. When the error is a
+    // cli_parse and the message itself carried no embedded hint,
+    // synthesize the trailer so typed-error consumers get the same
+    // actionable pointer that text-mode users see. Cross-channel
+    // consistency is a §4.44 typed-envelope contract requirement.
+    if hint.is_none() && kind == "cli_parse" && !short_reason.contains("`claw --help`") {
+        hint = Some("Run `claw --help` for usage.".to_string());
+    }
     eprintln!(
         "{}",
         serde_json::json!({
```
```diff
@@ -264,6 +273,12 @@ fn classify_error_kind(message: &str) -> &'static str {
         "no_managed_sessions"
     } else if message.contains("unrecognized argument") || message.contains("unknown option") {
         "cli_parse"
+    } else if message.contains("prompt subcommand requires") {
+        // #247: `claw prompt` with no argument — a parse error, not `unknown`.
+        "cli_parse"
+    } else if message.starts_with("empty prompt:") {
+        // #247: `claw ""` or `claw " "` — a parse error, not `unknown`.
+        "cli_parse"
     } else if message.contains("invalid model syntax") {
         "invalid_model_syntax"
     } else if message.contains("is not yet implemented") {
```
```diff
@@ -402,7 +417,10 @@ fn run() -> Result<(), Box<dyn std::error::Error>> {
             cli.set_reasoning_effort(reasoning_effort);
             cli.run_turn_with_output(&effective_prompt, output_format, compact)?;
         }
-        CliAction::Doctor { output_format } => run_doctor(output_format)?,
+        CliAction::Doctor { output_format } => {
+            run_stale_base_preflight(None);
+            run_doctor(output_format)?
+        }
         CliAction::Acp { output_format } => print_acp_status(output_format)?,
         CliAction::State { output_format } => run_worker_state(output_format)?,
         CliAction::Init { output_format } => run_init(output_format)?,
```
```diff
@@ -10434,6 +10452,32 @@ mod tests {
         assert_eq!(classify_error_kind("something completely unknown"), "unknown");
     }
 
+    #[test]
+    fn classify_error_kind_covers_prompt_parse_errors_247() {
+        // #247: prompt-related parse errors must classify as `cli_parse`,
+        // not fall through to `unknown`. Regression guard for ROADMAP #247
+        // (typed-error contract drift found in cycle #33 dogfood).
+        assert_eq!(
+            classify_error_kind("prompt subcommand requires a prompt string"),
+            "cli_parse",
+            "bare `claw prompt` must surface as cli_parse so typed-error consumers can dispatch"
+        );
+        assert_eq!(
+            classify_error_kind(
+                "empty prompt: provide a subcommand (run `claw --help`) or a non-empty prompt string"
+            ),
+            "cli_parse",
+            "`claw \"\"` must surface as cli_parse, not unknown"
+        );
+        // Sanity: the new patterns must be specific enough not to hijack
+        // genuinely unknown errors that happen to contain the word `prompt`.
+        assert_eq!(
+            classify_error_kind("some random prompt-adjacent failure we don't recognize"),
+            "unknown",
+            "generic prompt-containing text should still fall through to unknown"
+        );
+    }
+
     #[test]
     fn split_error_hint_separates_reason_from_runbook() {
         // #77: short reason / hint separation for JSON error payloads
```
```diff
@@ -388,6 +388,114 @@ fn assert_json_command(current_dir: &Path, args: &[&str]) -> Value {
     assert_json_command_with_env(current_dir, args, &[])
 }
 
+/// #247 regression helper: run claw expecting a non-zero exit and return
+/// the JSON error envelope parsed from stderr. Asserts exit != 0 and that
+/// the envelope includes `type: "error"` at the very least.
+fn assert_json_error_envelope(current_dir: &Path, args: &[&str]) -> Value {
+    let output = run_claw(current_dir, args, &[]);
+    assert!(
+        !output.status.success(),
+        "command unexpectedly succeeded; stdout:\n{}\nstderr:\n{}",
+        String::from_utf8_lossy(&output.stdout),
+        String::from_utf8_lossy(&output.stderr)
+    );
+    // The JSON envelope is written to stderr for error cases (see main.rs).
+    let envelope: Value = serde_json::from_slice(&output.stderr).unwrap_or_else(|err| {
+        panic!(
+            "stderr should be a JSON error envelope but failed to parse: {err}\nstderr bytes:\n{}",
+            String::from_utf8_lossy(&output.stderr)
+        )
+    });
+    assert_eq!(
+        envelope["type"], "error",
+        "envelope should carry type=error"
+    );
+    envelope
+}
+
+#[test]
+fn prompt_subcommand_without_arg_emits_cli_parse_envelope_with_hint_247() {
+    // #247: `claw prompt` with no argument must classify as `cli_parse`
+    // (not `unknown`) and the JSON envelope must carry the same actionable
+    // `Run claw --help for usage.` hint that text-mode stderr appends.
+    let root = unique_temp_dir("247-prompt-no-arg");
+    fs::create_dir_all(&root).expect("temp dir should exist");
+
+    let envelope = assert_json_error_envelope(&root, &["--output-format", "json", "prompt"]);
+    assert_eq!(
+        envelope["kind"], "cli_parse",
+        "prompt subcommand without arg should classify as cli_parse, envelope: {envelope}"
+    );
+    assert_eq!(
+        envelope["error"], "prompt subcommand requires a prompt string",
+        "short reason should match the raw error, envelope: {envelope}"
+    );
+    assert_eq!(
+        envelope["hint"],
+        "Run `claw --help` for usage.",
+        "JSON envelope must carry the same help-runbook hint as text mode, envelope: {envelope}"
+    );
+}
+
+#[test]
+fn empty_positional_arg_emits_cli_parse_envelope_247() {
+    // #247: `claw ""` must classify as `cli_parse`, not `unknown`. The
+    // message itself embeds a "run `claw --help`" pointer so the explicit
+    // hint field is allowed to remain null to avoid duplication — what
+    // matters for the typed-error contract is that `kind == cli_parse`.
+    let root = unique_temp_dir("247-empty-arg");
+    fs::create_dir_all(&root).expect("temp dir should exist");
+
+    let envelope = assert_json_error_envelope(&root, &["--output-format", "json", ""]);
+    assert_eq!(
+        envelope["kind"], "cli_parse",
+        "empty-prompt error should classify as cli_parse, envelope: {envelope}"
+    );
+    let short = envelope["error"]
+        .as_str()
+        .expect("error field should be a string");
+    assert!(
+        short.starts_with("empty prompt:"),
+        "short reason should preserve the original empty-prompt message, got: {short}"
+    );
+}
+
+#[test]
+fn whitespace_only_positional_arg_emits_cli_parse_envelope_247() {
+    // #247: same rule for `claw " "` — any whitespace-only prompt must
+    // flow through the empty-prompt path and classify as `cli_parse`.
+    let root = unique_temp_dir("247-whitespace-arg");
+    fs::create_dir_all(&root).expect("temp dir should exist");
+
+    let envelope = assert_json_error_envelope(&root, &["--output-format", "json", " "]);
+    assert_eq!(
+        envelope["kind"], "cli_parse",
+        "whitespace-only prompt should classify as cli_parse, envelope: {envelope}"
+    );
+}
+
+#[test]
+fn unrecognized_argument_still_classifies_as_cli_parse_247_regression_guard() {
+    // #247 regression guard: the new empty-prompt / prompt-subcommand
+    // patterns must NOT hijack the existing #77 unrecognized-argument
+    // classification. `claw doctor --foo` must still surface as cli_parse
+    // with the runbook hint present.
+    let root = unique_temp_dir("247-unrecognized-arg");
+    fs::create_dir_all(&root).expect("temp dir should exist");
+
+    let envelope =
+        assert_json_error_envelope(&root, &["--output-format", "json", "doctor", "--foo"]);
+    assert_eq!(
+        envelope["kind"], "cli_parse",
+        "unrecognized-argument must remain cli_parse, envelope: {envelope}"
+    );
+    assert_eq!(
+        envelope["hint"],
+        "Run `claw --help` for usage.",
+        "unrecognized-argument hint should stay intact, envelope: {envelope}"
+    );
+}
+
 fn assert_json_command_with_env(current_dir: &Path, args: &[&str], envs: &[(&str, &str)]) -> Value {
     let output = run_claw(current_dir, args, envs);
     assert!(
```
**src/__init__.py**

```diff
@@ -5,7 +5,16 @@ from .parity_audit import ParityAuditResult, run_parity_audit
 from .port_manifest import PortManifest, build_port_manifest
 from .query_engine import QueryEnginePort, TurnResult
 from .runtime import PortRuntime, RuntimeSession
-from .session_store import StoredSession, load_session, save_session
+from .session_store import (
+    SessionDeleteError,
+    SessionNotFoundError,
+    StoredSession,
+    delete_session,
+    list_sessions,
+    load_session,
+    save_session,
+    session_exists,
+)
 from .system_init import build_system_init_message
 from .tools import PORTED_TOOLS, build_tool_backlog
 
@@ -15,6 +24,8 @@ __all__ = [
     'PortRuntime',
     'QueryEnginePort',
     'RuntimeSession',
+    'SessionDeleteError',
+    'SessionNotFoundError',
     'StoredSession',
     'TurnResult',
     'PORTED_COMMANDS',
@@ -23,7 +34,10 @@ __all__ = [
     'build_port_manifest',
     'build_system_init_message',
     'build_tool_backlog',
+    'delete_session',
+    'list_sessions',
     'load_session',
     'run_parity_audit',
     'save_session',
+    'session_exists',
 ]
```
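With those exports public, the whole session lifecycle is scriptable from the package root. A minimal sketch (the `session_exists(session_id, directory)` signature is an assumption inferred from its sibling functions):

```python
from pathlib import Path

from src import delete_session, list_sessions, load_session, session_exists

store = Path(".port_sessions")  # the default storage directory

for sid in list_sessions(store):
    if session_exists(sid, store):
        session = load_session(sid, store)
        print(sid, len(session.messages), "messages")
    delete_session(sid, store)  # idempotent: a missing session is not an error
```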
**src/main.py** — 593 changed lines
```diff
@@ -12,22 +12,48 @@ from .port_manifest import build_port_manifest
 from .query_engine import QueryEnginePort
 from .remote_runtime import run_remote_mode, run_ssh_mode, run_teleport_mode
 from .runtime import PortRuntime
-from .session_store import load_session
+from .session_store import (
+    SessionDeleteError,
+    SessionNotFoundError,
+    delete_session,
+    list_sessions,
+    load_session,
+    session_exists,
+)
 from .setup import run_setup
 from .tool_pool import assemble_tool_pool
 from .tools import execute_tool, get_tool, get_tools, render_tool_index
 
 
+def wrap_json_envelope(data: dict, command: str, exit_code: int = 0) -> dict:
+    """Wrap command output in canonical JSON envelope per SCHEMAS.md."""
+    from datetime import datetime, timezone
+    now_utc = datetime.now(timezone.utc).isoformat(timespec='seconds').replace('+00:00', 'Z')
+    return {
+        'timestamp': now_utc,
+        'command': command,
+        'exit_code': exit_code,
+        'output_format': 'json',
+        'schema_version': '1.0',
+        **data,
+    }
+
+
 def build_parser() -> argparse.ArgumentParser:
     parser = argparse.ArgumentParser(description='Python porting workspace for the Claude Code rewrite effort')
     # #180: Add --version flag to match canonical CLI contract
     parser.add_argument('--version', action='version', version='claw-code 1.0.0 (Python harness)')
     subparsers = parser.add_subparsers(dest='command', required=True)
     subparsers.add_parser('summary', help='render a Markdown summary of the Python porting workspace')
     subparsers.add_parser('manifest', help='print the current Python workspace manifest')
     subparsers.add_parser('parity-audit', help='compare the Python workspace against the local ignored TypeScript archive when available')
     subparsers.add_parser('setup-report', help='render the startup/prefetch setup report')
-    subparsers.add_parser('command-graph', help='show command graph segmentation')
-    subparsers.add_parser('tool-pool', help='show assembled tool pool with default settings')
-    subparsers.add_parser('bootstrap-graph', help='show the mirrored bootstrap/runtime graph stages')
+    command_graph_parser = subparsers.add_parser('command-graph', help='show command graph segmentation')
+    command_graph_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
+    tool_pool_parser = subparsers.add_parser('tool-pool', help='show assembled tool pool with default settings')
+    tool_pool_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
+    bootstrap_graph_parser = subparsers.add_parser('bootstrap-graph', help='show the mirrored bootstrap/runtime graph stages')
+    bootstrap_graph_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
     list_parser = subparsers.add_parser('subsystems', help='list the current Python modules in the workspace')
     list_parser.add_argument('--limit', type=int, default=32)
```
```diff
@@ -48,22 +74,104 @@ def build_parser() -> argparse.ArgumentParser:
     route_parser = subparsers.add_parser('route', help='route a prompt across mirrored command/tool inventories')
     route_parser.add_argument('prompt')
     route_parser.add_argument('--limit', type=int, default=5)
+    # #168: parity with show-command/show-tool/session-lifecycle CLI family
+    route_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
 
     bootstrap_parser = subparsers.add_parser('bootstrap', help='build a runtime-style session report from the mirrored inventories')
     bootstrap_parser.add_argument('prompt')
     bootstrap_parser.add_argument('--limit', type=int, default=5)
+    # #168: parity with CLI family
+    bootstrap_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
 
     loop_parser = subparsers.add_parser('turn-loop', help='run a small stateful turn loop for the mirrored runtime')
     loop_parser.add_argument('prompt')
     loop_parser.add_argument('--limit', type=int, default=5)
     loop_parser.add_argument('--max-turns', type=int, default=3)
     loop_parser.add_argument('--structured-output', action='store_true')
+    loop_parser.add_argument(
+        '--timeout-seconds',
+        type=float,
+        default=None,
+        help='total wall-clock budget across all turns (#161). Default: unbounded.',
+    )
+    loop_parser.add_argument(
+        '--continuation-prompt',
+        default=None,
+        help=(
+            'prompt to submit on turns after the first (#163). Default: None '
+            '(loop stops after turn 0). Replaces the deprecated implicit "[turn N]" '
+            'suffix that used to pollute the transcript.'
+        ),
+    )
+    loop_parser.add_argument(
+        '--output-format',
+        choices=['text', 'json'],
+        default='text',
+        help='output format (#164 Stage B: JSON includes cancel_observed per turn)',
+    )
 
-    flush_parser = subparsers.add_parser('flush-transcript', help='persist and flush a temporary session transcript')
+    flush_parser = subparsers.add_parser(
+        'flush-transcript',
+        help='persist and flush a temporary session transcript (#160/#166: claw-native session API)',
+    )
     flush_parser.add_argument('prompt')
+    flush_parser.add_argument(
+        '--directory', help='session storage directory (default: .port_sessions)'
+    )
+    flush_parser.add_argument(
+        '--output-format',
+        choices=['text', 'json'],
+        default='text',
+        help='output format',
+    )
+    flush_parser.add_argument(
+        '--session-id',
+        help='deterministic session ID (default: auto-generated UUID)',
+    )
 
-    load_session_parser = subparsers.add_parser('load-session', help='load a previously persisted session')
+    load_session_parser = subparsers.add_parser(
+        'load-session',
+        help='load a previously persisted session (#160/#165: claw-native session API)',
+    )
     load_session_parser.add_argument('session_id')
+    load_session_parser.add_argument(
+        '--directory', help='session storage directory (default: .port_sessions)'
+    )
+    load_session_parser.add_argument(
+        '--output-format',
+        choices=['text', 'json'],
+        default='text',
+        help='output format',
+    )
+
+    list_sessions_parser = subparsers.add_parser(
+        'list-sessions',
+        help='enumerate stored session IDs (#160: claw-native session API)',
+    )
+    list_sessions_parser.add_argument(
+        '--directory', help='session storage directory (default: .port_sessions)'
+    )
+    list_sessions_parser.add_argument(
+        '--output-format',
+        choices=['text', 'json'],
+        default='text',
+        help='output format',
+    )
+
+    delete_session_parser = subparsers.add_parser(
+        'delete-session',
+        help='delete a persisted session (#160: idempotent, race-safe)',
+    )
+    delete_session_parser.add_argument('session_id')
+    delete_session_parser.add_argument(
+        '--directory', help='session storage directory (default: .port_sessions)'
+    )
+    delete_session_parser.add_argument(
+        '--output-format',
+        choices=['text', 'json'],
+        default='text',
+        help='output format',
+    )
 
     remote_parser = subparsers.add_parser('remote-mode', help='simulate remote-control runtime branching')
     remote_parser.add_argument('target')
```
```diff
@@ -78,22 +186,112 @@ def build_parser() -> argparse.ArgumentParser:
 
     show_command = subparsers.add_parser('show-command', help='show one mirrored command entry by exact name')
     show_command.add_argument('name')
     show_command.add_argument('--output-format', choices=['text', 'json'], default='text')
     show_tool = subparsers.add_parser('show-tool', help='show one mirrored tool entry by exact name')
     show_tool.add_argument('name')
     show_tool.add_argument('--output-format', choices=['text', 'json'], default='text')
 
     exec_command_parser = subparsers.add_parser('exec-command', help='execute a mirrored command shim by exact name')
     exec_command_parser.add_argument('name')
     exec_command_parser.add_argument('prompt')
+    # #168: parity with CLI family
+    exec_command_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
 
     exec_tool_parser = subparsers.add_parser('exec-tool', help='execute a mirrored tool shim by exact name')
     exec_tool_parser.add_argument('name')
     exec_tool_parser.add_argument('payload')
+    # #168: parity with CLI family
+    exec_tool_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
     return parser
 
 
+class _ArgparseError(Exception):
+    """#179: internal exception capturing argparse's real error message.
+
+    Subclassed ArgumentParser raises this instead of printing + exiting,
+    so JSON mode can preserve the actual error (e.g. 'the following arguments
+    are required: session_id') in the envelope.
+    """
+    def __init__(self, message: str) -> None:
+        super().__init__(message)
+        self.message = message
+
+
+def _emit_parse_error_envelope(argv: list[str], message: str) -> None:
+    """#178/#179: emit JSON envelope for argparse-level errors when --output-format json is requested.
+
+    Pre-scans argv for --output-format json. If found, prints a parse-error envelope
+    to stdout (per SCHEMAS.md 'error' envelope shape) instead of letting argparse
+    dump help text to stderr. This preserves the JSON contract for claws that can't
+    parse argparse usage messages.
+
+    #179 update: `message` now carries argparse's actual error text, not a generic
+    rejection string. Stderr is fully suppressed in JSON mode.
+    """
+    import json
+    # Extract the attempted command (argv[0] is the first positional)
+    attempted = argv[0] if argv and not argv[0].startswith('-') else '<missing>'
+    envelope = wrap_json_envelope(
+        {
+            'error': {
+                'kind': 'parse',
+                'operation': 'argparse',
+                'target': attempted,
+                'retryable': False,
+                'message': message,
+                'hint': 'run with no arguments to see available subcommands',
+            },
+        },
+        command=attempted,
+        exit_code=1,
+    )
+    print(json.dumps(envelope))
+
+
+def _wants_json_output(argv: list[str]) -> bool:
+    """#178: check if argv contains --output-format json anywhere (for parse-error routing)."""
+    for i, arg in enumerate(argv):
+        if arg == '--output-format' and i + 1 < len(argv) and argv[i + 1] == 'json':
+            return True
+        if arg == '--output-format=json':
+            return True
+    return False
+
+
 def main(argv: list[str] | None = None) -> int:
     import sys
     if argv is None:
         argv = sys.argv[1:]
     parser = build_parser()
-    args = parser.parse_args(argv)
+    json_mode = _wants_json_output(argv)
+    # #178/#179: capture argparse errors with real message and emit JSON envelope
+    # when --output-format json is requested. In JSON mode, stderr is silenced
+    # so claws only see the envelope on stdout.
+    if json_mode:
+        # Monkey-patch parser.error to raise instead of print+exit. This preserves
+        # the original error message text (e.g. 'argument X: invalid choice: ...').
+        original_error = parser.error
+        def _json_mode_error(message: str) -> None:
+            raise _ArgparseError(message)
+        parser.error = _json_mode_error  # type: ignore[method-assign]
+        # Also patch all subparsers
+        for action in parser._actions:
+            if hasattr(action, 'choices') and isinstance(action.choices, dict):
+                for subp in action.choices.values():
+                    subp.error = _json_mode_error  # type: ignore[method-assign]
+        try:
+            args = parser.parse_args(argv)
+        except _ArgparseError as err:
+            _emit_parse_error_envelope(argv, err.message)
+            return 1
+        except SystemExit as exc:
+            # Defensive: if argparse exits via some other path (e.g. --help in JSON mode)
+            if exc.code != 0:
+                _emit_parse_error_envelope(argv, 'argparse exited with non-zero code')
+                return 1
+            raise
+    else:
+        args = parser.parse_args(argv)
     manifest = build_port_manifest()
     if args.command == 'summary':
         print(QueryEnginePort(manifest).render_summary())
```
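The net effect of #178/#179 is that a claw never has to parse argparse usage text. A minimal sketch against the harness CLI (the exact message string comes from argparse and may vary by Python version):

```python
import json
import subprocess

proc = subprocess.run(
    ["python3", "-m", "src.main", "load-session", "--output-format", "json"],
    capture_output=True, text=True,
)
envelope = json.loads(proc.stdout)   # envelope on stdout; stderr is silenced
assert proc.returncode == 1
assert envelope["error"]["kind"] == "parse"
print(envelope["error"]["message"])  # e.g. "the following arguments are required: session_id"
```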
```diff
@@ -108,13 +306,44 @@ def main(argv: list[str] | None = None) -> int:
         print(run_setup().as_markdown())
         return 0
     if args.command == 'command-graph':
-        print(build_command_graph().as_markdown())
+        graph = build_command_graph()
+        if args.output_format == 'json':
+            import json
+            envelope = {
+                'builtins_count': len(graph.builtins),
+                'plugin_like_count': len(graph.plugin_like),
+                'skill_like_count': len(graph.skill_like),
+                'total_count': len(graph.flattened()),
+                'builtins': [{'name': m.name, 'source_hint': m.source_hint} for m in graph.builtins],
+                'plugin_like': [{'name': m.name, 'source_hint': m.source_hint} for m in graph.plugin_like],
+                'skill_like': [{'name': m.name, 'source_hint': m.source_hint} for m in graph.skill_like],
+            }
+            print(json.dumps(wrap_json_envelope(envelope, args.command)))
+        else:
+            print(graph.as_markdown())
         return 0
     if args.command == 'tool-pool':
-        print(assemble_tool_pool().as_markdown())
+        pool = assemble_tool_pool()
+        if args.output_format == 'json':
+            import json
+            envelope = {
+                'simple_mode': pool.simple_mode,
+                'include_mcp': pool.include_mcp,
+                'tool_count': len(pool.tools),
+                'tools': [{'name': t.name, 'source_hint': t.source_hint} for t in pool.tools],
+            }
+            print(json.dumps(wrap_json_envelope(envelope, args.command)))
+        else:
+            print(pool.as_markdown())
         return 0
     if args.command == 'bootstrap-graph':
-        print(build_bootstrap_graph().as_markdown())
+        graph = build_bootstrap_graph()
+        if args.output_format == 'json':
+            import json
+            envelope = {'stages': graph.as_markdown().split('\n'), 'note': 'bootstrap-graph is markdown-only in this version'}
+            print(json.dumps(wrap_json_envelope(envelope, args.command)))
+        else:
+            print(graph.as_markdown())
         return 0
     if args.command == 'subsystems':
         for subsystem in manifest.top_level_modules[: args.limit]:
```
```diff
@@ -141,6 +370,25 @@ def main(argv: list[str] | None = None) -> int:
         return 0
     if args.command == 'route':
         matches = PortRuntime().route_prompt(args.prompt, limit=args.limit)
+        # #168: JSON envelope for machine parsing
+        if args.output_format == 'json':
+            import json
+            envelope = {
+                'prompt': args.prompt,
+                'limit': args.limit,
+                'match_count': len(matches),
+                'matches': [
+                    {
+                        'kind': m.kind,
+                        'name': m.name,
+                        'score': m.score,
+                        'source_hint': m.source_hint,
+                    }
+                    for m in matches
+                ],
+            }
+            print(json.dumps(wrap_json_envelope(envelope, args.command)))
+            return 0
         if not matches:
             print('No mirrored command/tool matches found.')
             return 0
```
```diff
@@ -148,25 +396,220 @@ def main(argv: list[str] | None = None) -> int:
         print(f'{match.kind}\t{match.name}\t{match.score}\t{match.source_hint}')
         return 0
     if args.command == 'bootstrap':
-        print(PortRuntime().bootstrap_session(args.prompt, limit=args.limit).as_markdown())
+        session = PortRuntime().bootstrap_session(args.prompt, limit=args.limit)
+        # #168: JSON envelope for machine parsing
+        if args.output_format == 'json':
+            import json
+            envelope = {
+                'prompt': session.prompt,
+                'limit': args.limit,
+                'setup': {
+                    'python_version': session.setup.python_version,
+                    'implementation': session.setup.implementation,
+                    'platform_name': session.setup.platform_name,
+                    'test_command': session.setup.test_command,
+                },
+                'routed_matches': [
+                    {
+                        'kind': m.kind,
+                        'name': m.name,
+                        'score': m.score,
+                        'source_hint': m.source_hint,
+                    }
+                    for m in session.routed_matches
+                ],
+                'command_execution_messages': list(session.command_execution_messages),
+                'tool_execution_messages': list(session.tool_execution_messages),
+                'turn': {
+                    'prompt': session.turn_result.prompt,
+                    'output': session.turn_result.output,
+                    'stop_reason': session.turn_result.stop_reason,
+                    'cancel_observed': session.turn_result.cancel_observed,
+                },
+                'persisted_session_path': session.persisted_session_path,
+            }
+            print(json.dumps(wrap_json_envelope(envelope, args.command)))
+            return 0
+        print(session.as_markdown())
         return 0
     if args.command == 'turn-loop':
-        results = PortRuntime().run_turn_loop(args.prompt, limit=args.limit, max_turns=args.max_turns, structured_output=args.structured_output)
+        results = PortRuntime().run_turn_loop(
+            args.prompt,
+            limit=args.limit,
+            max_turns=args.max_turns,
+            structured_output=args.structured_output,
+            timeout_seconds=args.timeout_seconds,
+            continuation_prompt=args.continuation_prompt,
+        )
+        # Exit 2 when a timeout terminated the loop so claws can distinguish
+        # 'ran to completion' from 'hit wall-clock budget'.
+        loop_exit_code = 2 if results and results[-1].stop_reason == 'timeout' else 0
+        if args.output_format == 'json':
+            # #164 Stage B + #173: JSON envelope with per-turn cancel_observed.
+            # Promotes turn-loop from OPT_OUT to CLAWABLE surface.
+            import json
+            envelope = {
+                'prompt': args.prompt,
+                'max_turns': args.max_turns,
+                'turns_completed': len(results),
+                'timeout_seconds': args.timeout_seconds,
+                'continuation_prompt': args.continuation_prompt,
+                'turns': [
+                    {
+                        'prompt': r.prompt,
+                        'output': r.output,
+                        'stop_reason': r.stop_reason,
+                        'cancel_observed': r.cancel_observed,
+                        'matched_commands': list(r.matched_commands),
+                        'matched_tools': list(r.matched_tools),
+                    }
+                    for r in results
+                ],
+                'final_stop_reason': results[-1].stop_reason if results else None,
+                'final_cancel_observed': results[-1].cancel_observed if results else False,
+            }
+            print(json.dumps(wrap_json_envelope(envelope, args.command, exit_code=loop_exit_code)))
+            return loop_exit_code
         for idx, result in enumerate(results, start=1):
             print(f'## Turn {idx}')
             print(result.output)
             print(f'stop_reason={result.stop_reason}')
-        return 0
+        return loop_exit_code
```
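The exit-code-2 contract gives claws a cheap timeout probe without inspecting every turn. A minimal sketch against the harness CLI (flag names as declared in the parser above):

```python
import json
import subprocess

def run_turn_loop(prompt: str, budget_seconds: float) -> tuple[dict, bool]:
    """Run the harness turn loop; return (envelope, timed_out)."""
    proc = subprocess.run(
        ["python3", "-m", "src.main", "turn-loop", prompt,
         "--max-turns", "3", "--timeout-seconds", str(budget_seconds),
         "--output-format", "json"],
        capture_output=True, text=True,
    )
    envelope = json.loads(proc.stdout)
    return envelope, proc.returncode == 2  # 2 == wall-clock budget hit
```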
```diff
     if args.command == 'flush-transcript':
+        from pathlib import Path as _Path
         engine = QueryEnginePort.from_workspace()
+        # #166: allow deterministic session IDs for claw checkpointing/replay.
+        # When unset, the engine's auto-generated UUID is used (backward compat).
+        if args.session_id:
+            engine.session_id = args.session_id
         engine.submit_message(args.prompt)
-        path = engine.persist_session()
-        print(path)
-        print(f'flushed={engine.transcript_store.flushed}')
+        directory = _Path(args.directory) if args.directory else None
+        path = engine.persist_session(directory)
+        if args.output_format == 'json':
+            import json as _json
+            _env = {
+                'session_id': engine.session_id,
+                'path': path,
+                'flushed': engine.transcript_store.flushed,
+                'messages_count': len(engine.mutable_messages),
+                'input_tokens': engine.total_usage.input_tokens,
+                'output_tokens': engine.total_usage.output_tokens,
+            }
+            print(_json.dumps(wrap_json_envelope(_env, args.command)))
+        else:
+            # #166: legacy text output preserved byte-for-byte for backward compat.
+            print(path)
+            print(f'flushed={engine.transcript_store.flushed}')
         return 0
     if args.command == 'load-session':
-        session = load_session(args.session_id)
-        print(f'{session.session_id}\n{len(session.messages)} messages\nin={session.input_tokens} out={session.output_tokens}')
+        from pathlib import Path as _Path
+        directory = _Path(args.directory) if args.directory else None
+        # #165: catch typed SessionNotFoundError + surface a JSON error envelope
+        # matching the delete-session contract shape. No more raw tracebacks.
+        try:
+            session = load_session(args.session_id, directory)
+        except SessionNotFoundError as exc:
+            if args.output_format == 'json':
+                import json as _json
+                resolved_dir = str(directory) if directory else '.port_sessions'
+                _env = {
+                    'session_id': args.session_id,
+                    'loaded': False,
+                    'error': {
+                        'kind': 'session_not_found',
+                        'message': str(exc),
+                        'directory': resolved_dir,
+                        'retryable': False,
+                    },
+                }
+                print(_json.dumps(wrap_json_envelope(_env, args.command, exit_code=1)))
+            else:
+                print(f'error: {exc}')
+            return 1
+        except (OSError, ValueError) as exc:
+            # Corrupted session file, IO error, JSON decode error — distinct
+            # from 'not found'. Callers may retry here (fs glitch).
+            if args.output_format == 'json':
+                import json as _json
+                resolved_dir = str(directory) if directory else '.port_sessions'
+                _env = {
+                    'session_id': args.session_id,
+                    'loaded': False,
+                    'error': {
+                        'kind': 'session_load_failed',
+                        'message': str(exc),
+                        'directory': resolved_dir,
+                        'retryable': True,
+                    },
+                }
+                print(_json.dumps(wrap_json_envelope(_env, args.command, exit_code=1)))
+            else:
+                print(f'error: {exc}')
+            return 1
+        if args.output_format == 'json':
+            import json as _json
+            _env = {
+                'session_id': session.session_id,
+                'loaded': True,
+                'messages_count': len(session.messages),
+                'input_tokens': session.input_tokens,
+                'output_tokens': session.output_tokens,
+            }
+            print(_json.dumps(wrap_json_envelope(_env, args.command)))
+        else:
+            print(f'{session.session_id}\n{len(session.messages)} messages\nin={session.input_tokens} out={session.output_tokens}')
         return 0
+    if args.command == 'list-sessions':
+        from pathlib import Path as _Path
+        directory = _Path(args.directory) if args.directory else None
+        ids = list_sessions(directory)
+        if args.output_format == 'json':
+            import json as _json
+            _env = {'sessions': ids, 'count': len(ids)}
+            print(_json.dumps(wrap_json_envelope(_env, args.command)))
+        else:
+            if not ids:
+                print('(no sessions)')
+            else:
+                for sid in ids:
+                    print(sid)
+        return 0
```
```diff
+    if args.command == 'delete-session':
+        from pathlib import Path as _Path
+        directory = _Path(args.directory) if args.directory else None
+        try:
+            deleted = delete_session(args.session_id, directory)
+        except SessionDeleteError as exc:
+            if args.output_format == 'json':
+                import json as _json
+                _env = {
+                    'session_id': args.session_id,
+                    'deleted': False,
+                    'error': {
+                        'kind': 'session_delete_failed',
+                        'message': str(exc),
+                        'retryable': True,
+                    },
+                }
+                print(_json.dumps(wrap_json_envelope(_env, args.command, exit_code=1)))
+            else:
+                print(f'error: {exc}')
+            return 1
+        if args.output_format == 'json':
+            import json as _json
+            _env = {
+                'session_id': args.session_id,
+                'deleted': deleted,
+                'status': 'deleted' if deleted else 'not_found',
+            }
+            print(_json.dumps(wrap_json_envelope(_env, args.command)))
+        else:
+            if deleted:
+                print(f'deleted: {args.session_id}')
+            else:
+                print(f'not found: {args.session_id}')
+        # Exit 0 for both cases — delete_session is idempotent,
+        # not-found is success from a cleanup perspective
+        return 0
     if args.command == 'remote-mode':
         print(run_remote_mode(args.target).as_text())
```
```diff
@@ -186,25 +629,123 @@ def main(argv: list[str] | None = None) -> int:
     if args.command == 'show-command':
         module = get_command(args.name)
         if module is None:
-            print(f'Command not found: {args.name}')
+            if args.output_format == 'json':
+                import json
+                error_envelope = {
+                    'name': args.name,
+                    'found': False,
+                    'error': {
+                        'kind': 'command_not_found',
+                        'message': f'Unknown command: {args.name}',
+                        'retryable': False,
+                    },
+                }
+                print(json.dumps(wrap_json_envelope(error_envelope, args.command, exit_code=1)))
+            else:
+                print(f'Command not found: {args.name}')
             return 1
-        print('\n'.join([module.name, module.source_hint, module.responsibility]))
+        if args.output_format == 'json':
+            import json
+            output = {
+                'name': module.name,
+                'found': True,
+                'source_hint': module.source_hint,
+                'responsibility': module.responsibility,
+            }
+            print(json.dumps(wrap_json_envelope(output, args.command)))
+        else:
+            print('\n'.join([module.name, module.source_hint, module.responsibility]))
         return 0
     if args.command == 'show-tool':
         module = get_tool(args.name)
         if module is None:
-            print(f'Tool not found: {args.name}')
+            if args.output_format == 'json':
+                import json
+                error_envelope = {
+                    'name': args.name,
+                    'found': False,
+                    'error': {
+                        'kind': 'tool_not_found',
+                        'message': f'Unknown tool: {args.name}',
+                        'retryable': False,
+                    },
+                }
+                print(json.dumps(wrap_json_envelope(error_envelope, args.command, exit_code=1)))
+            else:
+                print(f'Tool not found: {args.name}')
             return 1
-        print('\n'.join([module.name, module.source_hint, module.responsibility]))
+        if args.output_format == 'json':
+            import json
+            output = {
+                'name': module.name,
+                'found': True,
+                'source_hint': module.source_hint,
+                'responsibility': module.responsibility,
+            }
+            print(json.dumps(wrap_json_envelope(output, args.command)))
+        else:
+            print('\n'.join([module.name, module.source_hint, module.responsibility]))
         return 0
```
```diff
     if args.command == 'exec-command':
         result = execute_command(args.name, args.prompt)
-        print(result.message)
-        return 0 if result.handled else 1
+        # #168: JSON envelope with typed not-found error
+        # #181: envelope exit_code must match process exit code
+        exit_code = 0 if result.handled else 1
+        if args.output_format == 'json':
+            import json
+            if not result.handled:
+                envelope = {
+                    'name': args.name,
+                    'prompt': args.prompt,
+                    'handled': False,
+                    'error': {
+                        'kind': 'command_not_found',
+                        'message': result.message,
+                        'retryable': False,
+                    },
+                }
+            else:
+                envelope = {
+                    'name': result.name,
+                    'prompt': result.prompt,
+                    'source_hint': result.source_hint,
+                    'handled': True,
+                    'message': result.message,
+                }
+            print(json.dumps(wrap_json_envelope(envelope, args.command, exit_code=exit_code)))
+        else:
+            print(result.message)
+        return exit_code
     if args.command == 'exec-tool':
         result = execute_tool(args.name, args.payload)
-        print(result.message)
-        return 0 if result.handled else 1
+        # #168: JSON envelope with typed not-found error
+        # #181: envelope exit_code must match process exit code
+        exit_code = 0 if result.handled else 1
+        if args.output_format == 'json':
+            import json
+            if not result.handled:
+                envelope = {
+                    'name': args.name,
+                    'payload': args.payload,
+                    'handled': False,
+                    'error': {
+                        'kind': 'tool_not_found',
+                        'message': result.message,
+                        'retryable': False,
+                    },
+                }
+            else:
+                envelope = {
+                    'name': result.name,
+                    'payload': result.payload,
+                    'source_hint': result.source_hint,
+                    'handled': True,
+                    'message': result.message,
+                }
+            print(json.dumps(wrap_json_envelope(envelope, args.command, exit_code=exit_code)))
+        else:
+            print(result.message)
+        return exit_code
     parser.error(f'unknown command: {args.command}')
     return 2
```
**src/query_engine.py**

```diff
@@ -1,6 +1,7 @@
 from __future__ import annotations
 
 import json
+import threading
 from dataclasses import dataclass, field
 from uuid import uuid4
 
@@ -30,6 +31,7 @@ class TurnResult:
     permission_denials: tuple[PermissionDenial, ...]
     usage: UsageSummary
     stop_reason: str
+    cancel_observed: bool = False
 
 
 @dataclass
```
```diff
@@ -64,7 +66,59 @@ class QueryEnginePort:
         matched_commands: tuple[str, ...] = (),
         matched_tools: tuple[str, ...] = (),
         denied_tools: tuple[PermissionDenial, ...] = (),
+        cancel_event: threading.Event | None = None,
     ) -> TurnResult:
+        """Submit a prompt and return a TurnResult.
+
+        #164 Stage A: cooperative cancellation via cancel_event.
+
+        The cancel_event argument (added for #164) lets a caller request early
+        termination at a safe point. When set before the pre-mutation commit
+        stage, submit_message returns early with ``stop_reason='cancelled'``
+        and the engine's state (mutable_messages, transcript_store,
+        permission_denials, total_usage) is left **exactly as it was on
+        entry**. This closes the #161 follow-up gap: before this change, a
+        wedged provider thread could finish executing and silently mutate
+        state after the caller had already observed ``stop_reason='timeout'``,
+        giving the session a ghost turn the caller never acknowledged.
+
+        Contract:
+        - cancel_event is None (default) — legacy behaviour, no checks.
+        - cancel_event set **before** budget check — returns 'cancelled'
+          immediately; no output synthesis, no projection, no mutation.
+        - cancel_event set **between** budget check and commit — returns
+          'cancelled' with state intact.
+        - cancel_event set **after** commit — not observable; the turn is
+          already committed and the caller sees 'completed'. Cancellation
+          is a *safe point* mechanism, not preemption. This is the honest
+          limit of cooperative cancellation in Python threading land.
+
+        Stop reason taxonomy after #164 Stage A:
+        - 'completed' — turn committed, state mutated exactly once
+        - 'max_budget_reached' — overflow, state unchanged (#162)
+        - 'max_turns_reached' — capacity exceeded, state unchanged
+        - 'cancelled' — cancel_event observed, state unchanged
+        - 'timeout' — synthesised by runtime, not engine (#161)
+
+        Callers that care about deadline-driven cancellation (run_turn_loop)
+        can now request cleanup by setting the event on timeout — the next
+        submit_message on the same engine will observe it at the start and
+        return 'cancelled' without touching state, even if the previous call
+        is still wedged in provider IO.
+        """
+        # #164 Stage A: earliest safe cancellation point. No output synthesis,
+        # no budget projection, no mutation — just an immediate clean return.
+        if cancel_event is not None and cancel_event.is_set():
+            return TurnResult(
+                prompt=prompt,
+                output='',
+                matched_commands=matched_commands,
+                matched_tools=matched_tools,
+                permission_denials=denied_tools,
+                usage=self.total_usage,  # unchanged
+                stop_reason='cancelled',
+            )
+
         if len(self.mutable_messages) >= self.config.max_turns:
             output = f'Max turns reached before processing prompt: {prompt}'
             return TurnResult(
```
```diff
@@ -85,9 +139,40 @@ class QueryEnginePort:
         ]
         output = self._format_output(summary_lines)
         projected_usage = self.total_usage.add_turn(prompt, output)
-        stop_reason = 'completed'
 
+        # #162: budget check must precede mutation. Previously this block set
+        # stop_reason='max_budget_reached' but still appended the overflow turn
+        # to mutable_messages / transcript_store / permission_denials, corrupting
+        # the session for any caller that persisted it afterwards. The overflow
+        # prompt was effectively committed even though the TurnResult signalled
+        # rejection. Now we early-return with pre-mutation state intact so
+        # callers can safely retry with a smaller prompt or a fresh budget.
         if projected_usage.input_tokens + projected_usage.output_tokens > self.config.max_budget_tokens:
-            stop_reason = 'max_budget_reached'
+            return TurnResult(
+                prompt=prompt,
+                output=output,
+                matched_commands=matched_commands,
+                matched_tools=matched_tools,
+                permission_denials=denied_tools,
+                usage=self.total_usage,  # unchanged — overflow turn was rejected
+                stop_reason='max_budget_reached',
+            )
+
+        # #164 Stage A: second safe cancellation point. Projection is done
+        # but nothing has been committed yet. If the caller cancelled while
+        # we were building output / computing budget, honour it here — still
+        # no mutation.
+        if cancel_event is not None and cancel_event.is_set():
+            return TurnResult(
+                prompt=prompt,
+                output=output,
+                matched_commands=matched_commands,
+                matched_tools=matched_tools,
+                permission_denials=denied_tools,
+                usage=self.total_usage,  # unchanged
+                stop_reason='cancelled',
+            )
+
         self.mutable_messages.append(prompt)
         self.transcript_store.append(prompt)
         self.permission_denials.extend(denied_tools)
@@ -100,7 +185,7 @@ class QueryEnginePort:
             matched_tools=matched_tools,
             permission_denials=denied_tools,
             usage=self.total_usage,
-            stop_reason=stop_reason,
+            stop_reason='completed',
         )
 
     def stream_submit_message(
```
```diff
@@ -137,7 +222,19 @@ class QueryEnginePort:
     def flush_transcript(self) -> None:
         self.transcript_store.flush()
 
-    def persist_session(self) -> str:
+    def persist_session(self, directory: 'Path | None' = None) -> str:
+        """Flush the transcript and save the session to disk.
+
+        Args:
+            directory: Optional override for the storage directory. When None
+                (default, for backward compat), uses the default location
+                (``.port_sessions`` in CWD). When set, passes through to
+                ``save_session`` which already supports directory overrides.
+
+        #166: added directory parameter to match the session-lifecycle CLI
+        surface established by #160/#165. Claws running out-of-tree can now
+        redirect session creation to a workspace-specific dir without chdir.
+        """
         self.flush_transcript()
         path = save_session(
             StoredSession(
@@ -145,7 +242,8 @@ class QueryEnginePort:
                 messages=tuple(self.mutable_messages),
                 input_tokens=self.total_usage.input_tokens,
                 output_tokens=self.total_usage.output_tokens,
-            )
+            ),
+            directory,
         )
         return str(path)
```
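The safe-point contract documented in that docstring is easiest to see in use. A minimal sketch (constructor and call signature as shown in the diff above; the empty-state assertion assumes a fresh engine):

```python
import threading

from src.query_engine import QueryEnginePort

engine = QueryEnginePort.from_workspace()
cancel = threading.Event()

# Cancel requested before the call: observed at the entry checkpoint.
cancel.set()
result = engine.submit_message("hello", (), (), (), cancel_event=cancel)
assert result.stop_reason == 'cancelled'
assert not engine.mutable_messages  # state untouched: no ghost turn

# Without the event, the legacy path runs and the turn commits.
result = engine.submit_message("hello", (), (), ())
assert result.stop_reason == 'completed'
```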
**src/runtime.py** — 159 changed lines
```diff
@@ -1,11 +1,14 @@
 from __future__ import annotations
 
+import threading
+import time
+from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeoutError
 from dataclasses import dataclass
 
 from .commands import PORTED_COMMANDS
 from .context import PortContext, build_port_context, render_context
 from .history import HistoryLog
-from .models import PermissionDenial, PortingModule
+from .models import PermissionDenial, PortingModule, UsageSummary
 from .query_engine import QueryEngineConfig, QueryEnginePort, TurnResult
 from .setup import SetupReport, WorkspaceSetup, run_setup
 from .system_init import build_system_init_message
```
```diff
@@ -151,21 +154,161 @@ class PortRuntime:
             persisted_session_path=persisted_session_path,
         )
 
-    def run_turn_loop(self, prompt: str, limit: int = 5, max_turns: int = 3, structured_output: bool = False) -> list[TurnResult]:
+    def run_turn_loop(
+        self,
+        prompt: str,
+        limit: int = 5,
+        max_turns: int = 3,
+        structured_output: bool = False,
+        timeout_seconds: float | None = None,
+        continuation_prompt: str | None = None,
+    ) -> list[TurnResult]:
+        """Run a multi-turn engine loop with optional wall-clock deadline.
+
+        Args:
+            prompt: The initial prompt to submit.
+            limit: Match routing limit.
+            max_turns: Maximum number of turns before stopping.
+            structured_output: Whether to request structured output.
+            timeout_seconds: Total wall-clock budget across all turns. When the
+                budget is exhausted mid-turn, a synthetic TurnResult with
+                ``stop_reason='timeout'`` is appended and the loop exits.
+                ``None`` (default) preserves legacy unbounded behaviour.
+            continuation_prompt: What to send on turns after the first. When
+                ``None`` (default, #163), the loop stops after turn 0 and the
+                caller decides how to continue. When set, the same text is
+                submitted for every turn after the first, giving claws a clean
+                hook for structured follow-ups (e.g. ``"Continue."``, a
+                routing-planner instruction, or a tool-output cue). Previously
+                the loop silently appended ``" [turn N]"`` to the original
+                prompt, polluting the transcript with harness-generated
+                annotation the model had no way to interpret.
+
+        Returns:
+            A list of TurnResult objects. The final entry's ``stop_reason``
+            distinguishes ``'completed'``, ``'max_turns_reached'``,
+            ``'max_budget_reached'``, or ``'timeout'``.
+
+        #161: prior to this change a hung ``engine.submit_message`` call would
+        block the loop indefinitely with no cancellation path, forcing claws to
+        rely on external watchdogs or OS-level kills. Callers can now enforce a
+        deadline and receive a typed timeout signal instead.
+
+        #163: the old ``f'{prompt} [turn {turn + 1}]'`` suffix was never
+        interpreted by the engine or any system prompt. It looked like a real
+        user turn in ``mutable_messages`` and the transcript, making replay and
+        analysis fragile. Removed entirely; callers supply ``continuation_prompt``
+        for meaningful follow-ups or let the loop stop after turn 0.
+        """
         engine = QueryEnginePort.from_workspace()
         engine.config = QueryEngineConfig(max_turns=max_turns, structured_output=structured_output)
         matches = self.route_prompt(prompt, limit=limit)
         command_names = tuple(match.name for match in matches if match.kind == 'command')
         tool_names = tuple(match.name for match in matches if match.kind == 'tool')
+        # #159: infer permission denials from the routed matches, not hardcoded empty tuple.
+        # Multi-turn sessions must have the same security posture as bootstrap_session.
+        denied_tools = tuple(self._infer_permission_denials(matches))
         results: list[TurnResult] = []
-        for turn in range(max_turns):
-            turn_prompt = prompt if turn == 0 else f'{prompt} [turn {turn + 1}]'
-            result = engine.submit_message(turn_prompt, command_names, tool_names, ())
-            results.append(result)
-            if result.stop_reason != 'completed':
-                break
```
deadline = time.monotonic() + timeout_seconds if timeout_seconds is not None else None
|
||||
# #164 Stage A: shared cancel_event signals cooperative cancellation
|
||||
# across turns. On timeout we set() it so any still-running
|
||||
# submit_message call (or the next one on the same engine) observes
|
||||
# the cancel at a safe checkpoint and returns stop_reason='cancelled'
|
||||
# without mutating state. This closes the window where a wedged
|
||||
# provider thread could commit a ghost turn after the caller saw
|
||||
# 'timeout'.
|
||||
cancel_event = threading.Event() if deadline is not None else None
|
||||
|
||||
# ThreadPoolExecutor is reused across turns so we cancel cleanly on exit.
|
||||
executor = ThreadPoolExecutor(max_workers=1) if deadline is not None else None
|
||||
try:
|
||||
for turn in range(max_turns):
|
||||
# #163: no more f'{prompt} [turn N]' suffix injection.
|
||||
# On turn 0 submit the original prompt.
|
||||
# On turn > 0, submit the caller-supplied continuation prompt;
|
||||
# if the caller did not supply one, stop the loop cleanly instead
|
||||
# of fabricating a fake user turn.
|
||||
if turn == 0:
|
||||
turn_prompt = prompt
|
||||
elif continuation_prompt is not None:
|
||||
turn_prompt = continuation_prompt
|
||||
else:
|
||||
break
|
||||
|
||||
if deadline is None:
|
||||
# Legacy path: unbounded call, preserves existing behaviour exactly.
|
||||
# #159: pass inferred denied_tools (no longer hardcoded empty tuple)
|
||||
# #164: cancel_event is None on this path; submit_message skips
|
||||
# cancellation checks entirely (legacy zero-overhead behaviour).
|
||||
result = engine.submit_message(turn_prompt, command_names, tool_names, denied_tools)
|
||||
else:
|
||||
remaining = deadline - time.monotonic()
|
||||
if remaining <= 0:
|
||||
# #164: signal cancel for any in-flight/future submit_message
|
||||
# calls that share this engine. Safe because nothing has been
|
||||
# submitted yet this turn.
|
||||
assert cancel_event is not None
|
||||
cancel_event.set()
|
||||
results.append(self._build_timeout_result(
|
||||
turn_prompt, command_names, tool_names,
|
||||
cancel_observed=cancel_event.is_set()
|
||||
))
|
||||
break
|
||||
assert executor is not None
|
||||
future = executor.submit(
|
||||
engine.submit_message, turn_prompt, command_names, tool_names,
|
||||
denied_tools, cancel_event,
|
||||
)
|
||||
try:
|
||||
result = future.result(timeout=remaining)
|
||||
except FuturesTimeoutError:
|
||||
# #164 Stage A: explicitly signal cancel to the still-running
|
||||
# submit_message thread. The next time it hits a checkpoint
|
||||
# (entry or post-budget), it returns 'cancelled' without
|
||||
# mutating state instead of committing a ghost turn. This
|
||||
# upgrades #161's best-effort future.cancel() (which only
|
||||
# cancels pre-start futures) to cooperative mid-flight cancel.
|
||||
assert cancel_event is not None
|
||||
cancel_event.set()
|
||||
future.cancel()
|
||||
results.append(self._build_timeout_result(
|
||||
turn_prompt, command_names, tool_names,
|
||||
cancel_observed=cancel_event.is_set()
|
||||
))
|
||||
break
|
||||
|
||||
results.append(result)
|
||||
if result.stop_reason != 'completed':
|
||||
break
|
||||
finally:
|
||||
if executor is not None:
|
||||
# wait=False: don't let a hung thread block loop exit indefinitely.
|
||||
# The thread will be reaped when the interpreter shuts down or when
|
||||
# the engine call eventually returns.
|
||||
executor.shutdown(wait=False)
|
||||
return results
|
||||
|
||||
@staticmethod
|
||||
def _build_timeout_result(
|
||||
prompt: str,
|
||||
command_names: tuple[str, ...],
|
||||
tool_names: tuple[str, ...],
|
||||
cancel_observed: bool = False,
|
||||
) -> TurnResult:
|
||||
"""Synthesize a TurnResult representing a wall-clock timeout (#161).
|
||||
#164 Stage B: cancel_observed signals cancellation event was set.
|
||||
"""
|
||||
return TurnResult(
|
||||
prompt=prompt,
|
||||
output='Wall-clock timeout exceeded before turn completed.',
|
||||
matched_commands=command_names,
|
||||
matched_tools=tool_names,
|
||||
permission_denials=(),
|
||||
usage=UsageSummary(),
|
||||
stop_reason='timeout',
|
||||
cancel_observed=cancel_observed,
|
||||
)
|
||||
|
||||
def _infer_permission_denials(self, matches: list[RoutedMatch]) -> list[PermissionDenial]:
|
||||
denials: list[PermissionDenial] = []
|
||||
for match in matches:
|
||||
|
||||
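Taken together, the new parameters give callers a typed loop contract. As a quick orientation, here is a minimal claw-side sketch (assuming `PortRuntime` is importable from `src.runtime`, as the tests later in this diff do) of driving the loop with a wall-clock budget and dispatching on the final `stop_reason`:

```python
# Minimal sketch; uses only the parameters and fields shown in the diff above.
from src.runtime import PortRuntime

runtime = PortRuntime()
results = runtime.run_turn_loop(
    'summarize the open roadmap items',
    max_turns=3,
    timeout_seconds=30.0,             # total budget across all turns (#161)
    continuation_prompt='Continue.',  # sent on every turn after the first (#163)
)

final = results[-1]
if final.stop_reason == 'timeout':
    # Typed timeout signal; cancel_observed reports whether the cooperative
    # cancel event was set before the loop exited (#164).
    print('budget exhausted, cancel_observed =', final.cancel_observed)
elif final.stop_reason == 'completed':
    print(final.output)
```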
@@ -26,10 +26,96 @@ def save_session(session: StoredSession, directory: Path | None = None) -> Path:

def load_session(session_id: str, directory: Path | None = None) -> StoredSession:
    target_dir = directory or DEFAULT_SESSION_DIR
    data = json.loads((target_dir / f'{session_id}.json').read_text())
    try:
        data = json.loads((target_dir / f'{session_id}.json').read_text())
    except FileNotFoundError:
        raise SessionNotFoundError(f'session {session_id!r} not found in {target_dir}') from None
    return StoredSession(
        session_id=data['session_id'],
        messages=tuple(data['messages']),
        input_tokens=data['input_tokens'],
        output_tokens=data['output_tokens'],
    )


class SessionNotFoundError(KeyError):
    """Raised when a session does not exist in the store."""
    pass

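With the typed exception in place, callers can branch on absence instead of pattern-matching a raw `FileNotFoundError`. A minimal sketch, assuming these names are importable from `src.session_store` (the module path itself is not shown in this hunk):

```python
# Sketch only; assumes the session-store module path is src.session_store.
from src.session_store import SessionNotFoundError, load_session, session_exists

def load_or_none(session_id: str):
    """Return the stored session, or None if it does not exist."""
    try:
        return load_session(session_id)
    except SessionNotFoundError:
        return None

# session_exists() is the non-raising probe for the same question,
# so these two views of the store always agree:
assert (load_or_none('missing-id') is None) == (not session_exists('missing-id'))
```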
def list_sessions(directory: Path | None = None) -> list[str]:
    """List all stored session IDs in the target directory.

    Args:
        directory: Target session directory. Defaults to DEFAULT_SESSION_DIR.

    Returns:
        Sorted list of session IDs (JSON filenames without .json extension).
    """
    target_dir = directory or DEFAULT_SESSION_DIR
    if not target_dir.exists():
        return []
    return sorted(p.stem for p in target_dir.glob('*.json'))


def session_exists(session_id: str, directory: Path | None = None) -> bool:
    """Check if a session exists without raising an error.

    Args:
        session_id: The session ID to check.
        directory: Target session directory. Defaults to DEFAULT_SESSION_DIR.

    Returns:
        True if the session file exists, False otherwise.
    """
    target_dir = directory or DEFAULT_SESSION_DIR
    return (target_dir / f'{session_id}.json').exists()


class SessionDeleteError(OSError):
    """Raised when a session file exists but cannot be removed (permission, IO error).

    Distinct from SessionNotFoundError: this means the session was present but
    deletion failed mid-operation. Callers can retry or escalate.
    """
    pass


def delete_session(session_id: str, directory: Path | None = None) -> bool:
    """Delete a session file from the store.

    Contract:
    - **Idempotent**: `delete_session(x)` followed by `delete_session(x)` is safe.
      The second call returns False (not found), does not raise.
    - **Race-safe**: attempts the unlink directly and treats FileNotFoundError
      as "already gone", avoiding the TOCTOU window between an exists-check
      and the unlink. Concurrent deletion by another process is treated as a
      no-op success (returns False for the losing caller).
    - **Partial-failure surfaced**: If the file exists but cannot be removed
      (permission denied, filesystem error, directory instead of file), raises
      `SessionDeleteError` wrapping the underlying OSError. The session store
      may be in an inconsistent state; the caller should retry or escalate.

    Args:
        session_id: The session ID to delete.
        directory: Target session directory. Defaults to DEFAULT_SESSION_DIR.

    Returns:
        True if this call deleted the session file.
        False if the session did not exist (either never existed or was already deleted).

    Raises:
        SessionDeleteError: if the session existed but deletion failed.
    """
    target_dir = directory or DEFAULT_SESSION_DIR
    path = target_dir / f'{session_id}.json'
    try:
        # EAFP: unlink with missing_ok=False and catch FileNotFoundError below,
        # so the return value reflects whether THIS call removed the file
        # (missing_ok=True would hide who won a concurrent-delete race).
        path.unlink(missing_ok=False)
        return True
    except FileNotFoundError:
        # Either never existed or was concurrently deleted — both are no-ops
        return False
    except OSError as exc:
        # PermissionError, IsADirectoryError, and friends are all OSError.
        raise SessionDeleteError(
            f'session {session_id!r} exists in {target_dir} but could not be deleted: {exc}'
        ) from exc

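The contract is easiest to see end to end. A short sketch of the idempotent and partial-failure paths, again assuming the `src.session_store` module path:

```python
# Sketch; assumes src.session_store exports these names.
from src.session_store import SessionDeleteError, delete_session

first = delete_session('abc123')   # True only if this call removed the file
second = delete_session('abc123')  # False: already gone, and no exception
assert second is False

try:
    delete_session('locked-session')
except SessionDeleteError:
    # The file was present but could not be removed (permissions, IO error);
    # the store may be inconsistent, so retry or escalate.
    ...
```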
199  tests/test_cancel_observed_field.py  Normal file
@@ -0,0 +1,199 @@
"""#164 Stage B — cancel_observed field coverage.

Validates that the TurnResult.cancel_observed field correctly signals
whether cancellation was observed during turn execution.

Test coverage:
1. Normal completion: cancel_observed=False (no timeout occurred)
2. Timeout with cancel signaled: cancel_observed=True
3. bootstrap JSON output exposes the field
4. turn-loop JSON output exposes cancel_observed per turn
5. Safe-to-reuse: after timeout with cancel_observed=True,
   engine can accept fresh messages without state corruption
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

import pytest

from src.query_engine import QueryEnginePort, TurnResult
from src.runtime import PortRuntime


CLI = [sys.executable, '-m', 'src.main']
REPO_ROOT = Path(__file__).resolve().parent.parent


class TestCancelObservedField:
    """TurnResult.cancel_observed correctly signals cancellation observation."""

    def test_default_value_is_false(self) -> None:
        """New TurnResult defaults to cancel_observed=False (backward compat)."""
        from src.models import UsageSummary
        result = TurnResult(
            prompt='test',
            output='ok',
            matched_commands=(),
            matched_tools=(),
            permission_denials=(),
            usage=UsageSummary(),
            stop_reason='completed',
        )
        assert result.cancel_observed is False

    def test_explicit_true_preserved(self) -> None:
        """cancel_observed=True is preserved through construction."""
        from src.models import UsageSummary
        result = TurnResult(
            prompt='test',
            output='timed out',
            matched_commands=(),
            matched_tools=(),
            permission_denials=(),
            usage=UsageSummary(),
            stop_reason='timeout',
            cancel_observed=True,
        )
        assert result.cancel_observed is True

    def test_normal_completion_cancel_observed_false(self) -> None:
        """Normal turn completion → cancel_observed=False."""
        runtime = PortRuntime()
        results = runtime.run_turn_loop('hello', max_turns=1)
        assert len(results) >= 1
        assert results[0].cancel_observed is False

    def test_bootstrap_json_includes_cancel_observed(self) -> None:
        """bootstrap JSON envelope includes cancel_observed in turn result."""
        result = subprocess.run(
            CLI + ['bootstrap', 'hello', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0
        envelope = json.loads(result.stdout)
        assert 'turn' in envelope
        assert 'cancel_observed' in envelope['turn'], (
            f"bootstrap turn must include cancel_observed (SCHEMAS.md contract). "
            f"Got keys: {list(envelope['turn'].keys())}"
        )
        # Normal completion → False
        assert envelope['turn']['cancel_observed'] is False

    def test_turn_loop_json_per_turn_cancel_observed(self) -> None:
        """turn-loop JSON envelope includes cancel_observed per turn (#164 Stage B closure)."""
        result = subprocess.run(
            CLI + ['turn-loop', 'hello', '--max-turns', '1', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0, f"stderr: {result.stderr}"
        envelope = json.loads(result.stdout)
        # Common fields from wrap_json_envelope
        assert envelope['command'] == 'turn-loop'
        assert envelope['schema_version'] == '1.0'
        # Turn-loop-specific fields
        assert 'turns' in envelope
        assert len(envelope['turns']) >= 1
        for idx, turn in enumerate(envelope['turns']):
            assert 'cancel_observed' in turn, (
                f"Turn {idx} missing cancel_observed: {list(turn.keys())}"
            )
        # final_cancel_observed convenience field
        assert 'final_cancel_observed' in envelope
        assert isinstance(envelope['final_cancel_observed'], bool)


class TestCancelObservedSafeReuseSemantics:
    """After timeout with cancel_observed=True, engine state is safe to reuse."""

    def test_timeout_result_cancel_observed_true_when_signaled(self) -> None:
        """#164 Stage B: timeout path passes cancel_event.is_set() to result."""
        # Force a timeout with max_turns=3 and timeout=0.0001 (instant)
        runtime = PortRuntime()
        results = runtime.run_turn_loop(
            'hello', max_turns=3, timeout_seconds=0.0001,
            continuation_prompt='keep going',
        )
        # Last result should be timeout (pre-start path since timeout is instant)
        assert results, 'timeout path should still produce a result'
        last = results[-1]
        assert last.stop_reason == 'timeout'
        # cancel_observed=True because the timeout path explicitly sets cancel_event
        assert last.cancel_observed is True, (
            f"timeout path must signal cancel_observed=True; got {last.cancel_observed}. "
            f"stop_reason={last.stop_reason}"
        )

    def test_engine_messages_not_corrupted_by_timeout(self) -> None:
        """After timeout with cancel_observed, engine.mutable_messages is consistent.

        #164 Stage B contract: safe-to-reuse means after a timeout-with-cancel,
        the engine has not committed a ghost turn and can accept fresh input.
        """
        engine = QueryEnginePort.from_workspace()
        # Track initial state
        initial_message_count = len(engine.mutable_messages)

        # Simulate a direct submit_message call with cancellation
        import threading
        cancel_event = threading.Event()
        cancel_event.set()  # Pre-set: first checkpoint fires
        result = engine.submit_message(
            'test', ('cmd1',), ('tool1',),
            denied_tools=(), cancel_event=cancel_event,
        )

        # Cancelled turn should not commit mutation
        assert result.stop_reason == 'cancelled', (
            f"expected cancelled; got {result.stop_reason}"
        )
        # mutable_messages should not have grown
        assert len(engine.mutable_messages) == initial_message_count, (
            f"engine.mutable_messages grew after cancelled turn "
            f"(was {initial_message_count}, now {len(engine.mutable_messages)})"
        )

        # Engine should accept a fresh message now
        fresh = engine.submit_message('fresh prompt', ('cmd1',), ('tool1',))
        assert fresh.stop_reason in ('completed', 'max_budget_reached'), (
            f"expected engine reusable; got {fresh.stop_reason}"
        )


class TestCancelObservedSchemaCompliance:
    """SCHEMAS.md contract for the cancel_observed field."""

    def test_cancel_observed_is_bool_not_nullable(self) -> None:
        """cancel_observed is always bool (never null/missing) per SCHEMAS.md."""
        result = subprocess.run(
            CLI + ['bootstrap', 'test', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        cancel_observed = envelope['turn']['cancel_observed']
        assert isinstance(cancel_observed, bool), (
            f"cancel_observed must be bool; got {type(cancel_observed)}"
        )

    def test_turn_loop_envelope_has_final_cancel_observed(self) -> None:
        """turn-loop JSON exposes final_cancel_observed convenience field."""
        result = subprocess.run(
            CLI + ['turn-loop', 'test', '--max-turns', '1', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0
        envelope = json.loads(result.stdout)
        assert 'final_cancel_observed' in envelope
        assert isinstance(envelope['final_cancel_observed'], bool)
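These per-field assertions exist so that a consumer never has to special-case commands. A hedged claw-side sketch of the single error handler the envelope contract enables, using only fields the tests in this diff assert on (`exit_code`, `error.kind`, `error.message`, `error.retryable`); the helper itself is illustrative, not part of the repo:

```python
# Claw-side sketch: one handler for every clawable command.
import json
import subprocess
import sys

class ClawError(RuntimeError):
    """Typed wrapper over the envelope's error block."""
    def __init__(self, kind: str, message: str, retryable: bool) -> None:
        super().__init__(f'{kind}: {message}')
        self.kind = kind
        self.retryable = retryable

def run_claw(*args: str) -> dict:
    """Run any clawable command, return its JSON envelope, raise on error."""
    proc = subprocess.run(
        [sys.executable, '-m', 'src.main', *args, '--output-format', 'json'],
        capture_output=True, text=True,
    )
    envelope = json.loads(proc.stdout)
    err = envelope.get('error')
    if err is not None:
        raise ClawError(err['kind'], err['message'], err.get('retryable', False))
    return envelope
```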
333  tests/test_cli_parity_audit.py  Normal file
@@ -0,0 +1,333 @@
"""Cross-surface CLI parity audit (ROADMAP #171).

Prevents future drift of the unified JSON envelope contract across
claw-code's CLI surface. Instead of requiring humans to notice when
a new command skips --output-format, this test introspects the parser
at runtime and verifies every command in the declared clawable-surface
list supports --output-format {text,json}.

When a new clawable-surface command is added:
1. Implement --output-format on the subparser (normal feature work).
2. Add the command name to CLAWABLE_SURFACES below.
3. This test passes automatically.

When a developer adds a new clawable-surface command but forgets
--output-format, the test fails with a concrete message pointing at
the missing flag. Claws no longer need to eyeball parity; the contract
is enforced at test time.

Three classes of commands:
- CLAWABLE_SURFACES: MUST accept --output-format (inspect/lifecycle/exec/diagnostic)
- OPT_OUT_SURFACES: explicitly exempt (simulation/mode commands, human-first diagnostic)
- Any command in the parser not listed in either: the test FAILS with a classification request

This is operationalised parity — a machine-first CLI enforced by a
machine-first test.
"""

from __future__ import annotations

import subprocess
import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.main import build_parser  # noqa: E402


# Commands that MUST accept --output-format {text,json}.
# These are the machine-first surfaces — session lifecycle, execution,
# inspect, diagnostic inventory.
CLAWABLE_SURFACES = frozenset({
    # Session lifecycle (#160, #165, #166)
    'list-sessions',
    'delete-session',
    'load-session',
    'flush-transcript',
    # Inspect (#167)
    'show-command',
    'show-tool',
    # Execution/work-verb (#168)
    'exec-command',
    'exec-tool',
    'route',
    'bootstrap',
    # Diagnostic inventory (#169, #170)
    'command-graph',
    'tool-pool',
    'bootstrap-graph',
    # Turn-loop with JSON output (#164 Stage B, #174)
    'turn-loop',
})

# Commands explicitly exempt from the --output-format requirement.
# Rationale must be explicit — either the command is human-first
# (rich Markdown docs/reports), simulation-only, or has a dedicated
# JSON mode flag under a different name.
OPT_OUT_SURFACES = frozenset({
    # Rich-Markdown report commands (planned future: JSON schema)
    'summary',       # full workspace summary (Markdown)
    'manifest',      # workspace manifest (Markdown)
    'parity-audit',  # TypeScript archive comparison (Markdown)
    'setup-report',  # startup/prefetch report (Markdown)
    # List commands with their own query/filter surface (not JSON yet)
    'subsystems',    # use --limit
    'commands',      # use --query / --limit / --no-plugin-commands
    'tools',         # use --query / --limit / --simple-mode
    # Simulation/debug surfaces (not claw-orchestrated)
    'remote-mode',
    'ssh-mode',
    'teleport-mode',
    'direct-connect-mode',
    'deep-link-mode',
})


def _discover_subcommands_and_flags() -> dict[str, frozenset[str]]:
    """Introspect the argparse tree to discover every subcommand and its flags.

    Returns:
        {subcommand_name: frozenset of option strings, including --output-format
        if registered}
    """
    parser = build_parser()
    subcommand_flags: dict[str, frozenset[str]] = {}
    for action in parser._actions:
        if not hasattr(action, 'choices') or not action.choices:
            continue
        if action.dest != 'command':
            continue
        for name, subp in action.choices.items():
            flags: set[str] = set()
            for a in subp._actions:
                if a.option_strings:
                    flags.update(a.option_strings)
            subcommand_flags[name] = frozenset(flags)
    return subcommand_flags


class TestClawableSurfaceParity:
    """Every clawable-surface command MUST accept --output-format {text,json}.

    This is the invariant that codifies 'claws can treat the CLI as a
    unified protocol without special-casing'.
    """

    def test_all_clawable_surfaces_accept_output_format(self) -> None:
        """All commands in CLAWABLE_SURFACES must have --output-format registered."""
        subcommand_flags = _discover_subcommands_and_flags()
        missing = []
        for cmd in CLAWABLE_SURFACES:
            if cmd not in subcommand_flags:
                missing.append(f'{cmd}: not registered in parser')
            elif '--output-format' not in subcommand_flags[cmd]:
                missing.append(f'{cmd}: missing --output-format flag')
        assert not missing, (
            'Clawable-surface parity violation. Every command in '
            'CLAWABLE_SURFACES must accept --output-format. Failures:\n'
            + '\n'.join(f'  - {m}' for m in missing)
        )

    @pytest.mark.parametrize('cmd_name', sorted(CLAWABLE_SURFACES))
    def test_clawable_surface_output_format_choices(self, cmd_name: str) -> None:
        """Every clawable surface must accept exactly {text, json} choices."""
        parser = build_parser()
        for action in parser._actions:
            if not hasattr(action, 'choices') or not action.choices:
                continue
            if action.dest != 'command':
                continue
            if cmd_name not in action.choices:
                continue
            subp = action.choices[cmd_name]
            for a in subp._actions:
                if '--output-format' in a.option_strings:
                    assert a.choices == ['text', 'json'], (
                        f'{cmd_name}: --output-format choices are {a.choices}, '
                        f'expected [text, json]'
                    )
                    assert a.default == 'text', (
                        f'{cmd_name}: --output-format default is {a.default!r}, '
                        f'expected \'text\' for backward compat'
                    )
                    return
        pytest.fail(f'{cmd_name}: no --output-format flag found')


class TestCommandClassificationCoverage:
    """Every registered subcommand must be classified as either CLAWABLE or OPT_OUT.

    If a new command is added to the parser but forgotten in both sets, this
    test fails loudly — forcing an explicit classification decision.
    """

    def test_every_registered_command_is_classified(self) -> None:
        subcommand_flags = _discover_subcommands_and_flags()
        all_classified = CLAWABLE_SURFACES | OPT_OUT_SURFACES
        unclassified = set(subcommand_flags.keys()) - all_classified
        assert not unclassified, (
            'Unclassified subcommands detected. Every new command must be '
            'explicitly added to either CLAWABLE_SURFACES (must accept '
            '--output-format) or OPT_OUT_SURFACES (explicitly exempt with '
            'rationale). Unclassified:\n'
            + '\n'.join(f'  - {cmd}' for cmd in sorted(unclassified))
        )

    def test_no_command_in_both_sets(self) -> None:
        """Sanity: a command cannot be both clawable AND opt-out."""
        overlap = CLAWABLE_SURFACES & OPT_OUT_SURFACES
        assert not overlap, (
            f'Classification conflict: commands appear in both sets: {overlap}'
        )

    def test_all_classified_commands_actually_exist(self) -> None:
        """No typos — every command in our sets must actually be registered."""
        subcommand_flags = _discover_subcommands_and_flags()
        ghosts = (CLAWABLE_SURFACES | OPT_OUT_SURFACES) - set(subcommand_flags.keys())
        assert not ghosts, (
            f'Phantom commands in classification sets (not in parser): {ghosts}. '
            'Update CLAWABLE_SURFACES / OPT_OUT_SURFACES if commands were removed.'
        )


class TestJsonOutputContractEndToEnd:
    """Verify the contract AT RUNTIME — not just parser-level, but actual execution.

    Each clawable command must, when invoked with --output-format json,
    produce parseable JSON on stdout (for success cases).
    """

    # Minimal invocation args for each clawable command (to hit the success path)
    RUNTIME_INVOCATIONS = {
        'list-sessions': [],
        # delete-session/load-session: skip (need state setup, covered by dedicated tests)
        'show-command': ['add-dir'],
        'show-tool': ['BashTool'],
        'exec-command': ['add-dir', 'hi'],
        'exec-tool': ['BashTool', '{}'],
        'route': ['review'],
        'bootstrap': ['hello'],
        'command-graph': [],
        'tool-pool': [],
        'bootstrap-graph': [],
        # flush-transcript: skip (creates files, covered by dedicated tests)
    }

    @pytest.mark.parametrize('cmd_name,cmd_args', sorted(RUNTIME_INVOCATIONS.items()))
    def test_command_emits_parseable_json(self, cmd_name: str, cmd_args: list[str]) -> None:
        """End-to-end: invoking with --output-format json yields valid JSON."""
        import json
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', cmd_name, *cmd_args, '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        # Accept exit 0 (success) or 1 (typed not-found) — both must still produce JSON
        assert result.returncode in (0, 1), (
            f'{cmd_name}: unexpected exit {result.returncode}\n'
            f'stderr: {result.stderr}\n'
            f'stdout: {result.stdout[:200]}'
        )
        try:
            json.loads(result.stdout)
        except json.JSONDecodeError as e:
            pytest.fail(
                f'{cmd_name} {cmd_args} --output-format json did not produce '
                f'parseable JSON: {e}\nOutput: {result.stdout[:200]}'
            )


class TestOptOutSurfaceRejection:
    """Cycle #30: OPT_OUT surfaces must REJECT --output-format, not silently accept it.

    OPT_OUT_AUDIT.md classifies 12 surfaces as intentionally exempt from the
    JSON envelope contract. This test LOCKS that rejection so accidental
    drift (e.g., a developer adds --output-format to summary without thinking)
    doesn't silently promote an OPT_OUT surface to CLAWABLE.

    Relationship to existing tests:
    - test_all_clawable_surfaces_accept_output_format: asserts CLAWABLE surfaces accept it
    - TestOptOutSurfaceRejection: asserts OPT_OUT surfaces REJECT it

    Together, these two test classes form a complete parity check:
    every surface is either IN or OUT, and both cases are explicitly tested.

    If an OPT_OUT surface is promoted to CLAWABLE intentionally:
    1. Move it from OPT_OUT_SURFACES to CLAWABLE_SURFACES
    2. Update OPT_OUT_AUDIT.md with the promotion rationale
    3. Remove it from this test's expected rejections
    4. Both sets of tests continue passing
    """

    @pytest.mark.parametrize('cmd_name', sorted(OPT_OUT_SURFACES))
    def test_opt_out_surface_rejects_output_format(self, cmd_name: str) -> None:
        """OPT_OUT surfaces must NOT accept the --output-format flag.

        Passing --output-format to an OPT_OUT surface should produce an
        'unrecognized arguments' error from argparse.
        """
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', cmd_name, '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        # Should fail — argparse exit 2 in text mode, exit 1 in JSON mode
        # (both modes normalize to an "unrecognized arguments" message)
        assert result.returncode != 0, (
            f'{cmd_name} unexpectedly accepted --output-format json. '
            f'If this is intentional (promotion to CLAWABLE), move it from '
            f'OPT_OUT_SURFACES to CLAWABLE_SURFACES and update OPT_OUT_AUDIT.md. '
            f'Output: {result.stdout[:200]}\nStderr: {result.stderr[:200]}'
        )
        # Verify the error is specifically about --output-format
        error_text = result.stdout + result.stderr
        assert '--output-format' in error_text or 'unrecognized' in error_text, (
            f'{cmd_name} failed, but the error is not about --output-format. '
            f'Something else is broken:\n'
            f'stdout: {result.stdout[:300]}\nstderr: {result.stderr[:300]}'
        )

    def test_opt_out_set_matches_audit_document(self) -> None:
        """The OPT_OUT_SURFACES constant must exactly match the OPT_OUT_AUDIT.md listing.

        This test reads OPT_OUT_AUDIT.md and verifies the constant doesn't
        drift from the documentation.
        """
        audit_path = Path(__file__).resolve().parent.parent / 'OPT_OUT_AUDIT.md'
        audit_text = audit_path.read_text()

        # Expected 12 surfaces per the audit doc
        expected_surfaces = {
            # Group A: Rich-Markdown Reports (4)
            'summary', 'manifest', 'parity-audit', 'setup-report',
            # Group B: List Commands (3)
            'subsystems', 'commands', 'tools',
            # Group C: Simulation/Debug (5)
            'remote-mode', 'ssh-mode', 'teleport-mode',
            'direct-connect-mode', 'deep-link-mode',
        }

        assert OPT_OUT_SURFACES == expected_surfaces, (
            f'OPT_OUT_SURFACES drift from the expected 12 surfaces per the audit:\n'
            f'  Expected: {sorted(expected_surfaces)}\n'
            f'  Actual:   {sorted(OPT_OUT_SURFACES)}'
        )

        # Each surface should be mentioned in the audit doc
        missing_from_audit = [s for s in OPT_OUT_SURFACES if s not in audit_text]
        assert not missing_from_audit, (
            f'OPT_OUT surfaces not mentioned in OPT_OUT_AUDIT.md: {missing_from_audit}'
        )

    def test_opt_out_count_matches_declared(self) -> None:
        """OPT_OUT_AUDIT.md declares '12 surfaces'. The constant must match."""
        assert len(OPT_OUT_SURFACES) == 12, (
            f'OPT_OUT_SURFACES has {len(OPT_OUT_SURFACES)} items, '
            f'but OPT_OUT_AUDIT.md declares 12 total surfaces. '
            f'Update either the audit doc or the constant.'
        )
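What these parity tests actually demand of a new subcommand is small. A sketch of the registration step; the command name and help text below are hypothetical, but the flag spec (`choices`, `default`) is exactly what `test_clawable_surface_output_format_choices` asserts:

```python
import argparse

def register_new_surface(subparsers) -> None:
    # 'my-new-command' is an illustrative name; the flag contract is the
    # real requirement enforced by the parity tests.
    p = subparsers.add_parser('my-new-command', help='illustrative clawable surface')
    p.add_argument(
        '--output-format',
        choices=['text', 'json'],  # exactly these two, as a list
        default='text',            # text default preserves backward compat
    )
```

After registering, adding `'my-new-command'` to CLAWABLE_SURFACES is the only other step; the classification and runtime tests then pass or fail it automatically.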
70  tests/test_command_graph_tool_pool_output_format.py  Normal file
@@ -0,0 +1,70 @@
"""Tests for --output-format on command-graph and tool-pool (ROADMAP #169).

Diagnostic inventory surfaces now speak the CLI family's JSON contract.
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))


def _run(args: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(
        [sys.executable, '-m', 'src.main', *args],
        cwd=Path(__file__).resolve().parent.parent,
        capture_output=True,
        text=True,
    )


class TestCommandGraphOutputFormat:
    def test_command_graph_json(self) -> None:
        result = _run(['command-graph', '--output-format', 'json'])
        assert result.returncode == 0, result.stderr

        envelope = json.loads(result.stdout)
        assert 'builtins_count' in envelope
        assert 'plugin_like_count' in envelope
        assert 'skill_like_count' in envelope
        assert 'total_count' in envelope
        assert envelope['total_count'] == (
            envelope['builtins_count'] + envelope['plugin_like_count'] + envelope['skill_like_count']
        )
        assert isinstance(envelope['builtins'], list)
        if envelope['builtins']:
            assert set(envelope['builtins'][0].keys()) == {'name', 'source_hint'}

    def test_command_graph_text_backward_compat(self) -> None:
        result = _run(['command-graph'])
        assert result.returncode == 0
        assert '# Command Graph' in result.stdout
        assert 'Builtins:' in result.stdout
        # Not JSON
        assert not result.stdout.strip().startswith('{')


class TestToolPoolOutputFormat:
    def test_tool_pool_json(self) -> None:
        result = _run(['tool-pool', '--output-format', 'json'])
        assert result.returncode == 0, result.stderr

        envelope = json.loads(result.stdout)
        assert 'simple_mode' in envelope
        assert 'include_mcp' in envelope
        assert 'tool_count' in envelope
        assert 'tools' in envelope
        assert envelope['tool_count'] == len(envelope['tools'])
        if envelope['tools']:
            assert set(envelope['tools'][0].keys()) == {'name', 'source_hint'}

    def test_tool_pool_text_backward_compat(self) -> None:
        result = _run(['tool-pool'])
        assert result.returncode == 0
        assert '# Tool Pool' in result.stdout
        assert 'Simple mode:' in result.stdout
        assert not result.stdout.strip().startswith('{')
242  tests/test_cross_channel_consistency.py  Normal file
@@ -0,0 +1,242 @@
"""Cycle #27 cross-channel consistency audit (post-#181).

After the #181 fix (envelope.exit_code must match the process exit), this test
class systematizes the three-layer protocol invariant framework:

1. Structural compliance: Does the envelope exist? (#178)
2. Quality compliance: Is stderr silent + the message truthful? (#179)
3. Cross-channel consistency: Do multiple channels agree? (#181 + this)

This file captures cycle #27's proactive invariant audit proving that
envelope fields match their corresponding reality channels:

- envelope.command ↔ argv dispatch
- envelope.output_format ↔ --output-format flag
- envelope.timestamp ↔ actual wall clock
- envelope.found/handled/deleted ↔ operational truth (no error-block mismatch)

All tests passing = no drift detected.
"""

from __future__ import annotations

import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

import pytest

import sys
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))


def _run(args: list[str]) -> subprocess.CompletedProcess:
    """Run a claw-code command and capture its output."""
    return subprocess.run(
        ['python3', '-m', 'src.main'] + args,
        cwd=Path(__file__).parent.parent,
        capture_output=True,
        text=True,
    )


class TestCrossChannelConsistency:
    """Cycle #27: envelope fields must match reality channels.

    These are distinct from structural/quality tests. A command can
    emit structurally valid JSON with clean stderr but still lie about
    its own output_format or exit code (as #181 proved).
    """

    def test_envelope_command_matches_dispatch(self) -> None:
        """envelope.command must equal the dispatched subcommand."""
        commands_to_test = [
            'show-command',
            'show-tool',
            'list-sessions',
            'exec-command',
            'exec-tool',
            'delete-session',
        ]
        failures = []
        for cmd in commands_to_test:
            # Dispatch varies by arity
            if cmd == 'show-command':
                args = [cmd, 'nonexistent', '--output-format', 'json']
            elif cmd == 'show-tool':
                args = [cmd, 'nonexistent', '--output-format', 'json']
            elif cmd == 'exec-command':
                args = [cmd, 'unknown', 'test', '--output-format', 'json']
            elif cmd == 'exec-tool':
                args = [cmd, 'unknown', '{}', '--output-format', 'json']
            else:
                args = [cmd, '--output-format', 'json']

            result = _run(args)
            try:
                envelope = json.loads(result.stdout)
            except json.JSONDecodeError:
                failures.append(f'{cmd}: JSON parse error')
                continue

            if envelope.get('command') != cmd:
                failures.append(
                    f'{cmd}: envelope.command={envelope.get("command")}, '
                    f'expected {cmd}'
                )
        assert not failures, (
            'envelope.command must match the dispatched subcommand:\n' +
            '\n'.join(failures)
        )

    def test_envelope_output_format_matches_flag(self) -> None:
        """envelope.output_format must match the --output-format flag."""
        result = _run(['list-sessions', '--output-format', 'json'])
        envelope = json.loads(result.stdout)
        assert envelope['output_format'] == 'json', (
            f'output_format mismatch: flag=json, envelope={envelope["output_format"]}'
        )

    def test_envelope_timestamp_is_recent(self) -> None:
        """envelope.timestamp must be recent (generated at call time)."""
        result = _run(['list-sessions', '--output-format', 'json'])
        envelope = json.loads(result.stdout)
        ts_str = envelope.get('timestamp')
        assert ts_str, 'no timestamp field'

        ts = datetime.fromisoformat(ts_str.replace('Z', '+00:00'))
        now = datetime.now(timezone.utc)
        delta = abs((now - ts).total_seconds())

        assert delta < 5, f'timestamp off by {delta}s (should be <5s)'

    def test_envelope_exit_code_matches_process_exit(self) -> None:
        """Cycle #26/#181: envelope.exit_code == process exit code.

        This is a critical invariant. Claws that trust the envelope
        field must get the truth, not a lie.
        """
        cases = [
            (['show-command', 'nonexistent', '--output-format', 'json'], 1),
            (['show-tool', 'nonexistent', '--output-format', 'json'], 1),
            (['list-sessions', '--output-format', 'json'], 0),
            (['delete-session', 'any-id', '--output-format', 'json'], 0),
        ]
        failures = []
        for args, expected_exit in cases:
            result = _run(args)
            if result.returncode != expected_exit:
                failures.append(
                    f'{args[0]}: process exit {result.returncode}, '
                    f'expected {expected_exit}'
                )
                continue

            envelope = json.loads(result.stdout)
            if envelope['exit_code'] != result.returncode:
                failures.append(
                    f'{args[0]}: process exit {result.returncode}, '
                    f'envelope.exit_code {envelope["exit_code"]}'
                )

        assert not failures, (
            'envelope.exit_code must match the process exit:\n' +
            '\n'.join(failures)
        )

    def test_envelope_boolean_fields_match_error_presence(self) -> None:
        """found/handled/deleted fields must correlate with the error block.

        - If the field is True, no error block should exist
        - If the field is False + operational error, an error block must exist
        - If the field is False + idempotent (delete nonexistent), no error block
        """
        cases = [
            # (args, bool_field, expected_value, expect_error_block)
            (['show-command', 'nonexistent', '--output-format', 'json'],
             'found', False, True),
            (['exec-command', 'unknown', 'test', '--output-format', 'json'],
             'handled', False, True),
            (['delete-session', 'any-id', '--output-format', 'json'],
             'deleted', False, False),  # idempotent, no error
        ]
        failures = []
        for args, field, expected_val, expect_error in cases:
            result = _run(args)
            envelope = json.loads(result.stdout)

            actual_val = envelope.get(field)
            has_error = 'error' in envelope

            if actual_val != expected_val:
                failures.append(
                    f'{args[0]}: {field}={actual_val}, expected {expected_val}'
                )
            if expect_error and not has_error:
                failures.append(
                    f'{args[0]}: expected an error block, but none present'
                )
            elif not expect_error and has_error:
                failures.append(
                    f'{args[0]}: unexpected error block present'
                )

        assert not failures, (
            'Boolean fields must correlate with the error block:\n' +
            '\n'.join(failures)
        )


class TestTextVsJsonModeDivergence:
    """Cycle #29: Document the known text-mode vs JSON-mode exit code divergence.

    ERROR_HANDLING.md specifies that the exit code contract applies ONLY when
    --output-format json is set. Text mode follows argparse defaults (e.g.,
    exit 2 for parse errors) while JSON mode normalizes to the contract
    (exit 1 for parse errors).

    This test class LOCKS the expected divergence so that:
    1. Documentation stays aligned with implementation
    2. Future changes to text-mode behavior are caught as intentional
    3. Claws consuming subprocess output can trust the docs
    """

    def test_unknown_command_text_mode_exits_2(self) -> None:
        """Text mode: argparse default exit 2 for an unknown subcommand."""
        result = _run(['nonexistent-cmd'])
        assert result.returncode == 2, (
            f'text mode should exit 2 (argparse default), got {result.returncode}'
        )

    def test_unknown_command_json_mode_exits_1(self) -> None:
        """JSON mode: normalized exit 1 for a parse error (#178)."""
        result = _run(['nonexistent-cmd', '--output-format', 'json'])
        assert result.returncode == 1, (
            f'JSON mode should exit 1 (protocol contract), got {result.returncode}'
        )
        envelope = json.loads(result.stdout)
        assert envelope['error']['kind'] == 'parse'

    def test_missing_required_arg_text_mode_exits_2(self) -> None:
        """Text mode: argparse default exit 2 for a missing required arg."""
        result = _run(['exec-command'])  # missing name + prompt
        assert result.returncode == 2, (
            f'text mode should exit 2, got {result.returncode}'
        )

    def test_missing_required_arg_json_mode_exits_1(self) -> None:
        """JSON mode: normalized exit 1 for a parse error."""
        result = _run(['exec-command', '--output-format', 'json'])
        assert result.returncode == 1, (
            f'JSON mode should exit 1, got {result.returncode}'
        )

    def test_success_path_identical_in_both_modes(self) -> None:
        """Success exit codes are identical in both modes."""
        text_result = _run(['list-sessions'])
        json_result = _run(['list-sessions', '--output-format', 'json'])
        assert text_result.returncode == json_result.returncode == 0, (
            f'success exit should be 0 in both modes: '
            f'text={text_result.returncode}, json={json_result.returncode}'
        )
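Taken together, these tests pin down what any envelope writer must do. The sketch below is hypothetical (the real `wrap_json_envelope` is referenced in the tests but not shown in this diff), yet it satisfies every cross-channel invariant asserted above:

```python
import json
from datetime import datetime, timezone

def emit_envelope(command: str, payload: dict, exit_code: int = 0) -> int:
    """Hypothetical envelope writer honouring the cross-channel invariants."""
    envelope = {
        'command': command,            # must equal the dispatched subcommand
        'output_format': 'json',       # must match the --output-format flag
        'timestamp': datetime.now(timezone.utc).isoformat(),  # stamped at call time
        'exit_code': exit_code,        # must equal the actual process exit code
        **payload,                     # found/handled/deleted plus any error block
    }
    print(json.dumps(envelope))
    return exit_code  # the caller passes this same value to sys.exit()
```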
306  tests/test_exec_route_bootstrap_output_format.py  Normal file
@@ -0,0 +1,306 @@
"""Tests for --output-format on exec-command/exec-tool/route/bootstrap (ROADMAP #168).

Closes the final JSON-parity gap across the CLI family. After #160/#165/
#166/#167, the session-lifecycle and inspect CLI commands all spoke JSON;
this batch extends that contract to the exec, route, and bootstrap
surfaces — the commands claws actually invoke to DO work, not just inspect
state.

Verifies:
- exec-command / exec-tool: JSON envelope with handled + source_hint on
  success; {name, handled:false, error:{kind,message,retryable}} on
  not-found
- route: JSON envelope with match_count + matches list
- bootstrap: JSON envelope with setup, routed_matches, turn, messages,
  persisted_session_path
- All 4 preserve legacy text mode byte-identically
- Exit codes unchanged (0 success, 1 exec-not-found)
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))


def _run(args: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(
        [sys.executable, '-m', 'src.main', *args],
        cwd=Path(__file__).resolve().parent.parent,
        capture_output=True,
        text=True,
    )


class TestExecCommandOutputFormat:
    def test_exec_command_found_json(self) -> None:
        result = _run(['exec-command', 'add-dir', 'hello', '--output-format', 'json'])
        assert result.returncode == 0, result.stderr

        envelope = json.loads(result.stdout)
        assert envelope['handled'] is True
        assert envelope['name'] == 'add-dir'
        assert envelope['prompt'] == 'hello'
        assert 'source_hint' in envelope
        assert 'message' in envelope
        assert 'error' not in envelope

    def test_exec_command_not_found_json(self) -> None:
        result = _run(['exec-command', 'nonexistent-cmd', 'hi', '--output-format', 'json'])
        assert result.returncode == 1

        envelope = json.loads(result.stdout)
        assert envelope['handled'] is False
        assert envelope['name'] == 'nonexistent-cmd'
        assert envelope['prompt'] == 'hi'
        assert envelope['error']['kind'] == 'command_not_found'
        assert envelope['error']['retryable'] is False
        assert 'source_hint' not in envelope

    def test_exec_command_text_backward_compat(self) -> None:
        result = _run(['exec-command', 'add-dir', 'hello'])
        assert result.returncode == 0
        # Single line of prose (unchanged from pre-#168)
        assert result.stdout.count('\n') == 1
        assert 'add-dir' in result.stdout


class TestExecToolOutputFormat:
    def test_exec_tool_found_json(self) -> None:
        result = _run(['exec-tool', 'BashTool', '{"cmd":"ls"}', '--output-format', 'json'])
        assert result.returncode == 0, result.stderr

        envelope = json.loads(result.stdout)
        assert envelope['handled'] is True
        assert envelope['name'] == 'BashTool'
        assert envelope['payload'] == '{"cmd":"ls"}'
        assert 'source_hint' in envelope
        assert 'error' not in envelope

    def test_exec_tool_not_found_json(self) -> None:
        result = _run(['exec-tool', 'NotATool', '{}', '--output-format', 'json'])
        assert result.returncode == 1

        envelope = json.loads(result.stdout)
        assert envelope['handled'] is False
        assert envelope['name'] == 'NotATool'
        assert envelope['error']['kind'] == 'tool_not_found'
        assert envelope['error']['retryable'] is False

    def test_exec_tool_text_backward_compat(self) -> None:
        result = _run(['exec-tool', 'BashTool', '{}'])
        assert result.returncode == 0
        assert result.stdout.count('\n') == 1


class TestRouteOutputFormat:
    def test_route_json_envelope(self) -> None:
        result = _run(['route', 'review mcp', '--limit', '3', '--output-format', 'json'])
        assert result.returncode == 0

        envelope = json.loads(result.stdout)
        assert envelope['prompt'] == 'review mcp'
        assert envelope['limit'] == 3
        assert 'match_count' in envelope
        assert 'matches' in envelope
        assert envelope['match_count'] == len(envelope['matches'])
        # Every match has the required keys
        for m in envelope['matches']:
            assert set(m.keys()) == {'kind', 'name', 'score', 'source_hint'}
            assert m['kind'] in ('command', 'tool')

    def test_route_json_no_matches(self) -> None:
        # A very unusual string should yield zero matches
        result = _run(['route', 'zzzzzzzzzqqqqq', '--output-format', 'json'])
        assert result.returncode == 0

        envelope = json.loads(result.stdout)
        assert envelope['match_count'] == 0
        assert envelope['matches'] == []

    def test_route_text_backward_compat(self) -> None:
        """Text mode tab-separated output unchanged from pre-#168."""
        result = _run(['route', 'review mcp', '--limit', '2'])
        assert result.returncode == 0
        # Each non-empty line has exactly 3 tabs (kind\tname\tscore\tsource_hint)
        for line in result.stdout.strip().split('\n'):
            if line:
                assert line.count('\t') == 3


class TestBootstrapOutputFormat:
    def test_bootstrap_json_envelope(self) -> None:
        result = _run(['bootstrap', 'review MCP', '--limit', '2', '--output-format', 'json'])
        assert result.returncode == 0, result.stderr

        envelope = json.loads(result.stdout)
        # Required top-level keys
        required = {
            'prompt', 'limit', 'setup', 'routed_matches',
            'command_execution_messages', 'tool_execution_messages',
            'turn', 'persisted_session_path',
        }
        assert required.issubset(envelope.keys())
        # Setup sub-envelope
        assert 'python_version' in envelope['setup']
        assert 'platform_name' in envelope['setup']
        # Turn sub-envelope
        assert 'stop_reason' in envelope['turn']
        assert 'prompt' in envelope['turn']

    def test_bootstrap_text_is_markdown(self) -> None:
        """Text mode produces Markdown (unchanged from pre-#168)."""
        result = _run(['bootstrap', 'hello', '--limit', '2'])
        assert result.returncode == 0
        # Markdown headers
        assert '# Runtime Session' in result.stdout
        assert '## Setup' in result.stdout
        assert '## Routed Matches' in result.stdout


class TestFamilyWideJsonParity:
    """After #167 and #168, ALL inspect/exec/route/lifecycle commands
    support --output-format. Verify the full family is now parity-complete."""

    FAMILY_SURFACES = [
        # (cmd_args, expected_to_parse_json)
        (['show-command', 'add-dir'], True),
        (['show-tool', 'BashTool'], True),
        (['exec-command', 'add-dir', 'hi'], True),
        (['exec-tool', 'BashTool', '{}'], True),
        (['route', 'review'], True),
        (['bootstrap', 'hello'], True),
    ]

    def test_all_family_commands_accept_output_format_json(self) -> None:
        """Every family command accepts --output-format json and emits parseable JSON."""
        failures = []
        for args_base, should_parse in self.FAMILY_SURFACES:
            result = _run([*args_base, '--output-format', 'json'])
            if result.returncode not in (0, 1):
                failures.append(f'{args_base}: exit {result.returncode} — {result.stderr}')
                continue
            try:
                json.loads(result.stdout)
            except json.JSONDecodeError as e:
                failures.append(f'{args_base}: not parseable JSON ({e}): {result.stdout[:100]}')
        assert not failures, (
            'CLI family JSON parity gap:\n' + '\n'.join(failures)
        )

    def test_all_family_commands_text_mode_unchanged(self) -> None:
        """Omitting --output-format defaults to text for every family command."""
        # Sanity: just verify each runs without error in text mode
        for args_base, _ in self.FAMILY_SURFACES:
            result = _run(args_base)
            assert result.returncode in (0, 1), (
                f'{args_base} failed in text mode: {result.stderr}'
            )
            # Output should not be JSON-shaped (no leading {)
            assert not result.stdout.strip().startswith('{')


class TestEnvelopeExitCodeMatchesProcessExit:
    """#181: The envelope exit_code field must match the actual process exit code.

    Regression test for the protocol violation where exec-command/exec-tool
    not-found cases returned exit code 1 from the process but emitted
    envelopes with exit_code: 0 (default wrap_json_envelope). Claws reading
    the envelope would misclassify failures as successes.

    Contract (from ERROR_HANDLING.md):
    - Exit code 0 = success
    - Exit code 1 = error/not-found
    - Envelope MUST reflect the process exit
    """

    def test_exec_command_not_found_envelope_exit_matches(self) -> None:
        """exec-command 'unknown-name' must have exit_code=1 in the envelope."""
        result = _run(['exec-command', 'nonexistent-cmd-name', 'test-prompt', '--output-format', 'json'])
        assert result.returncode == 1, f'process exit should be 1, got {result.returncode}'
        envelope = json.loads(result.stdout)
        assert envelope['exit_code'] == 1, (
            f'envelope.exit_code mismatch: process=1, envelope={envelope["exit_code"]}'
        )
        assert envelope['handled'] is False
        assert envelope['error']['kind'] == 'command_not_found'

    def test_exec_tool_not_found_envelope_exit_matches(self) -> None:
        """exec-tool 'unknown-tool' must have exit_code=1 in the envelope."""
        result = _run(['exec-tool', 'nonexistent-tool-name', '{}', '--output-format', 'json'])
        assert result.returncode == 1, f'process exit should be 1, got {result.returncode}'
        envelope = json.loads(result.stdout)
        assert envelope['exit_code'] == 1, (
            f'envelope.exit_code mismatch: process=1, envelope={envelope["exit_code"]}'
        )
        assert envelope['handled'] is False
        assert envelope['error']['kind'] == 'tool_not_found'

    def test_all_commands_exit_code_invariant(self) -> None:
        """Audit: for every clawable command, envelope.exit_code == process exit.

        This is a stronger invariant than 'emits JSON'. Claws dispatching on
        the envelope's exit_code field must get the truth, not a lie.
        """
        # Sample cases known to return non-zero
        cases = [
            # command, expected_exit, justification
            (['show-command', 'nonexistent-abc'], 1, 'not-found inventory lookup'),
            (['show-tool', 'nonexistent-xyz'], 1, 'not-found inventory lookup'),
            (['exec-command', 'nonexistent-1', 'test'], 1, 'not-found execution'),
            (['exec-tool', 'nonexistent-2', '{}'], 1, 'not-found execution'),
        ]
        mismatches = []
        for args, expected_exit, reason in cases:
            result = _run([*args, '--output-format', 'json'])
            if result.returncode != expected_exit:
                mismatches.append(
                    f'{args}: expected process exit {expected_exit} ({reason}), '
                    f'got {result.returncode}'
                )
                continue
            try:
                envelope = json.loads(result.stdout)
            except json.JSONDecodeError as e:
                mismatches.append(f'{args}: JSON parse failed: {e}')
                continue
            if envelope.get('exit_code') != result.returncode:
                mismatches.append(
                    f'{args}: envelope.exit_code={envelope.get("exit_code")} '
                    f'!= process exit={result.returncode} ({reason})'
                )
        assert not mismatches, (
            'Envelope exit_code must match process exit code:\n' +
            '\n'.join(mismatches)
        )


class TestMetadataFlags:
    """Cycle #28: --version flag implementation (#180 gap closure)."""

    def test_version_flag_returns_version_text(self) -> None:
        """--version returns the version string and exits successfully."""
        result = _run(['--version'])
        assert result.returncode == 0
        assert 'claw-code' in result.stdout
        assert '1.0.0' in result.stdout

    def test_help_flag_returns_help_text(self) -> None:
        """--help returns help text and exits successfully."""
        result = _run(['--help'])
        assert result.returncode == 0
        assert 'usage:' in result.stdout
        assert 'Python porting workspace' in result.stdout

    def test_help_still_works_after_version_added(self) -> None:
        """Verify -h and --help both work (no regression)."""
        result_short = _run(['-h'])
        result_long = _run(['--help'])
        assert result_short.returncode == 0
        assert result_long.returncode == 0
        assert 'usage:' in result_short.stdout
        assert 'usage:' in result_long.stdout
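The route envelope shape those tests lock ({kind, name, score, source_hint} per match) is already enough for a claw to pick a dispatch target. A minimal consumer sketch, reusing the real CLI invocation from the tests; the helper name is illustrative:

```python
import json
import subprocess
import sys

def best_command(prompt: str) -> str | None:
    """Return the highest-scoring command match for a prompt, or None."""
    proc = subprocess.run(
        [sys.executable, '-m', 'src.main', 'route', prompt, '--output-format', 'json'],
        capture_output=True, text=True,
    )
    envelope = json.loads(proc.stdout)
    # Filter to command matches; tools use the same shape with kind='tool'.
    commands = [m for m in envelope['matches'] if m['kind'] == 'command']
    if not commands:
        return None
    return max(commands, key=lambda m: m['score'])['name']
```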
206  tests/test_flush_transcript_cli.py  Normal file
@@ -0,0 +1,206 @@
"""Tests for flush-transcript CLI parity with the #160/#165 lifecycle triplet (ROADMAP #166).

Verifies that session *creation* now accepts the same flag family as session
management (list/delete/load):
- --directory DIR (alternate storage location)
- --output-format {text,json} (structured output)
- --session-id ID (deterministic IDs for claw checkpointing)

Also verifies backward compat: default text output unchanged byte-for-byte.
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))


_REPO_ROOT = Path(__file__).resolve().parent.parent


def _run_cli(*args: str) -> subprocess.CompletedProcess[str]:
    return subprocess.run(
        [sys.executable, '-m', 'src.main', *args],
        capture_output=True, text=True, cwd=str(_REPO_ROOT),
    )
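

# Illustrative helper (a sketch, not used by the tests below): the whole #166
# flag family on a single flush-transcript call, returning the parsed JSON
# envelope. The field names come from SCHEMAS.md; the session id is arbitrary.
def _example_flush(tmp: Path) -> dict:
    result = _run_cli(
        'flush-transcript', 'hello',
        '--directory', str(tmp),        # alternate storage location
        '--session-id', 'example-1',    # deterministic ID for claw checkpointing
        '--output-format', 'json',      # structured envelope instead of legacy text
    )
    return json.loads(result.stdout)    # e.g. {'session_id': 'example-1', 'flushed': True, ...}
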
class TestDirectoryFlag:
    def test_flush_transcript_writes_to_custom_directory(self, tmp_path: Path) -> None:
        result = _run_cli(
            'flush-transcript', 'hello world',
            '--directory', str(tmp_path),
        )
        assert result.returncode == 0, result.stderr
        # Exactly one session file should exist in the directory
        files = list(tmp_path.glob('*.json'))
        assert len(files) == 1
        # And the legacy text output points to that file
        assert str(files[0]) in result.stdout


class TestSessionIdFlag:
    def test_explicit_session_id_is_respected(self, tmp_path: Path) -> None:
        result = _run_cli(
            'flush-transcript', 'hello',
            '--directory', str(tmp_path),
            '--session-id', 'deterministic-id-42',
        )
        assert result.returncode == 0, result.stderr
        expected_path = tmp_path / 'deterministic-id-42.json'
        assert expected_path.exists(), (
            f'session file not created at deterministic path: {expected_path}'
        )
        # And it should contain the ID we asked for
        data = json.loads(expected_path.read_text())
        assert data['session_id'] == 'deterministic-id-42'

    def test_auto_session_id_when_flag_omitted(self, tmp_path: Path) -> None:
        """Without --session-id, engine still auto-generates a UUID (backward compat)."""
        result = _run_cli(
            'flush-transcript', 'hello',
            '--directory', str(tmp_path),
        )
        assert result.returncode == 0
        files = list(tmp_path.glob('*.json'))
        assert len(files) == 1
        # The filename (minus .json) should be a 32-char hex UUID
        stem = files[0].stem
        assert len(stem) == 32
        assert all(c in '0123456789abcdef' for c in stem)


class TestOutputFormatFlag:
    def test_json_mode_emits_structured_envelope(self, tmp_path: Path) -> None:
        result = _run_cli(
            'flush-transcript', 'hello',
            '--directory', str(tmp_path),
            '--session-id', 'beta',
            '--output-format', 'json',
        )
        assert result.returncode == 0
        data = json.loads(result.stdout)
        assert data['session_id'] == 'beta'
        assert data['flushed'] is True
        assert data['path'].endswith('beta.json')
        # messages_count and token counts should be present and typed
        assert isinstance(data['messages_count'], int)
        assert isinstance(data['input_tokens'], int)
        assert isinstance(data['output_tokens'], int)

    def test_text_mode_byte_identical_to_pre_166_output(self, tmp_path: Path) -> None:
        """Legacy text output must not change — claws may be parsing it."""
        result = _run_cli(
            'flush-transcript', 'hello',
            '--directory', str(tmp_path),
        )
        assert result.returncode == 0
        lines = result.stdout.strip().split('\n')
        # Line 1: path ending in .json
        assert lines[0].endswith('.json')
        # Line 2: exact legacy format
        assert lines[1] == 'flushed=True'

class TestBackwardCompat:
    def test_no_flags_default_behaviour(self, tmp_path: Path) -> None:
        """Running with no flags still works (default dir, text mode, auto UUID)."""
        import os
        env = os.environ.copy()
        env['PYTHONPATH'] = str(_REPO_ROOT)
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'flush-transcript', 'hello'],
            capture_output=True, text=True, cwd=str(tmp_path), env=env,
        )
        assert result.returncode == 0, result.stderr
        # Default dir is `.port_sessions` in CWD
        sessions_dir = tmp_path / '.port_sessions'
        assert sessions_dir.exists()
        assert len(list(sessions_dir.glob('*.json'))) == 1

class TestLifecycleIntegration:
    """#166's real value: the triplet + creation command are now a coherent family."""

    def test_create_then_list_then_load_then_delete_roundtrip(
        self, tmp_path: Path,
    ) -> None:
        """End-to-end: flush → list → load → delete, all via the same --directory."""
        # 1. Create
        create_result = _run_cli(
            'flush-transcript', 'roundtrip test',
            '--directory', str(tmp_path),
            '--session-id', 'rt-session',
            '--output-format', 'json',
        )
        assert create_result.returncode == 0
        assert json.loads(create_result.stdout)['session_id'] == 'rt-session'

        # 2. List
        list_result = _run_cli(
            'list-sessions',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert list_result.returncode == 0
        list_data = json.loads(list_result.stdout)
        assert 'rt-session' in list_data['sessions']

        # 3. Load
        load_result = _run_cli(
            'load-session', 'rt-session',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert load_result.returncode == 0
        assert json.loads(load_result.stdout)['loaded'] is True

        # 4. Delete
        delete_result = _run_cli(
            'delete-session', 'rt-session',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert delete_result.returncode == 0

        # 5. Verify gone
        verify_result = _run_cli(
            'load-session', 'rt-session',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert verify_result.returncode == 1
        assert json.loads(verify_result.stdout)['error']['kind'] == 'session_not_found'


class TestFullFamilyParity:
    """All four session-lifecycle CLI commands accept the same core flag pair.

    This is the #166 acceptance test: flush-transcript joins the family.
    """

    @pytest.mark.parametrize(
        'command',
        ['list-sessions', 'delete-session', 'load-session', 'flush-transcript'],
    )
    def test_all_four_accept_directory_flag(self, command: str) -> None:
        help_text = _run_cli(command, '--help').stdout
        assert '--directory' in help_text, (
            f'{command} missing --directory flag (#166 parity gap)'
        )

    @pytest.mark.parametrize(
        'command',
        ['list-sessions', 'delete-session', 'load-session', 'flush-transcript'],
    )
    def test_all_four_accept_output_format_flag(self, command: str) -> None:
        help_text = _run_cli(command, '--help').stdout
        assert '--output-format' in help_text, (
            f'{command} missing --output-format flag (#166 parity gap)'
        )

tests/test_json_envelope_field_consistency.py (new file, 213 lines)
@@ -0,0 +1,213 @@
"""JSON envelope field consistency validation (ROADMAP #173 prep).

This test suite validates that clawable-surface commands' JSON output
follows the contract defined in SCHEMAS.md. Currently, commands emit
command-specific envelopes without the canonical common fields
(timestamp, command, exit_code, output_format, schema_version).

This test documents the current gap and validates the consistency
of what IS there, providing a baseline for #173 (common field wrapping).

Phase 1 (this test): Validate consistency within each command's envelope.
Phase 2 (future #173): Wrap all 13 commands with canonical common fields.
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path
from typing import Any

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.main import build_parser  # noqa: E402


# Expected fields for each clawable command's JSON envelope.
# These are the command-specific fields (not including common fields yet).
# Entries map command_name -> (required_fields, optional_fields).
ENVELOPE_CONTRACTS = {
    'list-sessions': (
        {'count', 'sessions'},
        set(),
    ),
    'delete-session': (
        {'session_id', 'deleted', 'directory'},
        set(),
    ),
    'load-session': (
        {'session_id', 'loaded', 'directory', 'path'},
        set(),
    ),
    'flush-transcript': (
        {'session_id', 'path', 'flushed', 'messages_count', 'input_tokens', 'output_tokens'},
        set(),
    ),
    'show-command': (
        {'name', 'found', 'source_hint', 'responsibility'},
        set(),
    ),
    'show-tool': (
        {'name', 'found', 'source_hint'},
        set(),
    ),
    'exec-command': (
        {'name', 'prompt', 'handled', 'message', 'source_hint'},
        set(),
    ),
    'exec-tool': (
        {'name', 'payload', 'handled', 'message', 'source_hint'},
        set(),
    ),
    'route': (
        {'prompt', 'limit', 'match_count', 'matches'},
        set(),
    ),
    'bootstrap': (
        {'prompt', 'setup', 'routed_matches', 'turn', 'persisted_session_path'},
        set(),
    ),
    'command-graph': (
        {'builtins_count', 'plugin_like_count', 'skill_like_count', 'total_count', 'builtins', 'plugin_like', 'skill_like'},
        set(),
    ),
    'tool-pool': (
        {'simple_mode', 'include_mcp', 'tool_count', 'tools'},
        set(),
    ),
    'bootstrap-graph': (
        {'stages', 'note'},
        set(),
    ),
}
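

# Hedged sketch of the #173 wrapper this file anticipates: the name
# wrap_json_envelope comes from the stubs further down; its real signature
# in src/ may differ. Only the five common field names are contractual
# (SCHEMAS.md).
from datetime import datetime, timezone


def wrap_json_envelope_sketch(command: str, exit_code: int, body: dict) -> dict:
    common = {
        'timestamp': datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z'),
        'command': command,
        'exit_code': exit_code,
        'output_format': 'json',
        'schema_version': '1.0',
    }
    return {**common, **body}  # command-specific fields ride alongside the common ones
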
class TestJsonEnvelopeConsistency:
    """Validate current command envelopes match their declared contracts.

    This is a consistency check, not a conformance check. Once #173 adds
    common fields to all commands, these tests will auto-pass the common
    field assertions and verify command-specific fields stay consistent.
    """

    @pytest.mark.parametrize('cmd_name,contract', sorted(ENVELOPE_CONTRACTS.items()))
    def test_command_json_fields_present(self, cmd_name: str, contract: tuple[set[str], set[str]]) -> None:
        """Command's JSON envelope must include all required fields."""
        required, optional = contract
        # Get minimal invocation args for this command
        test_invocations = {
            'list-sessions': [],
            'show-command': ['add-dir'],
            'show-tool': ['BashTool'],
            'exec-command': ['add-dir', 'hi'],
            'exec-tool': ['BashTool', '{}'],
            'route': ['review'],
            'bootstrap': ['hello'],
            'command-graph': [],
            'tool-pool': [],
            'bootstrap-graph': [],
        }

        if cmd_name not in test_invocations:
            pytest.skip(f'{cmd_name} requires session setup; skipped')

        cmd_args = test_invocations[cmd_name]
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', cmd_name, *cmd_args, '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )

        if result.returncode not in (0, 1):
            pytest.fail(f'{cmd_name}: unexpected exit {result.returncode}\nstderr: {result.stderr}')

        try:
            envelope = json.loads(result.stdout)
        except json.JSONDecodeError as e:
            pytest.fail(f'{cmd_name}: invalid JSON: {e}\nOutput: {result.stdout[:200]}')

        # Check required fields (command-specific)
        missing = required - set(envelope.keys())
        if missing:
            pytest.fail(
                f'{cmd_name} envelope missing required fields: {missing}\n'
                f'Expected: {required}\nGot: {set(envelope.keys())}'
            )

        # Check that extra fields are accounted for (warn if unknown)
        known = required | optional
        extra = set(envelope.keys()) - known
        if extra:
            # Warn but don't fail — there may be new fields added.
            # (pytest.warns is for *catching* warnings; warnings.warn issues one.)
            import warnings
            warnings.warn(f'extra fields in {cmd_name}: {extra}', stacklevel=2)

    def test_envelope_field_value_types(self) -> None:
        """Smoke test: envelope fields have expected types (bool, int, str, list, dict, null)."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'list-sessions', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )

        envelope = json.loads(result.stdout)

        # Spot check a few fields
        assert isinstance(envelope.get('count'), int), 'count should be int'
        assert isinstance(envelope.get('sessions'), list), 'sessions should be list'

class TestJsonEnvelopeCommonFieldPrep:
    """Validation stubs for common fields (part of #173 implementation).

    These tests will activate once wrap_json_envelope() is applied to all
    13 clawable commands. Currently they document the expected contract.
    """

    def test_all_envelopes_include_timestamp(self) -> None:
        """Every clawable envelope must include an ISO 8601 UTC timestamp."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'command-graph', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        assert 'timestamp' in envelope, 'Missing timestamp field'
        # Verify ISO 8601 format (ends with Z for UTC)
        assert envelope['timestamp'].endswith('Z'), f'Timestamp not UTC: {envelope["timestamp"]}'

    def test_all_envelopes_include_command(self) -> None:
        """Every envelope must echo the command name."""
        test_cases = [
            ('list-sessions', []),
            ('command-graph', []),
            ('bootstrap', ['hello']),
        ]
        for cmd_name, cmd_args in test_cases:
            result = subprocess.run(
                [sys.executable, '-m', 'src.main', cmd_name, *cmd_args, '--output-format', 'json'],
                cwd=Path(__file__).resolve().parent.parent,
                capture_output=True,
                text=True,
            )
            envelope = json.loads(result.stdout)
            assert envelope.get('command') == cmd_name, f'{cmd_name} envelope.command mismatch'

    def test_all_envelopes_include_exit_code_and_schema_version(self) -> None:
        """Every envelope must include exit_code and schema_version."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'tool-pool', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        assert 'exit_code' in envelope, 'Missing exit_code'
        assert 'schema_version' in envelope, 'Missing schema_version'
        assert envelope['schema_version'] == '1.0', 'Wrong schema_version'

tests/test_load_session_cli.py (new file, 183 lines)
@@ -0,0 +1,183 @@
"""Tests for load-session CLI parity with list-sessions/delete-session (ROADMAP #165).

Verifies the session-lifecycle CLI triplet is now symmetric:
- --directory DIR accepted (alternate storage locations reachable)
- --output-format {text,json} accepted
- Not-found emits typed JSON error envelope, never a Python traceback
- Corrupted session file distinguished from not-found via 'kind'
- Legacy text-mode output unchanged (backward compat)
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.session_store import StoredSession, save_session  # noqa: E402


_REPO_ROOT = Path(__file__).resolve().parent.parent


def _run_cli(
    *args: str, cwd: Path | None = None,
) -> subprocess.CompletedProcess[str]:
    """Always invoke the CLI with cwd=repo-root so ``python -m src.main``
    can resolve the ``src`` package, regardless of where the test's
    tmp_path is.
    """
    return subprocess.run(
        [sys.executable, '-m', 'src.main', *args],
        capture_output=True,
        text=True,
        cwd=str(cwd) if cwd else str(_REPO_ROOT),
    )


def _make_session(session_id: str) -> StoredSession:
    return StoredSession(
        session_id=session_id, messages=('hi',), input_tokens=1, output_tokens=2,
    )

class TestDirectoryFlagParity:
    def test_load_session_accepts_directory_flag(self, tmp_path: Path) -> None:
        save_session(_make_session('alpha'), tmp_path)
        result = _run_cli('load-session', 'alpha', '--directory', str(tmp_path))
        assert result.returncode == 0, result.stderr
        assert 'alpha' in result.stdout

    def test_load_session_without_directory_uses_cwd_default(
        self, tmp_path: Path,
    ) -> None:
        """When --directory is omitted, fall back to .port_sessions in CWD.

        The subprocess runs with ``cwd=tmp_path``, so ``python -m src.main``
        can no longer find ``src/`` relative to its working directory; we
        point PYTHONPATH at the repo root via env instead.
        """
        sessions_dir = tmp_path / '.port_sessions'
        sessions_dir.mkdir()
        save_session(_make_session('beta'), sessions_dir)
        import os
        env = os.environ.copy()
        env['PYTHONPATH'] = str(_REPO_ROOT)
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'load-session', 'beta'],
            capture_output=True, text=True, cwd=str(tmp_path), env=env,
        )
        assert result.returncode == 0, result.stderr
        assert 'beta' in result.stdout


class TestOutputFormatFlagParity:
    def test_json_mode_on_success(self, tmp_path: Path) -> None:
        save_session(
            StoredSession(
                session_id='gamma', messages=('x', 'y'),
                input_tokens=5, output_tokens=7,
            ),
            tmp_path,
        )
        result = _run_cli(
            'load-session', 'gamma',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert result.returncode == 0
        data = json.loads(result.stdout)
        # Verify common envelope fields (SCHEMAS.md contract)
        assert 'timestamp' in data
        assert data['command'] == 'load-session'
        assert data['exit_code'] == 0
        assert data['schema_version'] == '1.0'
        # Verify command-specific fields
        assert data['session_id'] == 'gamma'
        assert data['loaded'] is True
        assert data['messages_count'] == 2
        assert data['input_tokens'] == 5
        assert data['output_tokens'] == 7

    def test_text_mode_unchanged_on_success(self, tmp_path: Path) -> None:
        """Legacy text output must be byte-identical for backward compat."""
        save_session(_make_session('delta'), tmp_path)
        result = _run_cli('load-session', 'delta', '--directory', str(tmp_path))
        assert result.returncode == 0
        lines = result.stdout.strip().split('\n')
        assert lines == ['delta', '1 messages', 'in=1 out=2']

class TestNotFoundTypedError:
    def test_not_found_json_envelope(self, tmp_path: Path) -> None:
        """Not-found emits structured JSON, never a Python traceback."""
        result = _run_cli(
            'load-session', 'missing',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert result.returncode == 1
        assert 'Traceback' not in result.stderr, (
            'regression #165: raw traceback leaked to stderr'
        )
        assert 'SessionNotFoundError' not in result.stdout, (
            'regression #165: internal class name leaked into CLI output'
        )
        data = json.loads(result.stdout)
        assert data['session_id'] == 'missing'
        assert data['loaded'] is False
        assert data['error']['kind'] == 'session_not_found'
        assert data['error']['retryable'] is False
        # directory field is populated so claws know where we looked
        assert 'directory' in data['error']

    def test_not_found_text_mode_no_traceback(self, tmp_path: Path) -> None:
        """Text mode on not-found must not dump a Python stack either."""
        result = _run_cli(
            'load-session', 'missing', '--directory', str(tmp_path),
        )
        assert result.returncode == 1
        assert 'Traceback' not in result.stderr
        assert result.stdout.startswith('error:')


class TestLoadFailedDistinctFromNotFound:
    def test_corrupted_session_file_surfaces_distinct_kind(
        self, tmp_path: Path,
    ) -> None:
        """A corrupted JSON file must emit kind='session_load_failed', not 'session_not_found'."""
        (tmp_path / 'broken.json').write_text('{ not valid json')
        result = _run_cli(
            'load-session', 'broken',
            '--directory', str(tmp_path),
            '--output-format', 'json',
        )
        assert result.returncode == 1
        data = json.loads(result.stdout)
        assert data['error']['kind'] == 'session_load_failed'
        assert data['error']['retryable'] is True, (
            'corrupted file is potentially retryable (fs glitch) unlike not-found'
        )
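

# What the kind/retryable split above enables (a hedged sketch, not part of
# the suite): a claw-side loader that retries transient load failures but
# gives up immediately on not-found. Only error.kind and error.retryable are
# contractual; the retry policy is illustrative.
def _load_with_retry(session_id: str, directory: Path, attempts: int = 3) -> dict:
    for _attempt in range(attempts):
        result = _run_cli(
            'load-session', session_id,
            '--directory', str(directory),
            '--output-format', 'json',
        )
        data = json.loads(result.stdout)
        if result.returncode == 0:
            return data
        if not data['error']['retryable']:  # e.g. session_not_found
            raise KeyError(session_id)
        # session_load_failed: possibly a transient fs glitch, so try again
    raise RuntimeError(f'gave up after {attempts} attempts')
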
class TestTripletParityConsistency:
    """All three #160 CLI commands should accept the same flag pair."""

    @pytest.mark.parametrize('command', ['list-sessions', 'delete-session', 'load-session'])
    def test_all_three_accept_directory_flag(self, command: str) -> None:
        help_text = _run_cli(command, '--help').stdout
        assert '--directory' in help_text, (
            f'{command} missing --directory flag (#165 parity gap)'
        )

    @pytest.mark.parametrize('command', ['list-sessions', 'delete-session', 'load-session'])
    def test_all_three_accept_output_format_flag(self, command: str) -> None:
        help_text = _run_cli(command, '--help').stdout
        assert '--output-format' in help_text, (
            f'{command} missing --output-format flag (#165 parity gap)'
        )

tests/test_parse_error_envelope.py (new file, 239 lines)
@@ -0,0 +1,239 @@
"""#178 — argparse-level errors emit JSON envelope when --output-format json is requested.

Before #178:
    $ claw nonexistent --output-format json
    usage: main.py [-h] {summary,manifest,...} ...
    main.py: error: argument command: invalid choice: 'nonexistent' (choose from ...)
    [exit 2, argparse dumps help to stderr, no JSON envelope]

After #178:
    $ claw nonexistent --output-format json
    {"timestamp": "...", "command": "nonexistent", "exit_code": 1, ...,
     "error": {"kind": "parse", "operation": "argparse", ...}}
    [exit 1, JSON envelope on stdout, matches SCHEMAS.md contract]

Contract:
- text mode: unchanged (argparse still dumps help to stderr, exit code 2)
- JSON mode: envelope matches SCHEMAS.md 'error' shape, exit code 1
- Parse errors use error.kind='parse' (distinct from runtime/session/etc.)
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

import pytest

CLI = [sys.executable, '-m', 'src.main']
REPO_ROOT = Path(__file__).resolve().parent.parent

class TestParseErrorJsonEnvelope:
    """Argparse errors emit JSON envelope when --output-format json is requested."""

    def test_unknown_command_json_mode_emits_envelope(self) -> None:
        """Unknown command + --output-format json → parse-error envelope."""
        result = subprocess.run(
            CLI + ['nonexistent-command', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 1, f"expected exit 1; got {result.returncode}"
        envelope = json.loads(result.stdout)
        # Common fields
        assert envelope['schema_version'] == '1.0'
        assert envelope['output_format'] == 'json'
        assert envelope['exit_code'] == 1
        # Error envelope shape
        assert envelope['error']['kind'] == 'parse'
        assert envelope['error']['operation'] == 'argparse'
        assert envelope['error']['retryable'] is False
        assert envelope['error']['target'] == 'nonexistent-command'
        assert 'hint' in envelope['error']

    def test_unknown_command_json_equals_syntax(self) -> None:
        """--output-format=json syntax also works."""
        result = subprocess.run(
            CLI + ['nonexistent-command', '--output-format=json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 1
        envelope = json.loads(result.stdout)
        assert envelope['error']['kind'] == 'parse'

    def test_unknown_command_text_mode_unchanged(self) -> None:
        """Text mode (default) preserves argparse behavior: help to stderr, exit 2."""
        result = subprocess.run(
            CLI + ['nonexistent-command'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 2, f"text mode must preserve argparse exit 2; got {result.returncode}"
        # stderr should have argparse error (help + error message)
        assert 'invalid choice' in result.stderr
        # stdout should be empty (no JSON leaked)
        assert result.stdout == ''

    def test_invalid_flag_json_mode_emits_envelope(self) -> None:
        """Invalid flag at top level + --output-format json → envelope."""
        result = subprocess.run(
            CLI + ['--invalid-top-level-flag', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        # argparse might reject before --output-format is parsed; still emit envelope
        assert result.returncode == 1, f"got {result.returncode}: {result.stderr}"
        envelope = json.loads(result.stdout)
        assert envelope['error']['kind'] == 'parse'

    def test_missing_command_no_json_flag_behaves_normally(self) -> None:
        """No --output-format flag + missing command → normal argparse behavior."""
        result = subprocess.run(
            CLI,
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        # argparse exits 2 when the required subcommand is missing
        assert result.returncode == 2
        assert 'required' in result.stderr.lower() or 'the following arguments are required' in result.stderr.lower()

    def test_valid_command_unaffected(self) -> None:
        """Valid commands still work normally (no regression)."""
        result = subprocess.run(
            CLI + ['list-sessions', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0
        envelope = json.loads(result.stdout)
        assert envelope['command'] == 'list-sessions'
        assert 'sessions' in envelope

    def test_parse_error_envelope_contains_common_fields(self) -> None:
        """Parse-error envelope must include all common fields per SCHEMAS.md."""
        result = subprocess.run(
            CLI + ['bogus', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        # All common fields required by SCHEMAS.md
        for field in ('timestamp', 'command', 'exit_code', 'output_format', 'schema_version'):
            assert field in envelope, f"common field '{field}' missing from parse-error envelope"

class TestParseErrorSchemaCompliance:
    """Parse-error envelope matches SCHEMAS.md error shape."""

    def test_error_kind_is_parse(self) -> None:
        """error.kind='parse' distinguishes argparse errors from runtime errors."""
        result = subprocess.run(
            CLI + ['unknown', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        assert envelope['error']['kind'] == 'parse'

    def test_error_retryable_false(self) -> None:
        """Parse errors are never retryable (typo won't magically fix itself)."""
        result = subprocess.run(
            CLI + ['unknown', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        assert envelope['error']['retryable'] is False

class TestParseErrorStderrHygiene:
    """#179: JSON mode must fully suppress argparse stderr output.

    Before #179: stderr leaked argparse usage + error text even when --output-format json.
    After #179: stderr is silent; envelope carries the real error message verbatim.
    """

    def test_json_mode_stderr_is_silent_on_unknown_command(self) -> None:
        """Unknown command in JSON mode: stderr empty."""
        result = subprocess.run(
            CLI + ['nonexistent-cmd', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.stderr == '', (
            f"JSON mode stderr must be empty; got:\n{result.stderr!r}"
        )

    def test_json_mode_stderr_is_silent_on_missing_arg(self) -> None:
        """Missing required arg in JSON mode: stderr empty (no argparse usage leak)."""
        result = subprocess.run(
            CLI + ['load-session', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        assert result.stderr == '', (
            f"JSON mode stderr must be empty on missing arg; got:\n{result.stderr!r}"
        )

    def test_json_mode_envelope_carries_real_argparse_message(self) -> None:
        """#179: envelope.error.message contains argparse's actual text, not generic rejection."""
        result = subprocess.run(
            CLI + ['load-session', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        # Real argparse message: 'the following arguments are required: session_id'
        msg = envelope['error']['message']
        assert 'session_id' in msg, (
            f"envelope.error.message must carry real argparse text mentioning missing arg; got: {msg!r}"
        )
        assert 'required' in msg.lower(), (
            f"envelope.error.message must indicate what is required; got: {msg!r}"
        )

    def test_json_mode_envelope_carries_invalid_choice_details(self) -> None:
        """#179: unknown command envelope includes valid-choice list from argparse."""
        result = subprocess.run(
            CLI + ['typo-command', '--output-format', 'json'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        envelope = json.loads(result.stdout)
        msg = envelope['error']['message']
        assert 'invalid choice' in msg.lower(), (
            f"envelope must mention 'invalid choice'; got: {msg!r}"
        )
        # Should include at least one valid command name for discoverability
        assert 'bootstrap' in msg or 'summary' in msg, (
            f"envelope must include valid choices for discoverability; got: {msg!r}"
        )

    def test_text_mode_stderr_preserved_on_unknown_command(self) -> None:
        """Text mode: argparse stderr behavior unchanged (backward compat)."""
        result = subprocess.run(
            CLI + ['nonexistent-cmd'],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
        )
        # Text mode still dumps argparse help to stderr
        assert 'invalid choice' in result.stderr
        assert result.returncode == 2
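

# One plausible shape for the #178/#179 behaviour verified above (a hedged
# sketch, not src.main's actual code): make argparse raise instead of
# printing, then emit the SCHEMAS.md parse-error envelope when JSON mode was
# requested. Only the envelope field names are contractual.
import argparse
from datetime import datetime, timezone


class _RaisingParser(argparse.ArgumentParser):
    def error(self, message: str) -> None:  # argparse funnels every parse failure here
        raise ValueError(message)           # carry the real argparse text verbatim


def _parse_or_emit_envelope(argv: list[str]) -> argparse.Namespace:
    # Crude JSON-mode sniff, done before parsing because parsing may fail first.
    wants_json = ('json' in argv) or ('--output-format=json' in argv)
    parser = _RaisingParser(prog='claw')
    sub = parser.add_subparsers(dest='command', required=True)
    sub.add_parser('list-sessions').add_argument('--output-format', default='text')
    try:
        return parser.parse_args(argv)
    except ValueError as exc:
        if not wants_json:
            parser.print_usage(sys.stderr)  # text mode: legacy behaviour preserved
            raise SystemExit(2)
        envelope = {
            'timestamp': datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z'),
            'command': next((a for a in argv if not a.startswith('-')), ''),
            'exit_code': 1,
            'output_format': 'json',
            'schema_version': '1.0',
            'error': {'kind': 'parse', 'operation': 'argparse',
                      'retryable': False, 'message': str(exc)},
        }
        print(json.dumps(envelope))         # envelope on stdout, stderr stays silent
        raise SystemExit(1)
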
@@ -173,6 +173,105 @@ class PortingWorkspaceTests(unittest.TestCase):
        self.assertIn(session_id, result.stdout)
        self.assertIn('messages', result.stdout)

    def test_list_sessions_cli_runs(self) -> None:
        """#160: list-sessions CLI enumerates stored sessions in text + json."""
        import json
        import tempfile
        from src.session_store import StoredSession, save_session

        with tempfile.TemporaryDirectory() as tmp:
            tmp_path = Path(tmp)
            for sid in ['alpha', 'bravo']:
                save_session(
                    StoredSession(session_id=sid, messages=('hi',), input_tokens=1, output_tokens=2),
                    tmp_path,
                )
            # text mode
            text_result = subprocess.run(
                [sys.executable, '-m', 'src.main', 'list-sessions', '--directory', str(tmp_path)],
                check=True, capture_output=True, text=True,
            )
            self.assertIn('alpha', text_result.stdout)
            self.assertIn('bravo', text_result.stdout)
            # json mode
            json_result = subprocess.run(
                [sys.executable, '-m', 'src.main', 'list-sessions',
                 '--directory', str(tmp_path), '--output-format', 'json'],
                check=True, capture_output=True, text=True,
            )
            data = json.loads(json_result.stdout)
            # Verify common envelope fields (SCHEMAS.md contract)
            self.assertIn('timestamp', data)
            self.assertEqual(data['command'], 'list-sessions')
            self.assertEqual(data['schema_version'], '1.0')
            # Verify command-specific fields
            self.assertEqual(data['sessions'], ['alpha', 'bravo'])
            self.assertEqual(data['count'], 2)

    def test_delete_session_cli_idempotent(self) -> None:
        """#160: delete-session CLI is idempotent (not-found is exit 0, status=not_found)."""
        import json
        import tempfile
        from src.session_store import StoredSession, save_session

        with tempfile.TemporaryDirectory() as tmp:
            tmp_path = Path(tmp)
            save_session(
                StoredSession(session_id='once', messages=('hi',), input_tokens=1, output_tokens=2),
                tmp_path,
            )
            # first delete: success
            first = subprocess.run(
                [sys.executable, '-m', 'src.main', 'delete-session', 'once',
                 '--directory', str(tmp_path), '--output-format', 'json'],
                capture_output=True, text=True,
            )
            self.assertEqual(first.returncode, 0)
            envelope_first = json.loads(first.stdout)
            # Verify common envelope fields (SCHEMAS.md contract)
            self.assertIn('timestamp', envelope_first)
            self.assertEqual(envelope_first['command'], 'delete-session')
            self.assertEqual(envelope_first['exit_code'], 0)
            self.assertEqual(envelope_first['schema_version'], '1.0')
            # Verify command-specific fields
            self.assertEqual(envelope_first['session_id'], 'once')
            self.assertEqual(envelope_first['deleted'], True)
            self.assertEqual(envelope_first['status'], 'deleted')
            # second delete: idempotent, still exit 0
            second = subprocess.run(
                [sys.executable, '-m', 'src.main', 'delete-session', 'once',
                 '--directory', str(tmp_path), '--output-format', 'json'],
                capture_output=True, text=True,
            )
            self.assertEqual(second.returncode, 0)
            envelope_second = json.loads(second.stdout)
            self.assertEqual(envelope_second['session_id'], 'once')
            self.assertEqual(envelope_second['deleted'], False)
            self.assertEqual(envelope_second['status'], 'not_found')

    def test_delete_session_cli_partial_failure_exit_1(self) -> None:
        """#160: partial-failure (permission error) surfaces as exit 1 + typed JSON error."""
        import json
        import tempfile

        with tempfile.TemporaryDirectory() as tmp:
            tmp_path = Path(tmp)
            bad = tmp_path / 'locked.json'
            bad.mkdir()
            try:
                result = subprocess.run(
                    [sys.executable, '-m', 'src.main', 'delete-session', 'locked',
                     '--directory', str(tmp_path), '--output-format', 'json'],
                    capture_output=True, text=True,
                )
                self.assertEqual(result.returncode, 1)
                data = json.loads(result.stdout)
                self.assertFalse(data['deleted'])
                self.assertEqual(data['error']['kind'], 'session_delete_failed')
                self.assertTrue(data['error']['retryable'])
            finally:
                bad.rmdir()

    def test_tool_permission_filtering_cli_runs(self) -> None:
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'tools', '--limit', '10', '--deny-prefix', 'mcp'],

tests/test_run_turn_loop_cancellation.py (new file, 156 lines)
@@ -0,0 +1,156 @@
"""Tests for run_turn_loop timeout triggering cooperative cancel (ROADMAP #164 Stage A).

End-to-end integration: when the wall-clock timeout fires in run_turn_loop,
the runtime must signal the cancel_event so any in-flight submit_message
thread sees it at its next safe checkpoint and returns without mutating
state.

This closes the gap filed in #164: #161's timeout bounded the caller's wait
but did not prevent ghost turns.
"""

from __future__ import annotations

import sys
import threading
import time
from pathlib import Path
from unittest.mock import patch

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.models import UsageSummary  # noqa: E402
from src.query_engine import TurnResult  # noqa: E402
from src.runtime import PortRuntime  # noqa: E402


def _completed(prompt: str) -> TurnResult:
    return TurnResult(
        prompt=prompt,
        output='ok',
        matched_commands=(),
        matched_tools=(),
        permission_denials=(),
        usage=UsageSummary(),
        stop_reason='completed',
    )
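

# The cooperative-cancellation pattern under test, as a hedged sketch (not
# src.runtime's actual code): provider-side work polls cancel_event at safe
# checkpoints and bails out with a 'cancelled' result instead of mutating
# session state after the deadline.
def _example_cooperative_call(prompt: str, cancel_event: threading.Event) -> str:
    for _chunk in range(100):            # pretend chunks of provider work
        if cancel_event.is_set():        # safe checkpoint
            return 'cancelled'           # no ghost turn gets recorded
        time.sleep(0.01)
    return 'completed'
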
class TestTimeoutPropagatesCancelEvent:
    def test_runtime_passes_cancel_event_to_submit_message(self) -> None:
        """submit_message receives a cancel_event when a deadline is in play."""
        runtime = PortRuntime()
        captured_event: list[threading.Event | None] = []

        def _capture(prompt, commands, tools, denials, cancel_event=None):
            captured_event.append(cancel_event)
            return _completed(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _capture

            runtime.run_turn_loop(
                'hello', max_turns=1, timeout_seconds=5.0,
            )

        # Runtime passed a real Event object, not None
        assert len(captured_event) == 1
        assert isinstance(captured_event[0], threading.Event)

    def test_legacy_no_timeout_does_not_pass_cancel_event(self) -> None:
        """Without timeout_seconds, the cancel_event is None (legacy behaviour)."""
        runtime = PortRuntime()
        captured_kwargs: list[dict] = []

        def _capture(prompt, commands, tools, denials):
            # Legacy call signature: no cancel_event kwarg
            captured_kwargs.append({'prompt': prompt})
            return _completed(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _capture

            runtime.run_turn_loop('hello', max_turns=1)

        # Legacy path didn't pass cancel_event at all
        assert len(captured_kwargs) == 1

    def test_timeout_sets_cancel_event_before_returning(self) -> None:
        """When timeout fires mid-call, the event is set and the still-running
        thread would see 'cancelled' if it checks before returning."""
        runtime = PortRuntime()
        observed_events_at_checkpoint: list[bool] = []
        release = threading.Event()  # test-side release so the thread doesn't leak forever

        def _slow_submit(prompt, commands, tools, denials, cancel_event=None):
            # Simulate provider work: block until either cancel or a test-side release.
            # If cancel fires, check if the event is observably set.
            start = time.monotonic()
            while time.monotonic() - start < 2.0:
                if cancel_event is not None and cancel_event.is_set():
                    observed_events_at_checkpoint.append(True)
                    return TurnResult(
                        prompt=prompt, output='',
                        matched_commands=(), matched_tools=(),
                        permission_denials=(), usage=UsageSummary(),
                        stop_reason='cancelled',
                    )
                if release.is_set():
                    break
                time.sleep(0.05)
            return _completed(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _slow_submit

            # Tight deadline: 0.2s, submit will be mid-loop when timeout fires
            start = time.monotonic()
            results = runtime.run_turn_loop(
                'hello', max_turns=1, timeout_seconds=0.2,
            )
            elapsed = time.monotonic() - start
            release.set()  # let the background thread exit cleanly

        # Runtime returned a timeout TurnResult to the caller
        assert results[-1].stop_reason == 'timeout'
        # And it happened within a reasonable window of the deadline
        assert elapsed < 1.5, f'runtime did not honour deadline: {elapsed:.2f}s'

        # Give the background thread a moment to observe the cancel.
        # We don't assert on it directly (thread-level observability is
        # timing-dependent), but the contract is: the event IS set, so any
        # cooperative checkpoint will see it.
        time.sleep(0.3)

class TestCancelEventSharedAcrossTurns:
    """Event is created once per run_turn_loop invocation and shared across turns."""

    def test_same_event_threaded_to_every_submit_message(self) -> None:
        runtime = PortRuntime()
        captured_events: list[threading.Event] = []

        def _capture(prompt, commands, tools, denials, cancel_event=None):
            if cancel_event is not None:
                captured_events.append(cancel_event)
            return _completed(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _capture

            runtime.run_turn_loop(
                'hello', max_turns=3, timeout_seconds=5.0,
                continuation_prompt='continue',
            )

        # All 3 turns received the same event object (same identity)
        assert len(captured_events) == 3
        assert all(e is captured_events[0] for e in captured_events), (
            'runtime must share one cancel_event across turns, not create '
            'a new one per turn \u2014 otherwise a late-arriving cancel on turn '
            'N-1 cannot affect turn N'
        )

tests/test_run_turn_loop_continuation.py (new file, 161 lines)
@@ -0,0 +1,161 @@
"""Tests for run_turn_loop continuation contract (ROADMAP #163).

The deprecated ``f'{prompt} [turn N]'`` suffix injection is gone. Verifies:
- No ``[turn N]`` string ever lands in a submitted prompt
- Default (``continuation_prompt=None``) stops the loop after turn 0
- Explicit ``continuation_prompt`` is submitted verbatim on subsequent turns
- The first turn always gets the original prompt, not the continuation
"""

from __future__ import annotations

import subprocess
import sys
from pathlib import Path
from unittest.mock import patch

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.models import UsageSummary  # noqa: E402
from src.query_engine import TurnResult  # noqa: E402
from src.runtime import PortRuntime  # noqa: E402


def _completed_result(prompt: str) -> TurnResult:
    return TurnResult(
        prompt=prompt,
        output='ok',
        matched_commands=(),
        matched_tools=(),
        permission_denials=(),
        usage=UsageSummary(),
        stop_reason='completed',
    )
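

# The continuation contract under test, as a minimal loop sketch (an assumed
# shape, not src.runtime verbatim): turn 0 always gets the original prompt;
# later turns get continuation_prompt verbatim, and None means stop after
# turn 0. No '[turn N]' suffix is ever synthesized.
def _prompts_for_loop(prompt: str, max_turns: int, continuation_prompt: str | None = None) -> list[str]:
    out: list[str] = []
    for turn in range(max_turns):
        if turn == 0:
            out.append(prompt)
        elif continuation_prompt is None:
            break
        else:
            out.append(continuation_prompt)  # verbatim, never decorated
    return out

# _prompts_for_loop('x', 3)               -> ['x']
# _prompts_for_loop('x', 3, 'Continue.')  -> ['x', 'Continue.', 'Continue.']
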
class TestNoTurnSuffixInjection:
    """Core acceptance: no prompt submitted to the engine ever contains '[turn N]'."""

    def test_default_path_submits_original_prompt_only(self) -> None:
        runtime = PortRuntime()
        submitted: list[str] = []

        def _capture(prompt, commands, tools, denials):
            submitted.append(prompt)
            return _completed_result(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _capture

            runtime.run_turn_loop('investigate this bug', max_turns=3)

        # Without continuation_prompt, only turn 0 should run
        assert submitted == ['investigate this bug']
        # And no '[turn N]' suffix anywhere
        for p in submitted:
            assert '[turn' not in p, f'found [turn suffix in submitted prompt: {p!r}'

    def test_with_continuation_prompt_no_turn_suffix(self) -> None:
        runtime = PortRuntime()
        submitted: list[str] = []

        def _capture(prompt, commands, tools, denials):
            submitted.append(prompt)
            return _completed_result(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _capture

            runtime.run_turn_loop(
                'investigate this bug',
                max_turns=3,
                continuation_prompt='Continue.',
            )

        # Turn 0 = original, turns 1-2 = continuation, verbatim
        assert submitted == ['investigate this bug', 'Continue.', 'Continue.']
        # No harness-injected suffix anywhere
        for p in submitted:
            assert '[turn' not in p
            assert not p.endswith(']')

class TestContinuationDefaultStopsAfterTurnZero:
    def test_default_continuation_returns_one_result(self) -> None:
        runtime = PortRuntime()
        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = lambda p, *_: _completed_result(p)

            results = runtime.run_turn_loop('x', max_turns=5)
        assert len(results) == 1
        assert results[0].prompt == 'x'

    def test_default_continuation_does_not_call_engine_twice(self) -> None:
        runtime = PortRuntime()
        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = lambda p, *_: _completed_result(p)

            runtime.run_turn_loop('x', max_turns=10)
        # Exactly one submit_message call despite max_turns=10
        assert engine.submit_message.call_count == 1

class TestExplicitContinuationBehaviour:
    def test_first_turn_always_uses_original_prompt(self) -> None:
        runtime = PortRuntime()
        captured: list[str] = []

        def _capture(prompt, *_):
            captured.append(prompt)
            return _completed_result(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = _capture

            runtime.run_turn_loop(
                'original task', max_turns=2, continuation_prompt='keep going'
            )

        assert captured[0] == 'original task'
        assert captured[1] == 'keep going'

    def test_continuation_respects_max_turns(self) -> None:
        runtime = PortRuntime()
        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.submit_message.side_effect = lambda p, *_: _completed_result(p)

            runtime.run_turn_loop('x', max_turns=3, continuation_prompt='go')
        assert engine.submit_message.call_count == 3

class TestCLIContinuationFlag:
    def test_cli_default_runs_one_turn(self) -> None:
        """Without --continuation-prompt, CLI should emit exactly '## Turn 1'."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'turn-loop', 'review MCP tool',
             '--max-turns', '3', '--structured-output'],
            check=True, capture_output=True, text=True,
        )
        assert '## Turn 1' in result.stdout
        assert '## Turn 2' not in result.stdout
        assert '[turn' not in result.stdout

    def test_cli_with_continuation_runs_multiple_turns(self) -> None:
        """With --continuation-prompt, CLI should run up to max_turns."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'turn-loop', 'review MCP tool',
             '--max-turns', '2', '--structured-output',
             '--continuation-prompt', 'continue'],
            check=True, capture_output=True, text=True,
        )
        assert '## Turn 1' in result.stdout
        assert '## Turn 2' in result.stdout
        # The continuation text is visible (it's submitted as the turn prompt)
        # but no harness-injected [turn N] suffix
        assert '[turn' not in result.stdout

tests/test_run_turn_loop_permissions.py (new file, 95 lines)
@@ -0,0 +1,95 @@
"""Tests for run_turn_loop permission denials parity (ROADMAP #159).

Verifies that multi-turn sessions have the same security posture as
single-turn bootstrap_session: denied_tools are inferred from matches
and threaded through every turn, not hardcoded empty.
"""

from __future__ import annotations

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.runtime import PortRuntime  # noqa: E402

class TestPermissionDenialsInTurnLoop:
    """#159: permission denials must be non-empty in run_turn_loop,
    matching what bootstrap_session produces for the same prompt.
    """

    def test_turn_loop_surfaces_permission_denials_like_bootstrap(self) -> None:
        """Symmetry check: turn_loop and bootstrap_session infer the same denials."""
        runtime = PortRuntime()
        prompt = 'run bash ls'

        # Single-turn via bootstrap
        bootstrap_result = runtime.bootstrap_session(prompt)
        bootstrap_denials = bootstrap_result.turn_result.permission_denials

        # Multi-turn via run_turn_loop (single turn, no continuation)
        loop_results = runtime.run_turn_loop(prompt, max_turns=1)
        loop_denials = loop_results[0].permission_denials

        # Both should infer denials for bash-family tools
        assert len(bootstrap_denials) > 0, (
            'bootstrap_session should deny bash-family tools'
        )
        assert len(loop_denials) > 0, (
            f'#159 regression: run_turn_loop returned empty denials; '
            f'expected {len(bootstrap_denials)} like bootstrap_session'
        )

        # The denial kinds should match (both deny the same tools)
        bootstrap_denied_names = {d.tool_name for d in bootstrap_denials}
        loop_denied_names = {d.tool_name for d in loop_denials}
        assert bootstrap_denied_names == loop_denied_names, (
            f'asymmetric denials: bootstrap denied {bootstrap_denied_names}, '
            f'loop denied {loop_denied_names}'
        )

    def test_turn_loop_with_continuation_preserves_denials(self) -> None:
        """Denials are inferred once at loop start, then passed to every turn."""
        runtime = PortRuntime()
        from unittest.mock import patch

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            from src.models import UsageSummary
            from src.query_engine import TurnResult

            engine = mock_factory.return_value
            submitted_denials: list[tuple] = []

            def _capture(prompt, commands, tools, denials):
                submitted_denials.append(denials)
                return TurnResult(
                    prompt=prompt,
                    output='ok',
                    matched_commands=(),
                    matched_tools=(),
                    permission_denials=denials,  # echo back the denials
                    usage=UsageSummary(),
                    stop_reason='completed',
                )

            engine.submit_message.side_effect = _capture

            loop_results = runtime.run_turn_loop(
                'run bash rm', max_turns=2, continuation_prompt='continue'
            )

        # Both turn 0 and turn 1 should have received the same denials
        assert len(submitted_denials) == 2
        assert submitted_denials[0] == submitted_denials[1], (
            'denials should be consistent across all turns'
        )
        # And they should be non-empty (bash is destructive)
        assert len(submitted_denials[0]) > 0, (
            'turn-loop denials were empty — #159 regression'
        )

        # Turn results should reflect the denials that were passed
        for result in loop_results:
            assert len(result.permission_denials) > 0

tests/test_run_turn_loop_timeout.py (new file, 179 lines)
@@ -0,0 +1,179 @@
"""Tests for run_turn_loop wall-clock timeout (ROADMAP #161).

Covers:
- timeout_seconds=None preserves legacy unbounded behaviour
- timeout_seconds=X aborts a hung turn and emits stop_reason='timeout'
- Timeout budget is total wall-clock across all turns, not per-turn
- Already-exhausted budget short-circuits before the first turn runs
- Legacy path still runs without a ThreadPoolExecutor in the way
"""

from __future__ import annotations

import sys
import time
from pathlib import Path
from unittest.mock import patch

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.models import UsageSummary  # noqa: E402
from src.query_engine import TurnResult  # noqa: E402
from src.runtime import PortRuntime  # noqa: E402


def _completed_result(prompt: str) -> TurnResult:
    return TurnResult(
        prompt=prompt,
        output='ok',
        matched_commands=(),
        matched_tools=(),
        permission_denials=(),
        usage=UsageSummary(),
        stop_reason='completed',
    )
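

# Sketch of the cumulative-budget mechanics pinned down below (an assumed
# shape; src.runtime's real loop may differ): one deadline for the whole
# loop, each turn gets only the remaining slice, and a hung worker is
# abandoned rather than waited on.
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as _FutureTimeout


def _run_with_deadline(submit, prompts, timeout_seconds: float) -> list[str]:
    deadline = time.monotonic() + timeout_seconds
    pool = ThreadPoolExecutor(max_workers=1)
    results: list[str] = []
    for prompt in prompts:
        remaining = deadline - time.monotonic()
        if remaining <= 0:          # budget already exhausted: short-circuit
            results.append('timeout')
            break
        try:
            results.append(pool.submit(submit, prompt).result(timeout=remaining))
        except _FutureTimeout:
            results.append('timeout')
            break
    pool.shutdown(wait=False)       # don't block on a still-hung worker thread
    return results
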
class TestLegacyUnboundedBehaviour:
    def test_no_timeout_preserves_existing_behaviour(self) -> None:
        """timeout_seconds=None must not change the legacy path at all."""
        results = PortRuntime().run_turn_loop('review MCP tool', max_turns=2)
        assert len(results) >= 1
        for r in results:
            assert r.stop_reason in {'completed', 'max_turns_reached', 'max_budget_reached'}
            assert r.stop_reason != 'timeout'


class TestTimeoutAbortsHungTurn:
    def test_hung_submit_message_times_out(self) -> None:
        """A stalled submit_message must be aborted and emit stop_reason='timeout'."""
        runtime = PortRuntime()

        # #164 Stage A: runtime now passes cancel_event as a 5th positional
        # arg on the timeout path, so mocks must accept it (even if they ignore it).
        def _hang(prompt, commands, tools, denials, cancel_event=None):
            time.sleep(5.0)  # would block the loop
            return _completed_result(prompt)

        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
            engine = mock_factory.return_value
            engine.config = None  # attribute-assigned in run_turn_loop
            engine.submit_message.side_effect = _hang

            start = time.monotonic()
            results = runtime.run_turn_loop(
                'review MCP tool', max_turns=3, timeout_seconds=0.3
            )
            elapsed = time.monotonic() - start

        # Must exit well under the 5s hang
        assert elapsed < 1.5, f'run_turn_loop did not honor timeout: {elapsed:.2f}s'
        assert len(results) == 1
        assert results[-1].stop_reason == 'timeout'

class TestTimeoutBudgetIsTotal:
|
||||
def test_budget_is_cumulative_across_turns(self) -> None:
|
||||
"""timeout_seconds is total wall-clock across all turns, not per-turn.
|
||||
|
||||
#163 interaction: multi-turn behaviour now requires an explicit
|
||||
``continuation_prompt``; otherwise the loop stops after turn 0 and
|
||||
the cumulative-budget contract is trivially satisfied. We supply one
|
||||
here so the test actually exercises the cross-turn deadline.
|
||||
"""
|
||||
runtime = PortRuntime()
|
||||
call_count = {'n': 0}
|
||||
|
||||
def _slow(prompt, commands, tools, denials, cancel_event=None):
|
||||
call_count['n'] += 1
|
||||
time.sleep(0.4) # each turn burns 0.4s
|
||||
return _completed_result(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = _slow
|
||||
|
||||
start = time.monotonic()
|
||||
# 0.6s budget, 0.4s per turn. First turn completes (~0.4s),
|
||||
# second turn times out before finishing.
|
||||
results = runtime.run_turn_loop(
|
||||
'review MCP tool',
|
||||
max_turns=5,
|
||||
timeout_seconds=0.6,
|
||||
continuation_prompt='continue',
|
||||
)
|
||||
elapsed = time.monotonic() - start
|
||||
|
||||
# Should exit at around 0.6s, not 2.0s (5 turns * 0.4s)
|
||||
assert elapsed < 1.5, f'cumulative budget not honored: {elapsed:.2f}s'
|
||||
# Last result should be the timeout
|
||||
assert results[-1].stop_reason == 'timeout'
|
||||
|
||||
|
||||
class TestExhaustedBudget:
|
||||
def test_zero_timeout_short_circuits_first_turn(self) -> None:
|
||||
"""timeout_seconds=0 emits timeout before the first submit_message call."""
|
||||
runtime = PortRuntime()
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
# submit_message should never be called when budget is already 0
|
||||
engine.submit_message.side_effect = AssertionError(
|
||||
'submit_message should not run when budget is exhausted'
|
||||
)
|
||||
|
||||
results = runtime.run_turn_loop(
|
||||
'review MCP tool', max_turns=3, timeout_seconds=0.0
|
||||
)
|
||||
|
||||
assert len(results) == 1
|
||||
assert results[0].stop_reason == 'timeout'
|
||||
|
||||
|
||||
class TestTimeoutResultShape:
|
||||
def test_timeout_result_has_correct_prompt_and_matches(self) -> None:
|
||||
"""Synthetic TurnResult on timeout must carry the turn's prompt + routed matches."""
|
||||
runtime = PortRuntime()
|
||||
|
||||
def _hang(prompt, commands, tools, denials, cancel_event=None):
|
||||
time.sleep(5.0)
|
||||
return _completed_result(prompt)
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = _hang
|
||||
|
||||
results = runtime.run_turn_loop(
|
||||
'review MCP tool', max_turns=2, timeout_seconds=0.2
|
||||
)
|
||||
|
||||
timeout_result = results[-1]
|
||||
assert timeout_result.stop_reason == 'timeout'
|
||||
assert timeout_result.prompt == 'review MCP tool'
|
||||
# matched_commands / matched_tools should still be populated from routing,
|
||||
# so downstream transcripts don't lose the routing context.
|
||||
# These may be empty tuples depending on routing; they must be tuples.
|
||||
assert isinstance(timeout_result.matched_commands, tuple)
|
||||
assert isinstance(timeout_result.matched_tools, tuple)
|
||||
assert isinstance(timeout_result.usage, UsageSummary)
|
||||
|
||||
|
||||
class TestNegativeTimeoutTreatedAsExhausted:
|
||||
def test_negative_timeout_short_circuits(self) -> None:
|
||||
"""A negative budget should behave identically to exhausted."""
|
||||
runtime = PortRuntime()
|
||||
|
||||
with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
|
||||
engine = mock_factory.return_value
|
||||
engine.submit_message.side_effect = AssertionError(
|
||||
'submit_message should not run when budget is negative'
|
||||
)
|
||||
|
||||
results = runtime.run_turn_loop(
|
||||
'review MCP tool', max_turns=3, timeout_seconds=-1.0
|
||||
)
|
||||
|
||||
assert len(results) == 1
|
||||
assert results[0].stop_reason == 'timeout'
|
||||
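These tests constrain behaviour rather than implementation, but one shape that satisfies all five is a single deadline shared across turns, with each submit_message call pushed onto a worker thread. A rough sketch (the real code is in src/runtime.py; everything below is an assumption apart from the stop_reason and budget semantics the tests assert):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_turns_with_deadline(turn_fns, timeout_seconds=None):
    """Sketch: one wall-clock budget across all turns, not per turn."""
    if timeout_seconds is None:
        return [fn() for fn in turn_fns]  # legacy path: no executor involved

    deadline = time.monotonic() + timeout_seconds
    results = []
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        for fn in turn_fns:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                # Covers timeout_seconds=0 and negative budgets: short-circuit
                # before submitting anything. The real code appends a synthetic
                # TurnResult with stop_reason='timeout' here.
                results.append('timeout')
                break
            future = pool.submit(fn)
            try:
                results.append(future.result(timeout=remaining))
            except FutureTimeout:
                results.append('timeout')
                break
    finally:
        # wait=False: a wedged provider thread cannot be killed, only
        # abandoned, which is exactly the gap #164's cancel_event closes.
        pool.shutdown(wait=False, cancel_futures=True)
    return results
```

With timeout_seconds=0.0 the `remaining <= 0` branch fires before the first submit, matching TestExhaustedBudget and the negative-budget case.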
173
tests/test_session_store.py
Normal file
@@ -0,0 +1,173 @@
"""Tests for session_store CRUD surface (ROADMAP #160).

Covers:
- list_sessions enumeration
- session_exists boolean check
- delete_session idempotency + race-safety + partial-failure contract
- SessionNotFoundError typing (KeyError subclass)
- SessionDeleteError typing (OSError subclass)
"""

from __future__ import annotations

import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent / 'src'))

from session_store import (  # noqa: E402
    StoredSession,
    SessionDeleteError,
    SessionNotFoundError,
    delete_session,
    list_sessions,
    load_session,
    save_session,
    session_exists,
)


def _make_session(session_id: str) -> StoredSession:
    return StoredSession(
        session_id=session_id,
        messages=('hello',),
        input_tokens=1,
        output_tokens=2,
    )


class TestListSessions:
    def test_empty_directory_returns_empty_list(self, tmp_path: Path) -> None:
        assert list_sessions(tmp_path) == []

    def test_nonexistent_directory_returns_empty_list(self, tmp_path: Path) -> None:
        missing = tmp_path / 'never-created'
        assert list_sessions(missing) == []

    def test_lists_saved_sessions_sorted(self, tmp_path: Path) -> None:
        save_session(_make_session('charlie'), tmp_path)
        save_session(_make_session('alpha'), tmp_path)
        save_session(_make_session('bravo'), tmp_path)
        assert list_sessions(tmp_path) == ['alpha', 'bravo', 'charlie']

    def test_ignores_non_json_files(self, tmp_path: Path) -> None:
        save_session(_make_session('real'), tmp_path)
        (tmp_path / 'notes.txt').write_text('ignore me')
        (tmp_path / 'data.yaml').write_text('ignore me too')
        assert list_sessions(tmp_path) == ['real']


class TestSessionExists:
    def test_returns_true_for_saved_session(self, tmp_path: Path) -> None:
        save_session(_make_session('present'), tmp_path)
        assert session_exists('present', tmp_path) is True

    def test_returns_false_for_missing_session(self, tmp_path: Path) -> None:
        assert session_exists('absent', tmp_path) is False

    def test_returns_false_for_nonexistent_directory(self, tmp_path: Path) -> None:
        missing = tmp_path / 'never-created'
        assert session_exists('anything', missing) is False


class TestLoadSession:
    def test_raises_typed_error_on_missing(self, tmp_path: Path) -> None:
        with pytest.raises(SessionNotFoundError) as exc_info:
            load_session('nonexistent', tmp_path)
        assert 'nonexistent' in str(exc_info.value)

    def test_not_found_error_is_keyerror_subclass(self, tmp_path: Path) -> None:
        """Orchestrators catching KeyError should still work."""
        with pytest.raises(KeyError):
            load_session('nonexistent', tmp_path)

    def test_not_found_error_is_not_filenotfounderror(self, tmp_path: Path) -> None:
        """Callers can distinguish 'not found' from IO errors."""
        with pytest.raises(SessionNotFoundError):
            load_session('nonexistent', tmp_path)
        # Specifically, it should NOT match bare FileNotFoundError alone
        # (SessionNotFoundError inherits from KeyError, not FileNotFoundError)
        assert not issubclass(SessionNotFoundError, FileNotFoundError)
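The typing assertions above fully determine the exception hierarchy, so this part can be shown with little hedging; only the message format is an assumption (the shipped definitions live in src/session_store.py):

```python
class SessionNotFoundError(KeyError):
    """load_session: the session id has no stored file.

    A KeyError subclass on purpose: orchestrators doing lookup-style error
    handling keep working, and it can never be caught by accident as the
    FileNotFoundError an IO layer raises for unrelated reasons.
    """

    def __init__(self, session_id: str) -> None:
        super().__init__(f'session not found: {session_id}')  # message format assumed


class SessionDeleteError(OSError):
    """delete_session: the file exists but could not be removed.

    An OSError subclass so generic retry loops that catch OSError still see
    it; the original failure is chained via ``raise ... from exc``.
    """
```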
class TestDeleteSessionIdempotency:
    """Contract: delete_session(x) followed by delete_session(x) must be safe."""

    def test_first_delete_returns_true(self, tmp_path: Path) -> None:
        save_session(_make_session('to-delete'), tmp_path)
        assert delete_session('to-delete', tmp_path) is True

    def test_second_delete_returns_false_no_raise(self, tmp_path: Path) -> None:
        """Idempotency: deleting an already-deleted session is a no-op."""
        save_session(_make_session('once'), tmp_path)
        delete_session('once', tmp_path)
        # Second call must not raise
        assert delete_session('once', tmp_path) is False

    def test_delete_nonexistent_returns_false_no_raise(self, tmp_path: Path) -> None:
        """Never-existed session is treated identically to already-deleted."""
        assert delete_session('never-existed', tmp_path) is False

    def test_delete_removes_only_target(self, tmp_path: Path) -> None:
        save_session(_make_session('keep'), tmp_path)
        save_session(_make_session('remove'), tmp_path)
        delete_session('remove', tmp_path)
        assert list_sessions(tmp_path) == ['keep']


class TestDeleteSessionPartialFailure:
    """Contract: file exists but cannot be removed -> SessionDeleteError."""

    def test_partial_failure_raises_session_delete_error(self, tmp_path: Path) -> None:
        """If a directory exists where a session file should be, unlink fails."""
        bad_path = tmp_path / 'locked.json'
        bad_path.mkdir()
        try:
            with pytest.raises(SessionDeleteError) as exc_info:
                delete_session('locked', tmp_path)
            # Underlying cause should be wrapped
            assert exc_info.value.__cause__ is not None
            assert isinstance(exc_info.value.__cause__, OSError)
        finally:
            bad_path.rmdir()

    def test_delete_error_is_oserror_subclass(self, tmp_path: Path) -> None:
        """Callers catching OSError should still work for retries."""
        bad_path = tmp_path / 'locked.json'
        bad_path.mkdir()
        try:
            with pytest.raises(OSError):
                delete_session('locked', tmp_path)
        finally:
            bad_path.rmdir()


class TestRaceSafety:
    """Contract: delete_session must be race-safe between exists-check and unlink."""

    def test_concurrent_deletion_returns_false_not_raises(
        self, tmp_path: Path, monkeypatch
    ) -> None:
        """If another process deletes between exists-check and unlink, return False."""
        save_session(_make_session('racy'), tmp_path)
        # Simulate: file disappears right before unlink (concurrent deletion)
        path = tmp_path / 'racy.json'
        path.unlink()
        # Now delete_session should return False, not raise
        assert delete_session('racy', tmp_path) is False


class TestRoundtrip:
    def test_save_list_load_delete_cycle(self, tmp_path: Path) -> None:
        session = _make_session('lifecycle')
        save_session(session, tmp_path)
        assert 'lifecycle' in list_sessions(tmp_path)
        assert session_exists('lifecycle', tmp_path)
        loaded = load_session('lifecycle', tmp_path)
        assert loaded.session_id == 'lifecycle'
        assert loaded.messages == ('hello',)
        assert delete_session('lifecycle', tmp_path) is True
        assert not session_exists('lifecycle', tmp_path)
        assert list_sessions(tmp_path) == []
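A delete_session satisfying the idempotency, race-safety, and partial-failure contracts above can be sketched like this, assuming one `<session_id>.json` file per session (the shipped implementation is in src/session_store.py):

```python
from pathlib import Path

from session_store import SessionDeleteError  # the typed error sketched above


def delete_session(session_id: str, directory: Path) -> bool:
    """Sketch: True if a file was removed, False if nothing was there."""
    path = directory / f'{session_id}.json'  # assumed storage layout
    try:
        # A single unlink with no exists() pre-check: the TOCTOU window
        # between "check" and "remove" is exactly what TestRaceSafety targets.
        path.unlink()
    except FileNotFoundError:
        return False  # already deleted or never existed: idempotent no-op
    except OSError as exc:
        # Present but unremovable (e.g. a directory squatting on the name):
        # wrap in the typed error, keeping the cause chained for callers.
        raise SessionDeleteError(f'could not delete session {session_id!r}') from exc
    return True
```

The except-clause order matters: FileNotFoundError is itself an OSError subclass, so it must be caught first to keep the idempotent path from being misreported as a partial failure.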
203
tests/test_show_command_tool_output_format.py
Normal file
@@ -0,0 +1,203 @@
"""Tests for --output-format flag on show-command and show-tool (ROADMAP #167).

Verifies parity with session-lifecycle CLI family (#160/#165/#166):
- show-command and show-tool now accept --output-format {text,json}
- Found case returns success with JSON envelope: {name, found: true, source_hint, responsibility}
- Not-found case returns typed error envelope: {name, found: false, error: {kind, message, retryable}}
- Legacy text output (default) unchanged for backward compat
- Exit code 0 on success, 1 on not-found (matching load-session contract)
"""

from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))


class TestShowCommandOutputFormat:
    """show-command --output-format {text,json} parity with session-lifecycle family."""

    def test_show_command_found_json(self) -> None:
        """show-command with found entry returns JSON envelope."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0, f'Expected exit 0, got {result.returncode}: {result.stderr}'

        envelope = json.loads(result.stdout)
        assert envelope['found'] is True
        assert envelope['name'] == 'add-dir'
        assert 'source_hint' in envelope
        assert 'responsibility' in envelope
        # No error field when found
        assert 'error' not in envelope

    def test_show_command_not_found_json(self) -> None:
        """show-command with missing entry returns typed error envelope."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'nonexistent-cmd', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 1, f'Expected exit 1 on not-found, got {result.returncode}'

        envelope = json.loads(result.stdout)
        assert envelope['found'] is False
        assert envelope['name'] == 'nonexistent-cmd'
        assert envelope['error']['kind'] == 'command_not_found'
        assert envelope['error']['retryable'] is False
        # No source_hint/responsibility when not found
        assert 'source_hint' not in envelope or envelope.get('source_hint') is None
        assert 'responsibility' not in envelope or envelope.get('responsibility') is None

    def test_show_command_text_mode_backward_compat(self) -> None:
        """show-command text mode (default) is unchanged from pre-#167."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0

        # Text output is newline-separated (name, source_hint, responsibility)
        lines = result.stdout.strip().split('\n')
        assert len(lines) == 3
        assert lines[0] == 'add-dir'
        assert 'commands/add-dir/add-dir.tsx' in lines[1]

    def test_show_command_text_mode_not_found(self) -> None:
        """show-command text mode on not-found returns prose error."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'missing'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 1
        assert 'not found' in result.stdout.lower()
        assert 'missing' in result.stdout

    def test_show_command_default_is_text(self) -> None:
        """Omitting --output-format defaults to text."""
        result_implicit = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        result_explicit = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'text'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result_implicit.stdout == result_explicit.stdout


class TestShowToolOutputFormat:
    """show-tool --output-format {text,json} parity with session-lifecycle family."""

    def test_show_tool_found_json(self) -> None:
        """show-tool with found entry returns JSON envelope."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-tool', 'BashTool', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0, f'Expected exit 0, got {result.returncode}: {result.stderr}'

        envelope = json.loads(result.stdout)
        assert envelope['found'] is True
        assert envelope['name'] == 'BashTool'
        assert 'source_hint' in envelope
        assert 'responsibility' in envelope
        assert 'error' not in envelope

    def test_show_tool_not_found_json(self) -> None:
        """show-tool with missing entry returns typed error envelope."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-tool', 'NotARealTool', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 1, f'Expected exit 1 on not-found, got {result.returncode}'

        envelope = json.loads(result.stdout)
        assert envelope['found'] is False
        assert envelope['name'] == 'NotARealTool'
        assert envelope['error']['kind'] == 'tool_not_found'
        assert envelope['error']['retryable'] is False

    def test_show_tool_text_mode_backward_compat(self) -> None:
        """show-tool text mode (default) is unchanged from pre-#167."""
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-tool', 'BashTool'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0

        lines = result.stdout.strip().split('\n')
        assert len(lines) == 3
        assert lines[0] == 'BashTool'
        assert 'tools/BashTool/BashTool.tsx' in lines[1]


class TestShowCommandToolFormatParity:
    """Verify symmetry between show-command and show-tool formats."""

    def test_both_accept_output_format_flag(self) -> None:
        """Both commands accept the same --output-format choices."""
        # Just ensure both fail with invalid choice (they accept text/json)
        result_cmd = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'invalid'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        result_tool = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-tool', 'BashTool', '--output-format', 'invalid'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        # Both should fail with argument parser error
        assert result_cmd.returncode != 0
        assert result_tool.returncode != 0
        assert 'invalid choice' in result_cmd.stderr
        assert 'invalid choice' in result_tool.stderr

    def test_json_envelope_shape_consistency(self) -> None:
        """Both commands return consistent JSON envelope shape."""
        cmd_result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )
        tool_result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'show-tool', 'BashTool', '--output-format', 'json'],
            cwd=Path(__file__).resolve().parent.parent,
            capture_output=True,
            text=True,
        )

        cmd_envelope = json.loads(cmd_result.stdout)
        tool_envelope = json.loads(tool_result.stdout)

        # Same top-level keys for found=true case
        assert set(cmd_envelope.keys()) == set(tool_envelope.keys())
        assert cmd_envelope['found'] is True
        assert tool_envelope['found'] is True
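Both commands assert an identical envelope, which suggests one shared emitter behind both subcommands. A sketch of what that emitter might look like; the function name, entry shape, and wiring are assumptions, while the envelope fields and exit codes are fixed by the tests:

```python
import json


def emit_lookup(name, entry, kind_on_miss, output_format='text'):
    """Sketch: shared by show-command ('command_not_found') and
    show-tool ('tool_not_found'); returns the process exit code."""
    if output_format == 'json':
        if entry is not None:
            envelope = {
                'name': name,
                'found': True,
                'source_hint': entry['source_hint'],
                'responsibility': entry['responsibility'],
            }
        else:
            envelope = {
                'name': name,
                'found': False,
                'error': {
                    'kind': kind_on_miss,
                    'message': f'{name} not found',
                    'retryable': False,  # retrying the same lookup cannot succeed
                },
            }
        print(json.dumps(envelope))
    elif entry is not None:
        # Legacy text mode: three newline-separated fields, unchanged pre-#167
        print(name)
        print(entry['source_hint'])
        print(entry['responsibility'])
    else:
        print(f'{name} not found')
    return 0 if entry is not None else 1
```

main() would then `sys.exit(emit_lookup(...))`, giving the 0-on-found / 1-on-miss contract these tests assert for both subcommands.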
167
tests/test_submit_message_budget.py
Normal file
@@ -0,0 +1,167 @@
"""Tests for submit_message budget-overflow atomicity (ROADMAP #162).

Covers:
- Budget overflow returns stop_reason='max_budget_reached' without mutating session
- mutable_messages, transcript_store, permission_denials, total_usage all unchanged
- Session persisted after overflow does not contain the overflow turn
- Engine remains usable after overflow: subsequent in-budget call succeeds
- Normal (non-overflow) path still commits state as before
"""

from __future__ import annotations

import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.models import PermissionDenial, UsageSummary  # noqa: E402
from src.port_manifest import build_port_manifest  # noqa: E402
from src.query_engine import QueryEngineConfig, QueryEnginePort  # noqa: E402
from src.session_store import StoredSession, load_session, save_session  # noqa: E402


def _make_engine(max_budget_tokens: int = 10) -> QueryEnginePort:
    engine = QueryEnginePort(manifest=build_port_manifest())
    engine.config = QueryEngineConfig(max_budget_tokens=max_budget_tokens)
    return engine


class TestBudgetOverflowDoesNotMutate:
    """The core #162 contract: overflow must leave session state untouched."""

    def test_mutable_messages_unchanged_on_overflow(self) -> None:
        engine = _make_engine(max_budget_tokens=10)
        pre_count = len(engine.mutable_messages)
        overflow_prompt = ' '.join(['word'] * 50)
        result = engine.submit_message(overflow_prompt)
        assert result.stop_reason == 'max_budget_reached'
        assert len(engine.mutable_messages) == pre_count

    def test_transcript_unchanged_on_overflow(self) -> None:
        engine = _make_engine(max_budget_tokens=10)
        pre_count = len(engine.transcript_store.entries)
        overflow_prompt = ' '.join(['word'] * 50)
        result = engine.submit_message(overflow_prompt)
        assert result.stop_reason == 'max_budget_reached'
        assert len(engine.transcript_store.entries) == pre_count

    def test_permission_denials_unchanged_on_overflow(self) -> None:
        engine = _make_engine(max_budget_tokens=10)
        pre_count = len(engine.permission_denials)
        denials = (PermissionDenial(tool_name='bash', reason='gated in test'),)
        overflow_prompt = ' '.join(['word'] * 50)
        result = engine.submit_message(overflow_prompt, denied_tools=denials)
        assert result.stop_reason == 'max_budget_reached'
        assert len(engine.permission_denials) == pre_count

    def test_total_usage_unchanged_on_overflow(self) -> None:
        engine = _make_engine(max_budget_tokens=10)
        pre_usage = engine.total_usage
        overflow_prompt = ' '.join(['word'] * 50)
        result = engine.submit_message(overflow_prompt)
        assert result.stop_reason == 'max_budget_reached'
        assert engine.total_usage == pre_usage

    def test_turn_result_reports_pre_mutation_usage(self) -> None:
        """The TurnResult.usage must reflect session state as if the overflow never happened."""
        engine = _make_engine(max_budget_tokens=10)
        pre_usage = engine.total_usage
        overflow_prompt = ' '.join(['word'] * 50)
        result = engine.submit_message(overflow_prompt)
        assert result.stop_reason == 'max_budget_reached'
        assert result.usage == pre_usage


class TestOverflowPersistence:
    """Session persisted after overflow must not contain the overflow turn."""

    def test_persisted_session_empty_when_first_turn_overflows(
        self, tmp_path: Path, monkeypatch
    ) -> None:
        """When the very first call overflows, the persisted session has zero messages."""
        monkeypatch.chdir(tmp_path)
        engine = _make_engine(max_budget_tokens=10)
        overflow_prompt = ' '.join(['word'] * 50)
        result = engine.submit_message(overflow_prompt)
        assert result.stop_reason == 'max_budget_reached'

        path_str = engine.persist_session()
        path = Path(path_str)
        assert path.exists()
        loaded = load_session(path.stem, path.parent)
        assert loaded.messages == (), (
            f'overflow turn poisoned session: {loaded.messages!r}'
        )

    def test_persisted_session_retains_only_successful_turns(
        self, tmp_path: Path, monkeypatch
    ) -> None:
        """A successful turn followed by an overflow persists only the successful turn."""
        monkeypatch.chdir(tmp_path)
        # Budget large enough for one short turn but not a second big one.
        # Token counting is whitespace-split (see UsageSummary.add_turn),
        # so overflow prompts must contain many whitespace-separated words.
        engine = QueryEnginePort(manifest=build_port_manifest())
        engine.config = QueryEngineConfig(max_budget_tokens=50)

        ok = engine.submit_message('short')
        assert ok.stop_reason == 'completed'
        assert 'short' in engine.mutable_messages

        # 500 whitespace-separated tokens — definitely over a 50-token budget
        overflow_prompt = ' '.join(['word'] * 500)
        overflow = engine.submit_message(overflow_prompt)
        assert overflow.stop_reason == 'max_budget_reached'

        path = Path(engine.persist_session())
        loaded = load_session(path.stem, path.parent)
        assert loaded.messages == ('short',), (
            f'expected only the successful turn, got {loaded.messages!r}'
        )


class TestEngineUsableAfterOverflow:
    """After overflow, the engine must still be usable — overflow is rejection, not corruption."""

    def test_subsequent_in_budget_call_succeeds(self) -> None:
        """After an overflow rejection, raising the budget and retrying works."""
        engine = _make_engine(max_budget_tokens=10)
        overflow_prompt = ' '.join(['word'] * 100)
        overflow = engine.submit_message(overflow_prompt)
        assert overflow.stop_reason == 'max_budget_reached'

        # Raise the budget and retry — the engine should be in a clean state
        engine.config = QueryEngineConfig(max_budget_tokens=10_000)
        ok = engine.submit_message('short retry')
        assert ok.stop_reason == 'completed'
        assert 'short retry' in engine.mutable_messages
        # The overflow prompt should never have been recorded
        assert overflow_prompt not in engine.mutable_messages

    def test_multiple_overflow_calls_remain_idempotent(self) -> None:
        """Repeated overflow calls must not accumulate hidden state."""
        engine = _make_engine(max_budget_tokens=10)
        overflow_prompt = ' '.join(['word'] * 50)
        for _ in range(5):
            result = engine.submit_message(overflow_prompt)
            assert result.stop_reason == 'max_budget_reached'
        assert len(engine.mutable_messages) == 0
        assert len(engine.transcript_store.entries) == 0
        assert engine.total_usage == UsageSummary()


class TestNormalPathStillCommits:
    """Regression guard: the non-overflow path must still mutate state as before."""

    def test_in_budget_turn_commits_all_state(self) -> None:
        engine = QueryEnginePort(manifest=build_port_manifest())
        engine.config = QueryEngineConfig(max_budget_tokens=10_000)
        result = engine.submit_message('review MCP tool')
        assert result.stop_reason == 'completed'
        assert len(engine.mutable_messages) == 1
        assert len(engine.transcript_store.entries) == 1
        assert engine.total_usage.input_tokens > 0
        assert engine.total_usage.output_tokens > 0
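The mutation tests all reduce to one ordering rule: project the new usage, check it against the budget, and only then commit. A sketch of that discipline; `_project_usage` and the commit calls are hypothetical names, while the TurnResult fields and stop_reason values come straight from the tests (the real logic is in src/query_engine.py):

```python
from src.query_engine import TurnResult


def submit_message_sketch(engine, prompt):
    """Sketch of #162: the budget check happens before any state mutation."""
    projected = _project_usage(engine.total_usage, prompt)  # hypothetical helper
    if projected.total_tokens > engine.config.max_budget_tokens:
        # Atomic rejection: no message, no transcript entry, no usage bump.
        # usage reports the pre-overflow state, per
        # test_turn_result_reports_pre_mutation_usage.
        return TurnResult(
            prompt=prompt, output='', matched_commands=(), matched_tools=(),
            permission_denials=(), usage=engine.total_usage,
            stop_reason='max_budget_reached',
        )

    output = engine._format_output(prompt)  # compute everything pre-commit

    # Commit point: all mutations happen together, after every check passed.
    engine.mutable_messages.append(prompt)
    engine.transcript_store.append(prompt, output)  # method name assumed
    engine.total_usage = projected
    return TurnResult(
        prompt=prompt, output=output, matched_commands=(), matched_tools=(),
        permission_denials=(), usage=engine.total_usage,
        stop_reason='completed',
    )
```

Because rejection happens before any append, repeated overflow calls are naturally idempotent, which is what TestEngineUsableAfterOverflow verifies.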
220
tests/test_submit_message_cancellation.py
Normal file
@@ -0,0 +1,220 @@
"""Tests for cooperative cancellation in submit_message (ROADMAP #164 Stage A).

Verifies that cancel_event enables safe early termination:
- Event set before call => immediate return with stop_reason='cancelled'
- Event set between budget check and commit => still 'cancelled', no mutation
- Event set after commit => not observable (honest cooperative limit)
- Legacy callers (cancel_event=None) see zero behaviour change
- State is untouched on cancellation: mutable_messages, transcript_store,
  permission_denials, total_usage all preserved

This closes the #161 follow-up gap filed as #164: wedged provider threads
can no longer silently commit ghost turns after the caller observed a
timeout.
"""

from __future__ import annotations

import sys
import threading
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.models import PermissionDenial  # noqa: E402
from src.port_manifest import build_port_manifest  # noqa: E402
from src.query_engine import QueryEngineConfig, QueryEnginePort, TurnResult  # noqa: E402


def _fresh_engine(**config_overrides) -> QueryEnginePort:
    config = QueryEngineConfig(**config_overrides) if config_overrides else QueryEngineConfig()
    return QueryEnginePort(manifest=build_port_manifest(), config=config)


class TestCancellationBeforeCall:
    """Event set before submit_message is invoked => immediate 'cancelled'."""

    def test_pre_set_event_returns_cancelled_immediately(self) -> None:
        engine = _fresh_engine()
        event = threading.Event()
        event.set()

        result = engine.submit_message('hello', cancel_event=event)

        assert result.stop_reason == 'cancelled'
        assert result.prompt == 'hello'
        # Output is empty on pre-budget cancel (no synthesis)
        assert result.output == ''

    def test_pre_set_event_preserves_mutable_messages(self) -> None:
        engine = _fresh_engine()
        event = threading.Event()
        event.set()

        engine.submit_message('ghost turn', cancel_event=event)

        assert engine.mutable_messages == [], (
            'cancelled turn must not appear in mutable_messages'
        )

    def test_pre_set_event_preserves_transcript_store(self) -> None:
        engine = _fresh_engine()
        event = threading.Event()
        event.set()

        engine.submit_message('ghost turn', cancel_event=event)

        assert engine.transcript_store.entries == [], (
            'cancelled turn must not appear in transcript_store'
        )

    def test_pre_set_event_preserves_usage_counters(self) -> None:
        engine = _fresh_engine()
        initial_usage = engine.total_usage
        event = threading.Event()
        event.set()

        engine.submit_message('expensive prompt ' * 100, cancel_event=event)

        assert engine.total_usage == initial_usage, (
            'cancelled turn must not increment token counters'
        )

    def test_pre_set_event_preserves_permission_denials(self) -> None:
        engine = _fresh_engine()
        event = threading.Event()
        event.set()

        denials = (PermissionDenial(tool_name='BashTool', reason='destructive'),)
        engine.submit_message('run bash ls', denied_tools=denials, cancel_event=event)

        assert engine.permission_denials == [], (
            'cancelled turn must not extend permission_denials'
        )


class TestCancellationAfterBudgetCheck:
    """Event set between budget projection and commit => 'cancelled', state intact.

    This simulates the realistic racy case: engine starts computing output,
    caller hits deadline, sets event. Engine observes at post-budget checkpoint
    and returns cleanly.
    """

    def test_post_budget_cancel_returns_cancelled(self) -> None:
        engine = _fresh_engine()
        event = threading.Event()

        # Patch: set the event after projection but before mutation. We do this
        # by wrapping _format_output (called mid-submit) to set the event.
        original_format = engine._format_output

        def _set_then_format(*args, **kwargs):
            result = original_format(*args, **kwargs)
            event.set()  # trigger cancel right after output is built
            return result

        engine._format_output = _set_then_format  # type: ignore[method-assign]

        result = engine.submit_message('hello', cancel_event=event)

        assert result.stop_reason == 'cancelled'
        # Output IS built here (we're past the pre-budget checkpoint), so it's
        # not empty. The contract is about *state*, not output synthesis.
        assert result.output != ''
        # Critical: state still unchanged
        assert engine.mutable_messages == []
        assert engine.transcript_store.entries == []


class TestCancellationAfterCommit:
    """Event set after commit is not observable — honest cooperative limit."""

    def test_post_commit_cancel_is_not_observable(self) -> None:
        engine = _fresh_engine()
        event = threading.Event()

        # Event only set *after* submit_message returns. The first call has
        # already committed before the event is set.
        result = engine.submit_message('hello', cancel_event=event)
        event.set()  # too late

        assert result.stop_reason == 'completed', (
            'cancel set after commit must not retroactively invalidate the turn'
        )
        assert engine.mutable_messages == ['hello']

    def test_next_call_observes_cancel(self) -> None:
        """The cancel_event persists — the next call on the same engine sees it."""
        engine = _fresh_engine()
        event = threading.Event()

        engine.submit_message('first', cancel_event=event)
        assert engine.mutable_messages == ['first']

        event.set()
        # Next call observes the cancel at entry
        result = engine.submit_message('second', cancel_event=event)

        assert result.stop_reason == 'cancelled'
        # 'second' must NOT have been committed
        assert engine.mutable_messages == ['first']


class TestLegacyCallersUnchanged:
    """cancel_event=None (default) => zero behaviour change from pre-#164."""

    def test_no_event_submits_normally(self) -> None:
        engine = _fresh_engine()
        result = engine.submit_message('hello')

        assert result.stop_reason == 'completed'
        assert engine.mutable_messages == ['hello']

    def test_no_event_with_budget_overflow_still_rejects_atomically(self) -> None:
        """#162 atomicity contract survives when cancel_event is absent."""
        engine = _fresh_engine(max_budget_tokens=1)
        words = ' '.join(['word'] * 100)

        result = engine.submit_message(words)  # no cancel_event

        assert result.stop_reason == 'max_budget_reached'
        assert engine.mutable_messages == []

    def test_no_event_respects_max_turns(self) -> None:
        """max_turns_reached contract survives when cancel_event is absent."""
        engine = _fresh_engine(max_turns=1)
        engine.submit_message('first')
        result = engine.submit_message('second')  # no cancel_event

        assert result.stop_reason == 'max_turns_reached'
        assert engine.mutable_messages == ['first']


class TestCancellationVsOtherStopReasons:
    """cancel_event has a defined precedence relative to budget/turns."""

    def test_cancel_precedes_max_turns_check(self) -> None:
        """If cancel is set when capacity is also full, cancel wins (clearer signal)."""
        engine = _fresh_engine(max_turns=0)  # immediately full
        event = threading.Event()
        event.set()

        result = engine.submit_message('hello', cancel_event=event)

        # cancel_event check is the very first thing in submit_message,
        # so it fires before the max_turns check even sees capacity
        assert result.stop_reason == 'cancelled'

    def test_cancel_does_not_override_commit(self) -> None:
        """Completed turn with late cancel still reports 'completed' — the
        turn already succeeded; we don't lie about it."""
        engine = _fresh_engine()
        event = threading.Event()

        # Event gets set after the mutation is done — submit_message doesn't
        # re-check after commit
        result = engine.submit_message('hello', cancel_event=event)
        event.set()

        assert result.stop_reason == 'completed'
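Laid over the #162 sketch, Stage A is just two event checkpoints around the same commit point. A final sketch under the same hedges (helper names like `make_turn_result` are assumptions; only the checkpoint placement and stop_reason semantics are fixed by the tests above):

```python
import threading


def submit_message_with_cancel(engine, prompt,
                               cancel_event: threading.Event | None = None):
    """Sketch of #164 Stage A: check at entry, check again just before commit."""

    def _cancelled(output: str):  # hypothetical synthetic-result builder
        return make_turn_result(prompt, output, stop_reason='cancelled')

    if cancel_event is not None and cancel_event.is_set():
        # Checkpoint 1: the very first statement; it runs before the
        # max_turns/budget checks, so 'cancelled' wins when both conditions
        # hold (TestCancellationVsOtherStopReasons).
        return _cancelled(output='')

    output = engine._format_output(prompt)  # the potentially slow part

    if cancel_event is not None and cancel_event.is_set():
        # Checkpoint 2: output exists but state is untouched, so a caller
        # that already observed a timeout can never receive a ghost turn.
        return _cancelled(output=output)

    # Commit point. Past here a late cancel is deliberately invisible:
    # the turn really happened, and the result says so.
    engine.mutable_messages.append(prompt)
    engine.transcript_store.append(prompt, output)  # method name assumed
    return make_turn_result(prompt, output, stop_reason='completed')
```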
||||
Reference in New Issue
Block a user