feat: #164 Stage B CLOSURE — turn-loop JSON + cancel_observed coverage + CLAWABLE promotion

Closes all three gaebal-gajae-identified closure criteria for #164 Stage B:

1. turn-loop runtime surface exposes cancel_observed consistently
2. cancellation path tests validate safe-to-reuse semantics
3. turn-loop promoted from OPT_OUT to CLAWABLE surface

Changes:

src/main.py:
- turn-loop accepts --output-format {text,json}
- JSON envelope includes per-turn cancel_observed + final_cancel_observed
- All turn fields exposed: prompt, output, stop_reason, cancel_observed,
  matched_commands, matched_tools
- Exit code 2 on final timeout preserved

tests/test_cli_parity_audit.py:
- CLAWABLE_SURFACES now contains 14 commands (was 13)
- Removed 'turn-loop' from OPT_OUT_SURFACES
- Parametrized --output-format test auto-validates turn-loop JSON

tests/test_cancel_observed_field.py (new, 9 tests):
- TestCancelObservedField (5 tests): field contract
  - default False
  - explicit True preserved
  - normal completion → False
  - bootstrap JSON exposes field
  - turn-loop JSON exposes per-turn field
- TestCancelObservedSafeReuseSemantics (2 tests): reuse contract
  - timeout result has cancel_observed=True when signaled
  - engine.mutable_messages not corrupted after cancelled turn
  - engine accepts fresh message after cancellation
- TestCancelObservedSchemaCompliance (2 tests): SCHEMAS.md contract
  - cancel_observed is always bool
  - final_cancel_observed convenience field present

Closure criteria validated:
-  Field exposed in bootstrap JSON
-  Field exposed per-turn in turn-loop JSON
-  Field is always bool, never null
-  Safe-to-reuse: engine can accept fresh messages after cancellation
-  mutable_messages not corrupted by cancelled turn
-  turn-loop promoted from OPT_OUT (14 clawable commands now)

Protocol now distinguishes at runtime:
  timeout + cancel_observed=false → infra/wedge (escalate)
  timeout + cancel_observed=true → cooperative cancellation (safe to retry)

Test results: 182 → 192 passing, +10 tests, zero regression, 3 skipped unchanged.

Closes #164 Stage B. Stage C (async-native preemption) remains future work.
This commit is contained in:
YeonGyu-Kim
2026-04-22 19:49:20 +09:00
parent 97c4b130dc
commit 11f9e8a5a2
3 changed files with 237 additions and 7 deletions

View File

@@ -101,6 +101,12 @@ def build_parser() -> argparse.ArgumentParser:
'suffix that used to pollute the transcript.'
),
)
loop_parser.add_argument(
'--output-format',
choices=['text', 'json'],
default='text',
help='output format (#164 Stage B: JSON includes cancel_observed per turn)',
)
flush_parser = subparsers.add_parser(
'flush-transcript',
@@ -349,15 +355,40 @@ def main(argv: list[str] | None = None) -> int:
timeout_seconds=args.timeout_seconds,
continuation_prompt=args.continuation_prompt,
)
# Exit 2 when a timeout terminated the loop so claws can distinguish
# 'ran to completion' from 'hit wall-clock budget'.
loop_exit_code = 2 if results and results[-1].stop_reason == 'timeout' else 0
if args.output_format == 'json':
# #164 Stage B + #173: JSON envelope with per-turn cancel_observed
# Promotes turn-loop from OPT_OUT to CLAWABLE surface.
import json
envelope = {
'prompt': args.prompt,
'max_turns': args.max_turns,
'turns_completed': len(results),
'timeout_seconds': args.timeout_seconds,
'continuation_prompt': args.continuation_prompt,
'turns': [
{
'prompt': r.prompt,
'output': r.output,
'stop_reason': r.stop_reason,
'cancel_observed': r.cancel_observed,
'matched_commands': list(r.matched_commands),
'matched_tools': list(r.matched_tools),
}
for r in results
],
'final_stop_reason': results[-1].stop_reason if results else None,
'final_cancel_observed': results[-1].cancel_observed if results else False,
}
print(json.dumps(wrap_json_envelope(envelope, args.command, exit_code=loop_exit_code)))
return loop_exit_code
for idx, result in enumerate(results, start=1):
print(f'## Turn {idx}')
print(result.output)
print(f'stop_reason={result.stop_reason}')
# Exit 2 when a timeout terminated the loop so claws can distinguish
# 'ran to completion' from 'hit wall-clock budget'.
if results and results[-1].stop_reason == 'timeout':
return 2
return 0
return loop_exit_code
if args.command == 'flush-transcript':
from pathlib import Path as _Path
engine = QueryEnginePort.from_workspace()