fix: recover from llama.cpp context overflow and reqwest SSE decode failures

Extend auto-compaction error detection to handle additional context-overflow error patterns from llama.cpp backends: 'Context size has been exceeded', 'exceed_context_size_error', and 'exceeds the available context size'.

Also recover from reqwest 'error decoding response body' errors. Some llama.cpp instances return a non-SSE plaintext HTTP 500 on context overflow, which causes the SSE deserializer to fail.

Add dynamic threshold adaptation: parse the server-reported context window size from error messages (e.g., '(81920 tokens)') and set the auto-compaction trigger at 70% of that value. This removes the need for a hardcoded threshold and adapts automatically to each backend's limits.

This patch was developed with assistance from OpenCode and a local Qwen 3.6 API server.
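
As a rough illustration of the detection and threshold logic described in this message, here is a minimal sketch. It is not the code from this commit: the function names `is_context_overflow`, `parse_context_window`, and `threshold_for` are hypothetical. It also assumes that the context window is the last `(N tokens)` group in the error text.

```rust
/// Hypothetical helper: returns true if the error text matches one of the
/// llama.cpp context-overflow patterns listed in the commit message.
fn is_context_overflow(message: &str) -> bool {
    const PATTERNS: [&str; 3] = [
        "Context size has been exceeded",
        "exceed_context_size_error",
        "exceeds the available context size",
    ];
    PATTERNS.iter().any(|pattern| message.contains(pattern))
}

/// Hypothetical helper: extracts N from the last "(N tokens)" group.
/// Assumption: the server reports its context window in that form.
fn parse_context_window(message: &str) -> Option<u32> {
    let end = message.rfind(" tokens)")?;
    let start = message[..end].rfind('(')? + 1;
    message[start..end].parse().ok()
}

/// Hypothetical helper: 70% of the reported context window.
/// The u64 intermediate avoids overflow for large u32 inputs; the result
/// is never larger than the input, so the cast back to u32 is lossless.
fn threshold_for(context_window: u32) -> u32 {
    (u64::from(context_window) * 70 / 100) as u32
}
```

A caller could chain these on an error string: check `is_context_overflow`, then pass the result of `parse_context_window` through `threshold_for` and hand it to `set_auto_compaction_input_tokens_threshold` before retrying.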
2026-06-05 22:17:10 +08:00 · 2026-05-27 16:45:17 +02:00
parent 87b7e74770
commit 1d516be779
2 changed files with 104 additions and 4 deletions
--- a/rust/crates/runtime/src/conversation.rs
+++ b/rust/crates/runtime/src/conversation.rs
@@ -204,6 +204,13 @@ where
        self
    }

+    /// Update the auto-compaction threshold after construction. This allows the
+    /// caller to tune the threshold based on runtime information (e.g., the
+    /// server-returned context window size from a 400 error).
+    pub fn set_auto_compaction_input_tokens_threshold(&mut self, threshold: u32) {
+        self.auto_compaction_input_tokens_threshold = threshold;
+    }
+
    #[must_use]
    pub fn with_hook_abort_signal(mut self, hook_abort_signal: HookAbortSignal) -> Self {
        self.hook_abort_signal = hook_abort_signal;