fix: route local OpenAI-compatible models

2026-06-06 22:47:10 +08:00 · 2026-06-03 23:16:46 +09:00
parent 9522674c87
commit bcc5bfde9c
7 changed files with 264 additions and 40 deletions
--- a/docs/local-openai-compatible-providers.md
+++ b/docs/local-openai-compatible-providers.md
@@ -13,7 +13,7 @@ If you need the most polished daily-driver experience for a specific non-Claude

 ## OpenAI-compatible routing basics

-Set `OPENAI_BASE_URL` to the server’s `/v1` endpoint and set `OPENAI_API_KEY` to either the required token or a harmless placeholder for local servers that expect an Authorization header. The model name must match what the server exposes.
+Set `OPENAI_BASE_URL` to the server’s `/v1` endpoint and set `OPENAI_API_KEY` to either the required token or a harmless placeholder for local servers that expect an Authorization header. Authless local/private OpenAI-compatible servers can leave `OPENAI_API_KEY` unset. The model name must match what the server exposes.

 ```bash
 export OPENAI_BASE_URL="http://127.0.0.1:11434/v1"
@@ -24,8 +24,8 @@ claw --model "qwen3:latest" prompt "Reply exactly HELLO_WORLD_123"
 Routing notes:

 - Use the `openai/` prefix for OpenAI-compatible gateways when you need prefix routing to win over ambient Anthropic credentials, for example `--model "openai/gpt-4.1-mini"` with OpenRouter.
- For local servers, prefer the exact model ID reported by the server (`qwen3:latest`, `llama3.2`, `Qwen/Qwen2.5-Coder-7B-Instruct`, etc.). If your local gateway exposes slash-containing IDs, use that exact slug.
- If you have multiple provider keys in your environment, remove unrelated keys while smoke-testing a local route or choose a model prefix that unambiguously selects the intended provider.
+- For local servers, prefer the exact model ID reported by the server (`qwen3:latest`, `llama3.2`, etc.). If your local gateway exposes slash-containing IDs, prefix the exact slug with `local/` so Claw routes through OpenAI-compatible transport while sending the rest verbatim, for example `--model "local/Qwen/Qwen2.5-Coder-7B-Instruct"`.
+- If you have multiple provider keys in your environment, `OPENAI_BASE_URL` plus local-looking tags such as `llama3.2` or `qwen2.5-coder:7b` selects the local OpenAI-compatible route; use `local/` for slash-containing local IDs.
 - Tool workflows need model/server support for OpenAI-compatible tool calls. Plain prompt smoke tests can pass even when slash/tool workflows still fail because the server returns an incompatible tool-call shape.

 ## Raw `/v1/chat/completions` smoke test
@@ -58,11 +58,11 @@ In another shell:

 ```bash
 export OPENAI_BASE_URL="http://127.0.0.1:11434/v1"
-export OPENAI_API_KEY="local-dev-token"
+unset OPENAI_API_KEY
 claw --model "qwen3:latest" prompt "Reply exactly HELLO_WORLD_123"
 ```

-If Ollama is running without auth and your build accepts authless local OpenAI-compatible servers, `unset OPENAI_API_KEY` is also acceptable. Use a placeholder token rather than a real cloud API key for local testing.
+If Ollama is running without auth, `unset OPENAI_API_KEY` is acceptable. Use a placeholder token rather than a real cloud API key if your local server requires an Authorization header.

 ## llama.cpp server