SwarmMarshal publishes privacy-safe cloud and local model benchmark rows for the message pipeline. The desktop app uses these rows as candidate priors, then applies local calibration and routing policy before changing the machine's active model.
Promotion gates are strict: score at least 0.88, pass rate at least 90%, no critical failures, and no parse failures. Required settings travel with the recommendation.
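The gate described above can be sketched as a single predicate. This is a minimal illustration, not SwarmMarshal's actual code: the `BenchmarkRow` class and its field names are hypothetical, and only the thresholds come from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the promotion-gate description.
MIN_SCORE = 0.88
MIN_PASS_RATE = 0.90


@dataclass(frozen=True)
class BenchmarkRow:
    """Hypothetical row shape; the field names are assumptions."""
    model: str
    score: float
    pass_rate: float        # fraction in [0, 1], e.g. 0.67 for 67%
    critical_failures: int
    parse_failures: int


def clears_gate(row: BenchmarkRow) -> bool:
    """Return True only if every promotion condition holds."""
    return (
        row.score >= MIN_SCORE
        and row.pass_rate >= MIN_PASS_RATE
        and row.critical_failures == 0
        and row.parse_failures == 0
    )


# Failure counts are not shown in the published tables, so zeros are
# assumed here purely for illustration.
strong = BenchmarkRow("OpenAI/gpt-5.5", 0.994, 1.00, 0, 0)
weak = BenchmarkRow("Claude/claude-haiku-4-5-20251001", 0.986, 0.67, 0, 0)
```

Under these assumptions `clears_gate(strong)` is true, while `clears_gate(weak)` is false because the pass rate falls below 90% even though the score is high.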
| Hardware bucket | Model | Score | Pass | Latency | Confidence | Required settings |
|---|---|---|---|---|---|---|
| generic | OpenAI/gpt-5.5 | 0.994 | 100% | 13.8s | official-lab | |
| subscription-cli | CodexCli/codex-cli:default | 0.994 | 100% | 34.1s | official-lab | provider_kind=subscription-cli requires_subscription=True |
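A consumer of the table might turn a Required settings cell into a dictionary before applying it. The sketch below assumes the cell is a whitespace-separated list of `key=value` tokens, which is inferred from the single example in the table; the function name `parse_required_settings` is hypothetical.

```python
def parse_required_settings(cell: str) -> dict[str, str]:
    """Parse 'k1=v1 k2=v2' into a dict; values are kept as strings."""
    settings: dict[str, str] = {}
    for token in cell.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            # Fail loudly rather than guess at a malformed token.
            raise ValueError(f"malformed setting: {token!r}")
        settings[key] = value
    return settings


parsed = parse_required_settings(
    "provider_kind=subscription-cli requires_subscription=True"
)
```

Values are deliberately left as strings (for example `"True"`), since the source does not say how the app types them. An empty cell yields an empty dictionary.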
These are aggregate calibration results only. They never include raw messages, prompts, model responses, headers, extracted facts, or personal data.
| Scope | Hardware bucket | Model | Score | Pass | Latency | Gate | Notes |
|---|---|---|---|---|---|---|---|
| Cloud | generic | OpenAI/gpt-5.5 | 0.994 | 100% | 13.8s | Clears | Official seed-v4 production-pipeline calibration on 9 synthetic seed cases, including long-thread business, tech, and legal chains; aggregate metadata only. |
| Cloud | subscription-cli | CodexCli/codex-cli:default | 0.994 | 100% | 34.1s | Clears | Official seed-v4 production-pipeline calibration for subscription-backed CLI routes; requires the user's authenticated local CLI; aggregate metadata only. |
| Cloud | generic | Claude/claude-haiku-4-5-20251001 | 0.986 | 67% | 9.1s | Hold | Official seed-v4 production-pipeline calibration on 9 synthetic seed cases, including long-thread business, tech, and legal chains; aggregate metadata only. |
| Cloud | generic | OpenAI/gpt-5.4-mini | 0.980 | 56% | 4.7s | Hold | Official seed-v4 production-pipeline calibration on 9 synthetic seed cases, including long-thread business, tech, and legal chains; aggregate metadata only. |
| Cloud | generic | DeepSeek/deepseek-v4-flash | 0.971 | 56% | 10.4s | Hold | Official seed-v4 production-pipeline calibration on 9 synthetic seed cases, including long-thread business, tech, and legal chains; aggregate metadata only. |
| Cloud | generic | DeepSeek/deepseek-v4-pro | 0.925 | 56% | 127.5s | Hold | Official seed-v4 production-pipeline calibration on 9 synthetic seed cases, including long-thread business, tech, and legal chains; aggregate metadata only. |
| Cloud | generic | Claude/claude-opus-4-7 | 0.909 | 67% | 63.9s | Hold | Official seed-v4 production-pipeline calibration on 9 synthetic seed cases, including long-thread business, tech, and legal chains; aggregate metadata only. |
| Cloud | generic | Claude/claude-sonnet-4-6 | 0.907 | 67% | 82.1s | Hold | Official seed-v4 production-pipeline calibration on 9 synthetic seed cases, including long-thread business, tech, and legal chains; aggregate metadata only. |
| Cloud | generic | OpenAI/gpt-5.4-nano | 0.874 | 44% | 13.6s | Hold | Official seed-v4 production-pipeline calibration on 9 synthetic seed cases, including long-thread business, tech, and legal chains; aggregate metadata only. |
| Local | nvidia-4070-class-64gb | Ollama/qwen3:30b-a3b | 0.982 | 67% | 144.2s | Hold | Official seed-v4 production-pipeline calibration on NVIDIA 4070-class local hardware with long-thread business, tech, and legal chains; aggregate metadata only. |
| Local | nvidia-4070-class-64gb | Ollama/qwen2.5:14b | 0.982 | 67% | 144.8s | Hold | Official seed-v4 production-pipeline calibration on NVIDIA 4070-class local hardware with long-thread business, tech, and legal chains; aggregate metadata only. |
| Local | nvidia-4070-class-64gb | Ollama/qwen3.5:35b-a3b | 0.982 | 67% | 162.2s | Hold | Official seed-v4 production-pipeline calibration on NVIDIA 4070-class local hardware with long-thread business, tech, and legal chains; aggregate metadata only. |
| Cloud | subscription-cli | ClaudeCode/claude-code:sonnet | 0.990 | 78% | 63.3s | Hold | Official seed-v4 production-pipeline calibration for subscription-backed CLI routes; requires the user's authenticated local CLI; aggregate metadata only. |
| Cloud | subscription-cli | ClaudeCode/claude-code:opus | 0.990 | 78% | 19.3s | Hold | Official seed-v4 production-pipeline calibration for subscription-backed CLI routes; requires the user's authenticated local CLI; aggregate metadata only. |
| Cloud | subscription-cli | ClaudeCode/claude-code:haiku | 0.976 | 44% | 45.8s | Hold | Official seed-v4 production-pipeline calibration for subscription-backed CLI routes; requires the user's authenticated local CLI; aggregate metadata only. |
Allowed: hardware bucket, OS/runtime version, model tag, quantization, score, pass rate, failure counts, latency, and safe runtime settings.
Never included: raw emails, prompts, model responses, headers, extracted facts, contact names, or any user-derived message content.
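One way to enforce this split is an allow-list applied before a row is published. The sketch below is an assumption about how that could look, not the project's implementation: the key names in `ALLOWED_FIELDS` are hypothetical spellings of the allowed fields listed above, and `to_publishable_row` is an illustrative helper.

```python
# Hypothetical key names for the allowed fields listed above.
ALLOWED_FIELDS = frozenset({
    "hardware_bucket",
    "runtime_version",
    "model_tag",
    "quantization",
    "score",
    "pass_rate",
    "failure_counts",
    "latency_seconds",
    "runtime_settings",
})


def to_publishable_row(record: dict) -> dict:
    """Return a copy of the record, rejecting any key not on the allow-list."""
    unexpected = sorted(set(record) - ALLOWED_FIELDS)
    if unexpected:
        # Fail closed: refuse the whole row instead of silently dropping keys.
        raise ValueError(f"fields not allowed in a published row: {unexpected}")
    return dict(record)


row = to_publishable_row({"model_tag": "qwen3:30b-a3b", "score": 0.982})
```

Rejecting the row outright, rather than stripping the offending keys, is a design choice made here so that an accidental `prompt` or `headers` field surfaces as an error instead of passing unnoticed.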
The desktop app checks this endpoint daily and caches the bundle locally. It only uses the bundle to rank candidates; local calibration and cooldown rules still decide whether a model is applied.
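The flow in this paragraph can be sketched in three small functions. Everything here is illustrative: the function names, the dictionary keys, the ranking order (score first, then latency), and the six-hour cooldown are assumptions; only the daily refresh and the rule that local calibration and cooldown have the final say come from the text.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REFRESH_INTERVAL = timedelta(days=1)  # "daily", per the text
COOLDOWN = timedelta(hours=6)         # hypothetical value, not from the source


def bundle_is_stale(fetched_at: Optional[datetime], now: datetime) -> bool:
    """True when no bundle is cached or the cached one is a day old."""
    return fetched_at is None or now - fetched_at >= REFRESH_INTERVAL


def rank_candidates(rows: list) -> list:
    """Order rows by score (high first), then latency (low first). Assumed order."""
    return sorted(rows, key=lambda r: (-r["score"], r["latency_seconds"]))


def may_apply(passed_local_calibration: bool,
              last_switch_at: Optional[datetime],
              now: datetime) -> bool:
    """The bundle only ranks; local calibration and cooldown decide."""
    cooled_down = last_switch_at is None or now - last_switch_at >= COOLDOWN
    return passed_local_calibration and cooled_down


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
ranked = rank_candidates([
    {"model": "CodexCli/codex-cli:default", "score": 0.994, "latency_seconds": 34.1},
    {"model": "OpenAI/gpt-5.5", "score": 0.994, "latency_seconds": 13.8},
])
```

With the two rows that clear the gate, the tie on score is broken by latency, so `OpenAI/gpt-5.5` ranks first under this assumed ordering. A top-ranked candidate is still not applied unless `may_apply` returns true.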