Technology

Under the hood.

A look at the runtime, sandbox, knowledge graph, peer protocol, and the small set of design choices that keep agents on rails.

Agent runtime

One configurable agent. Many profiles.

SwarmMarshal replaced 25 hand-coded specialist agents with a single tool-use engine. Each "agent" is a profile: system prompt + enabled tools + LLM provider + budget.

  1. User sends a message Goes to AgenticChatServiceV2, which delegates to the turn runner.
  2. Profile resolves and filters tools IProfileResolver picks the active profile and trims the tool catalog to its enabled set.
  3. Context injectors append blocks Life context, artifact summaries, and conversation state are appended to the system prompt per profile.
  4. Streaming tool-use loop IToolUseAgentEngine calls the LLM, emits text deltas + tool-call events, executes each tool, feeds the result back, repeats until the model stops.
  5. Persisted both ways Full tool-use blocks land in AgentTranscriptMessage; user/assistant pairs land in AgentChatTurn for the UI.
Max 40 messages in window 20 tool-use rounds 10-minute hard timeout AgentJournalEntry per turn
Wire protocols

Each LLM has its own dialect. The runtime speaks all of them.

Per-provider adapters normalize streaming tool-use into one TurnEvent stream the engine consumes.

Provider Endpoint Streaming Tool-call format
OpenAI /chat/completions SSE Indexed argument deltas
Ollama /chat/completions SSE Shared OpenAI-style adapter
Anthropic /v1/messages Named-event SSE tool_use + input_json_delta
Gemini :streamGenerateContent?alt=sse SSE Atomic functionCall parts; ids synthesized as gem-{n}
Grok (xAI) /chat/completions SSE OpenAI-compatible
DeepSeek /chat/completions SSE OpenAI-compatible
Hugging Face /chat/completions SSE OpenAI-compatible router

OpenAI, Ollama, Grok, DeepSeek, and Hugging Face all share OpenAIStyleToolUseProvider. Anthropic gets AnthropicToolUseProvider. Gemini gets GeminiToolUseProvider — Google emits whole functionCall objects atomically (no argument streaming), and the translator synthesizes ids since Gemini's API doesn't carry them.

Sandboxed primitives

A handful of primitive tools. Everything else composes from them.

Path-clamped to %LOCALAPPDATA%/SwarmMarshal/sandbox/. Path escapes throw UnauthorizedAccessException before any I/O happens.

http.request(method, url, body?)

HTTP with a configurable host allowlist. Returns status, headers, and body.

shell.exec(command, cwd?)

Shell command pinned to the sandbox directory. Stdout, stderr, and exit code come back.

fs.read_file(path)

Reads a file, clamped to the sandbox root. Path traversal throws before any I/O.

fs.write_file(path, content)

Writes a file, also sandbox-clamped. Atomic replace; parents auto-created.

fs.list_files(directory?)

Lists entries under the sandbox. Default lists the sandbox root.

code.run_csharp(source)

Ad-hoc C# via the existing code-execution skill. Compiles, runs, returns the result.

skills.run(skillId, args)

Generic runner for any registered skill — markdown SKILL.md or compiled C#.

catalog.search_tools(query)

Semantic search over the tool catalog so an agent can discover new tools at runtime.

Host allowlist for HTTP Sandbox cwd for shell Path resolver clamps fs.* catalog.search_tools is semantic
Memory

From inbox to graph.

Messages flow through entity extraction into a local graph of people, companies, topics, and projects. Each neighborhood can get an LLM-written summary, routed through local or cloud models according to your settings.

Ingest Messages indexed
Extract Entities + edges
Summarize Per-neighborhood

Hybrid search

Embeddings handle "the invoice from Acme last quarter." BM25 handles "ORD-4821." Results are fused so phrasing and exact match both work.

Semantic BM25 Cross-channel

IndexingStatusSkill

Agents can answer "how many messages left to process?" because the indexer exposes throughput, backlog, and ETA as a built-in skill.

Throughput Backlog ETA
Network

Two laptops, one swarm.

Pair devices over LAN. "Peer chat" is not chat with another human — it's chat with the agent on the other device.

Boss Desktop · approves work
Employee Laptop · runs jobs
  1. LAN discovery Devices find each other via local broadcast. No central server, no relay.
  2. Authenticated peer-data channel Generic exchange for credentials, OAuth tokens, and config. Mutually authenticated; payloads scoped per request.
  3. OAuth self-heal Token expired on the laptop while offline? When it sees the boss, it pulls a fresh token over the peer channel. No re-auth dialog.
  4. Firewall auto-mitigation The pairing flow patches Windows Defender or macOS firewall rules so the LAN handshake actually succeeds.
  5. Mailbox transport as fallback If LAN drops, employees still report via a shared mailbox bus. Agents resync when the LAN returns.
Tools, external

Speak Model Context Protocol in both directions.

SwarmMarshal is both an MCP client (consumes external connectors) and an MCP server (exposes its own tool modules to other clients).

Consume

Add an MCP server such as filesystem, GitHub, Slack, SQLite, browser search, or an internal tool. Its tools land next to the built-ins, and agents pick them through the same routing and approval flow.

Filesystem GitHub Custom

Expose

Point any MCP client at SwarmMarshal's built-in server and use eight tool modules from outside — useful when you want another agent platform to drive SwarmMarshal's state.

Built-in server Eight modules External clients
Skills

Markdown or C#. Drafts to catalog.

A skill is a callable function with metadata. Author by hand, or let the assistant draft one and queue it for review. The same runtime invokes both.

SKILL.md

YAML frontmatter (id, description, schema) plus a markdown body the LLM reads as instructions. Easy to author, easy to diff.

ISkill (C#)

Typed input + invoke method. Best when the skill needs deterministic logic, sandbox access, or fast loops.

  1. Draft An agent generates a SKILL.md from a plain-English description. Lands in the Drafts queue.
  2. Test The Skill Manager runs it in isolation against representative inputs. You can edit before promoting.
  3. Promote One click moves it to the catalog. From then on, every profile that includes skills.run can call it.
Prompt-injection hardening Trust tiers Sandboxed by default
LLM routing

Right-size every call.

Per-task model assignment with budgets, health checks, and Ollama auto-detect. No single "the model" — different tasks pick different tiers.

Auto-detect

Scans the local network for an Ollama install and registers detected models. Local-first when local works, cloud when it doesn't.

Per-task

Classification on a fast model, summaries on a smart one, drafting on whatever fits the budget. Each function picks its own tier.

Spend Guard

Hard caps per model, per agent, per day. Spend Guard cuts off the bill before it surprises you and pages the boss when it does.

Self-healing

Agents that fix their own infrastructure.

If the local AI stack is wedged, the model is undersized, or the error rate spikes, an agent runs a diagnostic skill and proposes a fix. Common repairs stay inside the app, with explicit approval where they change your machine.

fix-wedged-ollama

Detects Ollama hung on a previous request, resets the runner, and reschedules the pending turn.

diagnose-error-rate

Walks the journal, classifies failures by tool and provider, and surfaces the dominant root cause.

diagnose-slow-local-llm

Compares observed latency against the model's expected envelope and recommends an action.

swap-undersized-local-model

If the model can't keep up, proposes (and with approval, performs) a swap to a better-sized local model.

Guided setup Repairs common OAuth issues Patches firewalls Maintains local AI/browser capabilities
Open-source curious?

Run it locally. Read the runtime.

Everything described here is in the shipping app. Download the preview, then poke around.