Technology

Provenance

Every fact carries the message it came from.

Most assistant stacks accumulate orphaned, unsourced rows. SwarmMarshal treats provenance and cleanup as first-class, so an answer can always point back to the record behind it.

SourceRef on every durable record

A canonical shape — kind, account, record id, title, excerpt, observed-at, and deletion behavior — stored on the row itself, not bolted on later. The exact-source pair (account + message) is a runtime-enforced invariant.

Trust-gated knowledge

Unverified inbound claims land in a holding table. Promotion to durable memory requires user-authored, trusted-contact, or corroborated status — and rejected claims are suppressed so they aren't relearned.

Cleanup on source delete

Delete a message and the data derived from it tombstones or cascades. Every writer of source-backed data registers a deletion observer, so the citation behind an answer can never dangle.

Per-account isolated storage Local-first, on your machine Quotes kept separate from inference

Memory

From inbox to knowledge graph.

Messages flow through entity extraction into a local graph of people, companies, topics, and projects. Each neighborhood can get an LLM-written summary, routed through local or cloud models according to your settings — and temporal fields preserve the source message time, not ingestion time.

Ingest Messages indexed

Extract Entities + edges

Summarize Per-neighborhood

Hybrid search

Embeddings handle "the invoice from Acme last quarter." BM25 handles "ORD-4821." Results are fused so phrasing and exact match both work. Browsing and name lookup keep working even with no LLM or embeddings available.

Semantic BM25 Cross-channel

Answer packs

An assistant question blends timeline events, context facts, the knowledge-graph neighborhood, and direct message evidence — so receipts, charge amounts, deadlines, and dispute history come from real records, not recall.

Timeline Facts Message evidence

Assistant runtime

One configurable engine. Many profiles.

SwarmMarshal replaced 25 hand-coded specialist agents with a single tool-use engine. The assistant and any supervised helper are each just a profile: system prompt + enabled tools + LLM provider + budget.

You send a message Goes to AgenticChatServiceV2, which delegates to the turn runner.
Profile resolves and filters tools IProfileResolver picks the active profile and trims the tool catalog to its enabled set.
Context injectors append blocks Life context, artifact summaries, and conversation state are appended to the system prompt per profile.
Streaming tool-use loop IToolUseAgentEngine calls the LLM, emits text deltas + tool-call events, executes each tool, feeds the result back, repeats until the model stops.
Persisted both ways Full tool-use blocks land in AgentTranscriptMessage; user/assistant pairs land in AgentChatTurn for the UI.

Max 40 messages in window 20 tool-use rounds 10-minute hard timeout Journal entry per turn

Wire protocols

Each LLM has its own dialect. The runtime speaks all of them.

Per-provider adapters normalize streaming tool-use into one TurnEvent stream the engine consumes.

Provider	Endpoint	Streaming	Tool-call format
OpenAI	`/chat/completions`	SSE	Indexed argument deltas
Ollama	`/chat/completions`	SSE	Shared OpenAI-style adapter
Anthropic	`/v1/messages`	Named-event SSE	`tool_use` + `input_json_delta`
Gemini	`:streamGenerateContent?alt=sse`	SSE	Atomic `functionCall` parts; ids synthesized as `gem-{n}`
Grok (xAI)	`/chat/completions`	SSE	OpenAI-compatible
DeepSeek	`/chat/completions`	SSE	OpenAI-compatible
Hugging Face	`/chat/completions`	SSE	OpenAI-compatible router

OpenAI, Ollama, Grok, DeepSeek, and Hugging Face all share OpenAIStyleToolUseProvider. Anthropic gets AnthropicToolUseProvider. Gemini gets GeminiToolUseProvider — Google emits whole functionCall objects atomically (no argument streaming), and the translator synthesizes ids since Gemini's API doesn't carry them. Subscription CLI sessions (Claude Code, Codex) are first-class routes too.

Sandboxed primitives

A handful of primitive tools. Everything else composes from them.

Path-clamped to %LOCALAPPDATA%/SwarmMarshal/sandbox/. Path escapes throw UnauthorizedAccessException before any I/O happens. Consequential tools pass through a permission broker that surfaces inline approve/deny.

http.request(method, url, body?)

HTTP with a configurable host allowlist. Returns status, headers, and body.

shell.exec(command, cwd?)

Shell command pinned to the sandbox directory. Stdout, stderr, and exit code come back.

fs.read_file(path)

Reads a file, clamped to the sandbox root. Path traversal throws before any I/O.

fs.write_file(path, content)

Writes a file, also sandbox-clamped. Atomic replace; parents auto-created.

fs.list_files(directory?)

Lists entries under the sandbox. Default lists the sandbox root.

code.run_csharp(source)

Ad-hoc C# via the existing code-execution skill. Compiles, runs, returns the result.

skills.run(skillId, args)

Generic runner for any registered skill — markdown SKILL.md or compiled C#.

catalog.search_tools(query)

Semantic search over the tool catalog so an agent can discover new tools at runtime.

Host allowlist for HTTP Sandbox cwd for shell Path resolver clamps fs.* Approval broker on risky tools

Generative tooling

The agent writes its own tools — then stops needing the model.

The newest layer of the tool system: instead of calling an LLM for every repeated job, the agent authors a small, typed C# tool once. The model does the creative work one time; deterministic code does the repeated work forever. The principle underneath: type the data, code the behavior.

Authored, typed, sandboxed

Tools carry a declared contract — parameters, return shape, a sample input — and compile through Roslyn inside a shared sandbox. A safety gate blocks raw HTTP and filesystem access; all web egress goes through one auditable channel. Compiles are cached; execution is time-bounded.

Every run leaves a record

Each tool stamps its last run — when, ok-or-error, and the output. Nothing is an opaque blob: an "Inside" workbench shows the code, the contract, and the last result, and lets you edit or run any tool by hand.

Zero-LLM fast path

Once an agent has authored a bulk-fetch tool, repeat refreshes run the compiled tool directly — a 21-symbol portfolio refresh dropped from ~42 model calls to zero. If the tool breaks, the system self-heals back to the model to repair it.

The same substrate shows up in three places: Vibes apps author tools that call real APIs and keep their rows fresh; per-field display formatting is an authored snippet (structured output only — never raw HTML); and SwarmMarshal's Today "Generated" panels run in the identical sandbox with the identical run record.

Roslyn-compiled C# Safety-gated egress Declared parameters & returns Run records Self-heals via the model

Workers & swarm

Workers are processes. Your machines are the cluster.

Each hired worker is a full profile executed by its own supervised child process — heartbeat, restart backoff, orphan adoption, stop-on-exit. Paired same-owner machines form a swarm that shares state, work, and models over signed local traffic. No public gateway exists to expose.

Supervised lifecycles

Hiring is stage → approve → activate: the assistant provisions the worker's profile and connection-tested email account, then waits on a human approval task. Credentials never ride approval payloads; pasted secrets are redacted from transcripts before validation. Workers are machine-bound and never sync.

Peer compute

Peers advertise the local models they can serve; a remote-peer LLM provider forwards a completion to the machine with the hardware. A load-scored allocator hands scheduled tasks and background jobs to the least-busy node.

Sleep-tolerant sync

Journaled replication with per-origin high-water marks and LWW conflicts; snapshot bootstrap for new peers; automatic gap healing after long offline stretches; large files stream in resumable, hash-verified chunks. Short-TTL work claims stop two machines paying to process the same message.

Signed LAN envelopes only No public gateway No LLM heartbeat Per-worker spend caps

The worker & distributed-computing story →

Tools, external

Speak Model Context Protocol in both directions.

SwarmMarshal is both an MCP client (consumes external connectors) and an MCP server (exposes its own data to other agents).

Consume

Add an MCP server such as filesystem, GitHub, Slack, SQLite, browser search, or an internal tool. Its tools land next to the built-ins, and the assistant picks them through the same routing and approval flow.

Filesystem GitHub Custom

Expose

Point your own agent — Claude Code, Codex, or another MCP client — at SwarmMarshal's built-in server and query the same source-grounded data, with citations, behind an explicit external-use approval gate.

Built-in server Approval-gated Source refs returned

See the MCP tool catalog →

LLM routing

Right-size every call.

Per-task model assignment with budgets, health checks, and Ollama auto-detect. There is no single "the model" — different tasks pick different tiers, and sensitive work can be pinned local-only.

Auto-detect

Scans for a local Ollama install and registers detected models. Local-first when local works, cloud when it doesn't — with a fully local option for private work.

Per-task

Classification on a fast model, summaries on a smart one, high-stakes drafting on a frontier model. Each function type picks its own tier — and LocalOnly is enforced before any cloud advisor runs.

Spend Guard

Hard caps per model, per agent, per day. Spend Guard cuts off the bill before it surprises you and alerts you when it does.

Full routing model → Local benchmarks →

Memory hygiene

Long-lived memory needs operations discipline.

Memory is a durable asset, so it gets maintenance cycles, calibration gates, and deterministic safety filters — not just a prompt that hopes for the best.

Calibration gates

Prompt and model changes run through the message-pipeline calibration harness. Results are recorded by prompt hash and hardware key — the same model on different hardware is a different decision, and routing fails closed without passing proof.

Weekly reconciliation

A local maintenance pass marks stale claims, escalates contradictions, prunes low-confidence projections, and journals an auditable insight row. Conflicts surface for review instead of being silently merged.

Deterministic safety

When enrichment marks a message not knowledge-worthy, durable artifacts are dropped before persistence. Ungrounded quotes are dropped. The model classifies; the system gates the durable writes.

Self-healing

It fixes its own local infrastructure.

If the local AI stack is wedged, the model is undersized, or the error rate spikes, a diagnostic skill runs and proposes a fix. Common repairs stay inside the app, with explicit approval where they change your machine.

fix-wedged-ollama

Detects Ollama hung on a previous request, resets the runner, and reschedules the pending turn.

diagnose-error-rate

Walks the journal, classifies failures by tool and provider, and surfaces the dominant root cause.

diagnose-slow-local-llm

Compares observed latency against the model's expected envelope and recommends an action.

swap-undersized-local-model

If the model can't keep up, proposes (and with approval, performs) a swap to a better-sized local model.

Guided setup Repairs common OAuth issues Patches firewalls Maintains local AI/browser capabilities

Why every answer is grounded.