SwarmMarshal: local-first communication intelligence for agentic programmers.
SwarmMarshal is a desktop communication intelligence platform built around a strict thesis: agents are useful only when they can read the real sources, cite the evidence, choose the right model for the job, and ask before crossing consequential boundaries.
A source-grounded operating layer over messages, calendars, people, tasks, and agents.
SwarmMarshal is not primarily an email client with a chatbot attached. It is a local-first knowledge workbench that turns the user's communication history into an evidence layer for search, synthesis, drafting, timelines, legal review, and supervised automation.
The application runs as a .NET MAUI Blazor Hybrid desktop app for Windows and Apple Silicon Mac. It keeps provider data in local SQLite stores, enriches messages through a gated AI pipeline, builds source-backed context packs, and exposes those packs to the in-app assistant, supervised helper agents, user workflows, and external agents through MCP.
The critical engineering constraint is provenance. A model can write a summary, but durable knowledge must know where it came from. A route advisor can recommend a cheaper model, but the route must have proof for the function it will perform. A helper can draft or propose, but sends, deletes, external calls, and broad automation stay behind explicit user control.
The system is organized around four planes.
SwarmMarshal separates ingestion, interpretation, agent execution, and model intelligence so each subsystem can fail, recover, and be audited independently.
Email, chats, calendar records, contacts, tasks, notes, attachments, and agent transcripts land in local stores.
The message pipeline classifies risk, routing, commitments, facts, observations, and knowledge-worthiness.
Context packs blend source-backed facts, semantic evidence, timelines, graph neighborhoods, and citations.
The assistant, helper agents, MCP clients, and inbox workflows consume the same evidence layer under policy.
Desktop-first runtime
SwarmMarshal runs on the user's machine. Provider sync, search, local storage, LLM routing, tool approval, and diagnostics are not remote SaaS-only services.
Per-account stores
Messages are kept in account-scoped SQLite databases so large mailboxes remain operable and provider-specific sync state can be repaired without corrupting unrelated accounts.
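A minimal sketch of the account-scoped layout, written in Python for brevity even though the app itself is .NET. The directory layout, the table schema, and the `open_account_store` helper are assumptions for illustration; the source only states that each account gets its own SQLite database.

```python
import sqlite3
import tempfile
from pathlib import Path

def open_account_store(root: Path, account_id: str) -> sqlite3.Connection:
    """Open (or create) the SQLite file that belongs to one account only."""
    # Hypothetical layout: <root>/accounts/<account_id>/messages.db
    db_dir = root / "accounts" / account_id
    db_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_dir / "messages.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id TEXT PRIMARY KEY, subject TEXT, received_at TEXT)"
    )
    return conn

root = Path(tempfile.mkdtemp())
work = open_account_store(root, "work")
home = open_account_store(root, "home")
work.execute("INSERT INTO messages VALUES ('m1', 'Invoice', '2024-01-01')")
work.commit()
work_count = work.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
home_count = home.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

Because each account has its own file in this sketch, rebuilding or deleting the `work` database cannot touch the `home` database.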
Auditable operations
Agent turns, tool calls, route choices, model usage, proposals, approvals, and background work are journaled so failures can be explained instead of guessed at.
Communications become queryable local evidence.
The inbox is the largest knowledge source, but SwarmMarshal treats every communication surface as source material for the same evidence fabric.
| Input surface | How it is used | Agent-facing value |
|---|---|---|
| Email | Gmail, Microsoft Graph, and IMAP sync into local account stores with message bodies, folders, provider metadata, attachments, and read/flag state. | Search, triage, reply drafting, timelines, commitments, receipts, disputes, and source citations. |
| Chats | Apple Messages on macOS plus channel adapters such as Slack, Telegram, and Discord are normalized as communication sources without flooding email-only views. | Cross-channel history, relationship context, and "what happened where?" retrieval. |
| Calendar and tasks | Calendar events, invite state, deadlines, reminders, and task records are linked back to messages and contacts where possible. | Meeting prep, conflict detection, promise tracking, follow-up extraction, and daily briefs. |
| Contacts and people | Provider contacts, observed senders, phone/email identities, and relationship signals converge into durable local contact records. | Person context, sender teaching, VIP routing, history summaries, and safer drafting. |
| Agent work | Agent transcripts, proposals, artifacts, approvals, and run journals become inspectable records. | Replayable decisions, self-improvement signals, and tool-use audit trails. |
Message enrichment pipeline
Message enrichment is the load-bearing pass that turns raw communication into structured evidence. The pipeline separates cheap deterministic work from expensive model work. Sender policy checks, spam prechecks, provider header signals, and sync metadata can run immediately after ingest. The heavier enrichment pass runs in background workers, can be delegated to same-owner peers, and is guarded by route proof so the backlog does not quietly move to an untested or unexpectedly metered model.
Threat and routing
Authentication verdicts, provider spam headers, sender history, links, body text, and thread context combine into spam, phishing, scam, newsletter, and routing decisions.
Facts and commitments
The model extracts commitments, dates, entities, relations, life facts, memory candidates, and observations only when the source text can ground them.
Validation and repair
Structured output is treated as a draft. Deterministic validators check contract shape, enum legality, exact quote grounding, source coverage, and knowledge gates before persistence.
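The "structured output is a draft" idea can be sketched as a deterministic check that runs after the model and before persistence. This is an illustration under assumptions: the `Extraction` shape, the allowed kinds, and the rejection reasons are hypothetical, and only two of the checks named above (enum legality and exact quote grounding) are shown.

```python
from dataclasses import dataclass

# Hypothetical set of legal kinds; the real contract is not given in the source.
ALLOWED_KINDS = {"commitment", "fact", "observation"}

@dataclass
class Extraction:
    kind: str
    text: str
    quote: str  # must appear verbatim in the source message

def validate(items, source_text):
    """Split model output into accepted and rejected items, with reasons."""
    accepted, rejected = [], []
    for item in items:
        if item.kind not in ALLOWED_KINDS:
            rejected.append((item, "illegal kind"))
        elif not item.quote or item.quote not in source_text:
            rejected.append((item, "quote not grounded in source"))
        else:
            accepted.append(item)
    return accepted, rejected

source = "I will send the signed contract by Friday."
items = [
    Extraction("commitment", "Sender will send the contract", "send the signed contract by Friday"),
    Extraction("fact", "Contract is worth $5M", "worth $5M"),  # quote is not in the source
    Extraction("rumor", "Sender is late", "by Friday"),        # kind is not allowed
]
accepted, rejected = validate(items, source)
```

Only the first item survives; the other two are rejected by code, not by asking the model to be more careful.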
Profiles, tools, streams, permissions, and durable transcripts.
SwarmMarshal uses a unified tool-use engine rather than a set of hand-coded specialist loops. A "helper" is a profile with a role, model route, enabled tools, limits, and an activity log.
- Resolve the profile: The current surface and selected helper determine the system prompt, function type, budget, model class, and allowed tool catalog.
- Build source-grounded context: Context injectors add conversation state, artifacts, life context, source packs, and route-specific metadata before the model call.
- Stream native tool use: Provider-specific adapters normalize OpenAI-style, Anthropic, Gemini, Ollama, subscription CLI, and compatible APIs into one event stream.
- Gate side effects: Read-only tools can run silently; sends, destructive actions, external calls, writes, and elevated shell work require explicit approval.
- Persist the turn: Text, tool calls, tool results, approvals, errors, usage, and routing decisions are stored so the user can audit what happened.
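The gating and persistence steps above can be sketched as a small broker. The tool names, the `Effect` enum, and the `approve` callback are hypothetical; the only behavior taken from the text is that read-only tools run silently, consequential tools need approval, and every attempt is recorded.

```python
from enum import Enum

class Effect(Enum):
    READ_ONLY = "read_only"
    SIDE_EFFECT = "side_effect"

# Hypothetical tool catalog; real tool names are not given in the source.
TOOL_EFFECTS = {
    "search_messages": Effect.READ_ONLY,
    "send_email": Effect.SIDE_EFFECT,
}

journal = []  # every attempt is recorded, whether it ran or not

def run_tool(name, args, tool_fn, approve):
    # Assumption: unknown tools are treated as consequential by default.
    effect = TOOL_EFFECTS.get(name, Effect.SIDE_EFFECT)
    if effect is Effect.SIDE_EFFECT and not approve(name, args):
        entry = {"tool": name, "status": "denied"}
    else:
        entry = {"tool": name, "status": "ok", "result": tool_fn(**args)}
    journal.append(entry)
    return entry

deny_all = lambda name, args: False
read = run_tool("search_messages", {"query": "invoice"}, lambda query: [query], deny_all)
send = run_tool("send_email", {"to": "a@example.com"}, lambda to: "sent", deny_all)
```

With an approver that denies everything, the search still runs and the send is refused, and both outcomes land in the journal.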
Primitive tools
HTTP, shell, filesystem, code execution, skill execution, catalog search, and domain tools compose into agent behaviors. File tools are path-clamped to a sandbox root.
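Path clamping can be illustrated with a short check that resolves a requested path and refuses anything outside the sandbox root. This is a sketch of the general technique in Python, not the app's actual implementation; `clamp_to_sandbox` is a hypothetical name.

```python
import tempfile
from pathlib import Path

def clamp_to_sandbox(sandbox_root, requested):
    """Resolve `requested` under the sandbox root or raise PermissionError."""
    root = Path(sandbox_root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{requested!r} escapes the sandbox")
    return candidate

sandbox = tempfile.mkdtemp()
inside = clamp_to_sandbox(sandbox, "notes/todo.txt")
try:
    clamp_to_sandbox(sandbox, "../outside.txt")
    escaped = True
except PermissionError:
    escaped = False
```

The check runs on the resolved path, so a request such as `notes/../../outside.txt` is rejected even though it starts inside the sandbox.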
Supervised helpers
Helpers can research, summarize, draft, monitor matters, prepare weekly briefs, and propose inbox rules. They do not need permission to think; they do need permission to act.
Agents author deterministic tools; the model becomes a compiler, not a dependency.
The most recent addition to the runtime inverts the usual agent economics. Rather than spending a model call on every repetition of a job, the agent writes a small, typed C# tool once — and from then on the job runs as ordinary deterministic code.
The design principle is "type the data, code the behavior." Structure (fields, schemas, records) is declared so the system can leverage it. Behavior (fetching, transforming, formatting) is authored as sandboxed C# rather than accreted as typed product features, which keeps the behavior space unbounded without growing an inner platform.
Authored tools carry a declared contract: named, typed parameters, a return description, and a sample input. They compile through Roslyn inside a shared sandbox with a safety gate — raw HTTP clients and filesystem access are rejected at compile time, and all web egress flows through a single instrumented channel. Compilations are content-hash cached; execution is time-bounded. Every invocation stamps a run record (time, status, output) onto the tool, and a workbench surface exposes the code, the contract, and the last run for inspection, hand-testing, or edit.
The payoff is a zero-LLM fast path. When an agent registers its bulk-refresh tool, repeated refreshes execute the compiled tool directly — a portfolio applet that previously spent roughly two model calls per symbol per refresh now spends none. Failures self-heal: when a tool errors, the system falls back to the model to diagnose and re-author it.
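The fast path can be sketched as a dispatcher that prefers a registered tool and only reaches for the model when there is no tool or the tool fails. All names here are hypothetical, the model is replaced by a counting stub, and the re-authoring step is reduced to a comment.

```python
model_calls = 0

def ask_model(symbol):
    # Stand-in for an LLM call; counts how often the model is used.
    global model_calls
    model_calls += 1
    return {"symbol": symbol, "price": 100.0}

def refresh(symbols, tool=None):
    """Use the registered tool when there is one; otherwise call the model."""
    if tool is not None:
        try:
            return tool(symbols), "tool"
        except Exception:
            pass  # a real system would ask the model to diagnose and re-author
    return [ask_model(s) for s in symbols], "model"

def bulk_refresh(symbols):
    # Hypothetical authored tool: plain deterministic code, no model involved.
    return [{"symbol": s, "price": 100.0} for s in symbols]

def broken_tool(symbols):
    raise RuntimeError("upstream schema changed")

_, path_fast = refresh(["AAA", "BBB"], tool=bulk_refresh)
calls_after_fast = model_calls
_, path_fallback = refresh(["AAA", "BBB"], tool=broken_tool)
calls_after_fallback = model_calls
```

In this sketch the healthy tool costs no model calls, while the broken tool falls back to one stubbed model call per symbol.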
The same substrate serves three surfaces today: tools inside Vibes applets, per-field display formatting (authored snippets that return structured display values, never raw HTML), and the Today dashboard's generated panels — all sharing one sandbox, one safety gate, and one run-record contract.
An agent workforce as operating-system processes, not a hosted gateway.
SwarmMarshal treats "hire an AI worker" as a systems problem: process isolation, supervised lifecycles, explicit provisioning, and a same-owner device swarm that shares work and models without ever exposing a public endpoint.
A worker is a full profile of its own — data root, email account, role configuration — living under the hiring profile and executed by a dedicated child host process with a curated service set. The supervisor tracks a per-worker heartbeat, restarts crashed workers with backoff, adopts orphans after an abnormal exit, staggers autostart, and stops every worker when the app closes. One worker wedging cannot take down its siblings or the host.
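The source says only that crashed workers restart "with backoff". As one common way to do that, the sketch below uses capped exponential backoff plus a heartbeat staleness check. The base delay, cap, timeout, and function names are assumptions.

```python
def restart_delay(crash_count, base=1.0, cap=60.0):
    """Seconds to wait before restart number `crash_count` (1-based)."""
    return min(cap, base * 2 ** max(0, crash_count - 1))

def is_wedged(last_heartbeat, now, timeout=15.0):
    """True when a worker has not reported a heartbeat within the timeout."""
    return now - last_heartbeat > timeout

delays = [restart_delay(n) for n in range(1, 9)]
```

The cap keeps a repeatedly crashing worker from being retried in a tight loop while still allowing recovery once the underlying fault clears.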
Provisioning is conversational but gated: the assistant stages the hire — profile, account (connection-tested first), role — and activation waits on an explicit human approval task. Credentials never travel through approval payloads, and secrets pasted into chat are registered with a redaction service before validation so they cannot persist in transcripts or logs. Workers are bound to the device that hired them and never replicate to peers.
Paired same-owner devices form the swarm: signed LAN envelopes, journaled sync with per-origin high-water marks, last-writer-wins conflict resolution, snapshot bootstrap for fresh peers, and automatic gap healing for machines that slept through compaction. On top of that fabric sit the distribution services: peers advertise which local models they can serve, a remote-peer LLM provider forwards completions to the machine with the hardware (a PC using a Mac's Ollama install — or a worker asking its boss to broker a cloud model without holding the key), and a load-scored work allocator hands scheduled tasks, research jobs, and background work to the least-busy node.
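Three of the mechanisms named above are per-origin high-water marks, last-writer-wins resolution, and least-busy allocation. The sketch below shows one way they could fit together. The entry shape, the tie-break on origin name, and the load numbers are assumptions; signing, snapshots, and gap healing are omitted.

```python
def apply_journal(store, high_water, entries):
    """Apply journal entries once each, keeping the newest write per key."""
    for e in sorted(entries, key=lambda e: (e["origin"], e["seq"])):
        if e["seq"] <= high_water.get(e["origin"], 0):
            continue  # already applied from this origin
        high_water[e["origin"]] = e["seq"]
        current = store.get(e["key"])
        # Last writer wins; origin name breaks timestamp ties (an assumption).
        if current is None or (e["ts"], e["origin"]) > (current["ts"], current["origin"]):
            store[e["key"]] = {"value": e["value"], "ts": e["ts"], "origin": e["origin"]}

def least_busy(peer_loads):
    """Pick the peer with the lowest load score."""
    return min(peer_loads, key=lambda p: (peer_loads[p], p))

store, marks = {}, {}
batch = [
    {"origin": "mac", "seq": 1, "key": "task:1", "value": "open", "ts": 10},
    {"origin": "pc", "seq": 1, "key": "task:1", "value": "done", "ts": 12},
]
apply_journal(store, marks, batch)
apply_journal(store, marks, batch)  # replaying the same batch changes nothing
```

The high-water marks make replay idempotent, which is what lets a peer that missed messages ask for everything after its last known sequence number.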
Duplicate work is prevented economically as well as logically: short-TTL advisory claims ensure only one peer pays to enrich a given message, results replicate through sync, and cloud calls are deduplicated rather than forwarded peer-to-peer. There is deliberately no LLM heartbeat — the always-on layer is deterministic schedulers and sync, and a model is invoked only when there is real work.
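A short-TTL advisory claim can be sketched as a table of expiring entries. This is a single in-memory illustration with hypothetical names and a made-up 30 second TTL; in the real swarm the claims would have to be visible to every peer.

```python
class ClaimTable:
    def __init__(self, ttl_seconds=30.0):
        self.ttl = ttl_seconds
        self._claims = {}  # message_id -> (peer_id, expires_at)

    def try_claim(self, message_id, peer_id, now):
        """Return True if `peer_id` may enrich the message right now."""
        holder = self._claims.get(message_id)
        if holder is not None and holder[0] != peer_id and holder[1] > now:
            return False  # another peer holds a live claim
        self._claims[message_id] = (peer_id, now + self.ttl)
        return True

claims = ClaimTable(ttl_seconds=30.0)
a_first = claims.try_claim("msg-1", "mac", now=0.0)
b_blocked = claims.try_claim("msg-1", "pc", now=10.0)
b_after_expiry = claims.try_claim("msg-1", "pc", now=31.0)
```

Because the claim expires on its own, a peer that crashes or sleeps mid-enrichment does not block the message forever.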
Model choice is an evidence pipeline, not a preference dropdown.
Model Scout studies cloud models, subscription CLI routes, installed local runtimes, hardware fit, public benchmark bundles, and function-specific golden prompt suites. It recommends only routes that have enough proof for the function they will run.
General model rankings are useful only as priors. SwarmMarshal stores capability, cost, context, latency, and value metadata for candidate models, then tests important functions against task-specific suites. The message pipeline is the strictest example: a model can be famous, cheap, or locally installed and still be rejected if it fails threat decisions, routing, grounded facts, observations, parse stability, or knowledge-worthiness cases.
The "golden prompts" are synthetic, protected calibration cases tied to a prompt hash and a suite hash. The same suite can be run against OpenAI, Anthropic, Gemini, DeepSeek, subscription CLI providers, and local Ollama models. Results are reduced to aggregate metadata such as weighted score, pass rate, critical failures, parse failures, latency, hardware bucket, and required runtime settings. Raw prompts, raw responses, headers, messages, facts, embeddings, and personal data are not published.
Model Scout lifecycle
- Discover candidates: Candidate lists come from public benchmark rows, external model catalogs, subscription CLI providers, installed Ollama inventory, and user-managed candidate catalogs.
- Fingerprint the runtime: For local models, Scout records installed model names, Ollama version, hardware bucket, RAM, VRAM, Apple Silicon status, and effective model memory.
- Probe compatibility: Local candidates must prove they can serve the expected format, JSON contract, non-streaming settings, and function-specific execution profile.
- Run the golden suite: Passing candidates are replayed across the function's calibration prompts. The scoring system tracks correctness, grounding, parse failures, threat misses, and latency.
- Write an allow or deny row: Results are stored by provider, model, function type, prompt hash, and hardware key so stale proof cannot leak across prompt or machine changes.
- Apply through routing policy: Model Scout updates function preferences and route advisors. Runtime calls consume those preferences; user-facing work does not benchmark inline.
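The allow or deny row in the lifecycle above is keyed so that proof cannot outlive the prompt or machine it was earned on. A minimal sketch of that keying, with hypothetical provider, model, and hardware names and an in-memory dictionary standing in for the real store:

```python
import hashlib

def prompt_sha(prompt_text):
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()

proofs = {}  # (provider, model, function_type, prompt_hash, hardware_key) -> allowed

def record(provider, model, function_type, prompt_text, hardware_key, allowed):
    key = (provider, model, function_type, prompt_sha(prompt_text), hardware_key)
    proofs[key] = allowed

def is_allowed(provider, model, function_type, prompt_text, hardware_key):
    # Missing proof is treated as a deny.
    key = (provider, model, function_type, prompt_sha(prompt_text), hardware_key)
    return proofs.get(key, False)

record("ollama", "example-model", "message_pipeline", "prompt v1", "mac-arm64-32gb", True)
same = is_allowed("ollama", "example-model", "message_pipeline", "prompt v1", "mac-arm64-32gb")
edited_prompt = is_allowed("ollama", "example-model", "message_pipeline", "prompt v2", "mac-arm64-32gb")
other_machine = is_allowed("ollama", "example-model", "message_pipeline", "prompt v1", "pc-x64-16gb")
```

Editing the prompt or moving to different hardware produces a different key, so the old allow row simply no longer matches.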
| Function lane | Promotion evidence | Failure behavior |
|---|---|---|
| Message pipeline | Prompt SHA, suite SHA, score, pass rate, zero critical failures, parse stability, hardware key, and latency. | Route guard rejects unproven routes, chooses a proven free fallback when available, and pauses before surprise metered spend. |
| Agent chat | Published agent-chat benchmark rows, installed local match, subscription availability, health, and quality/cost tier. | Subscription-backed routes can win for user-waiting calls; local models are retained as fallback when benchmark-backed and healthy. |
| Embeddings | Embedding function rows, model class guardrails, local runtime inventory, and vector-search quality gates. | Vision-only or text-only models are blocked from the wrong lane rather than silently becoming defaults. |
| High-stakes synthesis | Frontier route selection, context quality, source citation availability, and function preference policy. | Drafts surface evidence and uncertainty; consequential sends or external actions remain approval-gated. |
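The message-pipeline failure behavior in the table can be sketched as a small guard function. The route shape, the `metered_approved` flag, and the return values are assumptions; the logic shown is only the three behaviors the table names: reject unproven routes, prefer a proven free fallback, and pause before unapproved metered spend.

```python
def guard(preferred, fallbacks, proven, metered_approved=False):
    """Return ("use", route_name) or ("pause", None)."""
    if preferred["name"] in proven and (not preferred["metered"] or metered_approved):
        return ("use", preferred["name"])
    for route in fallbacks:
        if route["name"] in proven and not route["metered"]:
            return ("use", route["name"])  # proven free fallback
    return ("pause", None)  # wait for the user instead of guessing

cloud = {"name": "cloud-model", "metered": True}
local = {"name": "local-model", "metered": False}
unproven = guard(cloud, [local], proven={"local-model"})
metered = guard(cloud, [], proven={"cloud-model"})
approved = guard(cloud, [], proven={"cloud-model"}, metered_approved=True)
```

Note that the guard never returns an unproven route: the worst case is a pause, not a silent downgrade.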
The model receives an answer pack, not an unbounded mailbox.
SwarmMarshal treats context as a product subsystem. It retrieves, ranks, compresses, redacts, and source-binds evidence before the model is asked to synthesize an answer.
Hybrid retrieval
Exact local search catches identifiers, order numbers, dates, and names. Semantic search catches intent when the user remembers the meaning but not the words.
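The source does not say how exact and semantic results are combined. As a stand-in, the sketch below uses reciprocal rank fusion, a common merging technique that needs only the two ranked lists; the constant `k=60` and the message identifiers are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of ids; items ranked well in several lists rise."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, then by id so ties are deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

exact_hits = ["m1", "m2"]      # e.g. matched an order number
semantic_hits = ["m2", "m3"]   # e.g. matched the meaning of the question
merged = reciprocal_rank_fusion([exact_hits, semantic_hits])
```

A message found by both searches outranks one found by only one of them, which matches the intent of blending identifier matches with meaning matches.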
Graph neighborhoods
People, organizations, projects, topics, promises, decisions, and relationships are pulled as graph context with supporting source references.
Timeline and narrative
Timelines, matter summaries, meeting prep, legal packets, and dispute drafts are constructed from scoped source sets with claims mapped to evidence.
What a context pack carries
| Element | Purpose | Why agentic programmers care |
|---|---|---|
| Purpose and audience | Defines whether the pack is for the assistant, an internal helper, a peer device, or an external MCP tool. | Prevents one giant prompt path from leaking internal facts to the wrong consumer. |
| Items | Ranked facts, observations, graph summaries, search hits, timelines, and candidate evidence. | Keeps context compression deterministic and inspectable before synthesis. |
| Sources | Message, thread, attachment, contact, calendar, task, note, or artifact references with titles, excerpts, timestamps, and identifiers. | Lets the final answer cite exact records and lets the UI reopen the evidence. |
| Policy | Privacy class, redaction level, sharing scope, and side-effect permissions. | Turns "the prompt should be careful" into a data contract the caller can enforce. |
| Token budget | Estimated size and compression pressure before the model call. | Makes long-history questions possible without dumping a mailbox into context. |
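The table's elements can be sketched as a small data structure with a budget-aware fill step. Field names, the characters-per-token heuristic, and the rule that uncited items are skipped are assumptions made for illustration; the `Sources` element is reduced to a list of identifiers on each item.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    text: str
    source_ids: list
    score: float

@dataclass
class ContextPack:
    purpose: str
    audience: str
    policy: dict
    token_budget: int
    items: list = field(default_factory=list)

def estimate_tokens(text):
    # Rough heuristic (about 4 characters per token); not a real tokenizer.
    return max(1, len(text) // 4)

def fill(pack, candidates):
    """Add the highest-scoring cited items that still fit the budget."""
    used = 0
    for item in sorted(candidates, key=lambda i: -i.score):
        cost = estimate_tokens(item.text)
        if item.source_ids and used + cost <= pack.token_budget:
            pack.items.append(item)
            used += cost
    return used

pack = ContextPack("answer", "assistant", {"redaction": "none"}, token_budget=10)
used = fill(pack, [
    Item("Invoice 1042 was paid on 3 March.", ["msg-17"], 0.9),
    Item("Vendor prefers email over phone.", ["msg-4"], 0.5),
    Item("Unsourced claim.", [], 0.8),
])
```

Selection happens in ordinary code before any model call, so the same inputs always produce the same pack and the result can be inspected.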
SwarmMarshal is both an agent runtime and an evidence server for other agents.
The built-in MCP server gives Codex, Claude Code, desktop agents, and custom agent systems a typed way to query the same local evidence layer. The app also consumes approved MCP connectors as tools.
Expose source-grounded data
External agents can search messages, fetch thread context, inspect contacts, query knowledge graph facts, draft replies, and request action tools through a local stdio MCP server.
Consume external tools
Approved MCP servers appear in the same tool inventory as built-ins. The permission broker applies the same read-only and side-effect gates regardless of where a tool came from.
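For orientation, MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The sketch below builds one such request as a single line, as the stdio transport expects. The tool name `search_messages` and its arguments are hypothetical, and a real client would first complete the MCP initialization handshake, typically through an MCP SDK rather than by hand.

```python
import json

def tools_call_request(request_id, tool_name, arguments):
    """Build one JSON-RPC 2.0 message asking an MCP server to run a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical tool name and arguments, for illustration only.
line = tools_call_request(1, "search_messages", {"query": "invoice 1042"})
decoded = json.loads(line)
```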
Prompt discipline is necessary, but data-layer guardrails do the hard work.
SwarmMarshal assumes models are probabilistic and tools are consequential. The architecture therefore puts durable checks around knowledge, model routing, sync, peer cooperation, and user-visible actions.
Facts, memories, observations, and summaries are not free-floating prose. They preserve enough source metadata to reopen the underlying message or record.
A single inbound message can be evidence, but durable memory requires grounding, trust, corroboration, or user review. Ungrounded quotes are dropped.
Agents may research and draft freely inside their allowed scope. Sending, deleting, moving broadly, external calls, and local writes cross approval gates.
Important functions prefer calibrated routes. If no proven route exists, SwarmMarshal warns or pauses instead of silently choosing a convenient paid or low-quality fallback.
Same-owner devices can share work through sync, leases, and RPC. Message-pipeline delegation sends identifiers to a peer that already has the local data, not a second raw-mail transport.
Database repair, backup, telemetry retention, sync diagnostics, LLM call history, and debug logs are exposed to the user in enough detail to support real troubleshooting.
What the technology enables.
The user-facing product is deliberately broader than inbox automation. The inbox is the trust surface; the source-grounded assistant is the workbench.
| Capability | Feature behavior | Underlying technology |
|---|---|---|
| Ask your history | Ask plain-English questions and get answers cited back to messages, threads, files, tasks, or calendar records. | Hybrid search, context packs, semantic memory, knowledge graph neighborhoods, and source refs. |
| Legal and evidence mode | Build timelines, claim tables, contradiction lists, evidence packets, and dispute drafts with uncertainty called out. | Scoped source sets, citation-preserving synthesis, chronological normalization, and strict source coverage checks. |
| Unified inbox | Read, search, teach, classify, draft, and organize across email accounts and chat channels without losing account-native behavior. | Per-account stores, provider adapters, local search, sender policy, message enrichment, and safe rendering. |
| Today dashboard | Show what changed overnight, who is waiting, calendar pressure, drafts to review, and source-grounded facts that need attention. | Background sync, overnight activity aggregation, assistant context, task promotion, and agent journals. |
| Supervised helpers | Run role-based agents that research, draft, monitor, and propose repeatable workflows without silently acting. | Profiles, tool inventories, context injectors, tool-use streaming, permission broker, SpendGuard, and activity logs. |
| Model-aware routing | Choose local, subscription, or cloud models by function instead of using one global model for every task. | Model Scout, function preferences, route advisors, provider health, benchmark bundles, and budget policy. |
| External agent access | Let Codex, Claude Code, or a custom agent query and act across the user's local communication graph. | Local MCP server, typed tool catalog, redaction policy, source-grounded context tools, and approval gates. |
| Vibes and mini-apps | Describe a tracker or lightweight workflow and get a local database-backed applet with forms and records. | User app schemas, local data surfaces, source refs, agent-authored artifacts, and embedded Blazor pages. |
| Generative tooling | Agents author small deterministic C# tools — API fetchers, refreshers, formatters — that run repeatedly without further model calls. | Roslyn-compiled sandbox, compile-time safety gate, single instrumented HTTP channel, typed tool contracts, run records, and a tool workbench. |
| AI workers & device swarm | Hire role-based workers that run as supervised processes on owned machines, with work and models shared across paired devices. | Per-worker child processes with heartbeats and restart supervision, approval-gated provisioning, machine binding, signed LAN sync, peer LLM forwarding, and load-scored work allocation. |
Use SwarmMarshal as the evidence layer your agents were missing.
Run the desktop app, connect your accounts, calibrate your models, then point your own agent at the local MCP server.