Technical White Paper

Abstract

A source-grounded operating layer over messages, calendars, people, tasks, and agents.

SwarmMarshal is not primarily an email client with a chatbot attached. It is a local-first knowledge workbench that turns the user's communication history into an evidence layer for search, synthesis, drafting, timelines, legal review, and supervised automation.

The application runs as a .NET MAUI Blazor Hybrid desktop app for Windows and Apple Silicon Mac. It keeps provider data in local SQLite stores, enriches messages through a gated AI pipeline, builds source-backed context packs, and exposes those packs to the in-app assistant, supervised helper agents, user workflows, and external agents through MCP.

The critical engineering constraint is provenance. A model can write a summary, but durable knowledge must know where it came from. A route advisor can recommend a cheaper model, but the route must have proof for the function it will perform. A helper can draft or propose, but sends, deletes, external calls, and broad automation stay behind explicit user control.

Architecture

The system is organized around four planes.

SwarmMarshal separates ingestion, interpretation, agent execution, and model intelligence so each subsystem can fail, recover, and be audited independently.

1. Capture

Email, chats, calendar records, contacts, tasks, notes, attachments, and agent transcripts land in local stores.

2. Enrich

The message pipeline classifies risk, routing, commitments, facts, observations, and knowledge-worthiness.

3. Ground

Context packs blend source-backed facts, semantic evidence, timelines, graph neighborhoods, and citations.

4. Act

The assistant, helper agents, MCP clients, and inbox workflows consume the same evidence layer under policy.

Desktop-first runtime

SwarmMarshal runs on the user's machine. Provider sync, search, local storage, LLM routing, tool approval, and diagnostics all run locally instead of depending on remote SaaS-only services.

Per-account stores

Messages are kept in account-scoped SQLite databases so large mailboxes remain operable and provider-specific sync state can be repaired without corrupting unrelated accounts.
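As a rough illustration of account-scoped storage, the sketch below opens one SQLite file per account. It is written in Python for brevity rather than the product's C#, and the file naming scheme, the `messages` table, and its columns are assumptions made for the example, not SwarmMarshal's actual layout.

```python
import re
import sqlite3
from pathlib import Path


def account_db_path(root: Path, account_id: str) -> Path:
    """Map an account id to its own database file (hypothetical naming scheme)."""
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", account_id)
    return root / f"{safe}.sqlite"


def open_account_store(root: Path, account_id: str) -> sqlite3.Connection:
    """Open (or create) the store for one account; other accounts are untouched."""
    root.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(account_db_path(root, account_id))
    # Illustrative schema only: a real store would carry far more provider state.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id TEXT PRIMARY KEY, thread_id TEXT, subject TEXT, received_at TEXT)"
    )
    return conn
```

Because each account has its own file, repairing or rebuilding one database never needs to open another account's file.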

Auditable operations

Agent turns, tool calls, route choices, model usage, proposals, approvals, and background work are journaled so failures can be explained instead of guessed at.

Data plane

Communications become queryable local evidence.

The inbox is the largest knowledge source, but SwarmMarshal treats every communication surface as source material for the same evidence fabric.

Input surface	How it is used	Agent-facing value
Email	Gmail, Microsoft Graph, and IMAP sync into local account stores with message bodies, folders, provider metadata, attachments, and read/flag state.	Search, triage, reply drafting, timelines, commitments, receipts, disputes, and source citations.
Chats	Apple Messages on macOS plus channel adapters such as Slack, Telegram, and Discord are normalized as communication sources without flooding email-only views.	Cross-channel history, relationship context, and "what happened where?" retrieval.
Calendar and tasks	Calendar events, invite state, deadlines, reminders, and task records are linked back to messages and contacts where possible.	Meeting prep, conflict detection, promise tracking, follow-up extraction, and daily briefs.
Contacts and people	Provider contacts, observed senders, phone/email identities, and relationship signals converge into durable local contact records.	Person context, sender teaching, VIP routing, history summaries, and safer drafting.
Agent work	Agent transcripts, proposals, artifacts, approvals, and run journals become inspectable records.	Replayable decisions, self-improvement signals, and tool-use audit trails.

Message enrichment pipeline

Message enrichment is the load-bearing pass that turns raw communication into structured evidence. The pipeline separates cheap deterministic work from expensive model work. Sender policy checks, spam prechecks, provider header signals, and sync metadata can run immediately after ingest. The heavier enrichment pass runs in background workers, can be delegated to same-owner peers, and is guarded by route proof so the backlog does not quietly move to an untested or unexpectedly metered model.

Threat and routing

Authentication verdicts, provider spam headers, sender history, links, body text, and thread context combine into spam, phishing, scam, newsletter, and routing decisions.

Facts and commitments

The model extracts commitments, dates, entities, relations, life facts, memory candidates, and observations only when the source text can ground them.

Validation and repair

Structured output is treated as a draft. Deterministic validators check contract shape, enum legality, exact quote grounding, source coverage, and knowledge gates before persistence.
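A minimal sketch of two of these checks, enum legality and exact quote grounding, might look like the following. It is Python used for illustration only; the `Fact` shape, the `ALLOWED_ROUTES` values, and the function name are hypothetical and not taken from the product.

```python
from dataclasses import dataclass

# Hypothetical routing enum; the real set of legal values is not given in this paper.
ALLOWED_ROUTES = {"inbox", "newsletter", "spam", "phishing", "scam"}


@dataclass
class Fact:
    claim: str
    quote: str  # text the model says supports the claim


def validate_enrichment(source_text: str, route: str, facts: list[Fact]) -> tuple[list[Fact], list[str]]:
    """Return (grounded facts, rejection reasons) for one model draft."""
    errors: list[str] = []
    if route not in ALLOWED_ROUTES:
        errors.append(f"illegal route: {route!r}")
    kept: list[Fact] = []
    for fact in facts:
        # Exact quote grounding: the quote must appear verbatim in the source text.
        if fact.quote and fact.quote in source_text:
            kept.append(fact)
        else:
            errors.append(f"ungrounded quote for claim: {fact.claim!r}")
    return kept, errors
```

Only the facts in the first return value would be persisted; the reasons list can drive a repair prompt or a journal entry.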

Assistant runtime

Profiles, tools, streams, permissions, and durable transcripts.

SwarmMarshal uses a unified tool-use engine rather than a set of hand-coded specialist loops. A "helper" is a profile with a role, model route, enabled tools, limits, and an activity log.

1. Resolve the profile. The current surface and selected helper determine the system prompt, function type, budget, model class, and allowed tool catalog.
2. Build source-grounded context. Context injectors add conversation state, artifacts, life context, source packs, and route-specific metadata before the model call.
3. Stream native tool use. Provider-specific adapters normalize OpenAI-style, Anthropic, Gemini, Ollama, subscription CLI, and compatible APIs into one event stream.
4. Gate side effects. Read-only tools can run silently; sends, destructive actions, external calls, writes, and elevated shell work require explicit approval.
5. Persist the turn. Text, tool calls, tool results, approvals, errors, usage, and routing decisions are stored so the user can audit what happened.
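The side-effect gate and the journaling step can be sketched together. This is an illustrative Python outline under stated assumptions: the `Tool` record, the `approve` callback, and the journal entry shape are hypothetical names, not the product's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    read_only: bool
    run: Callable[[dict], str]


def invoke(tool: Tool, args: dict, approve: Callable[[Tool, dict], bool], journal: list[dict]) -> str:
    """Run read-only tools directly; ask for approval before any side effect."""
    if not tool.read_only and not approve(tool, args):
        journal.append({"tool": tool.name, "status": "denied"})
        raise PermissionError(f"{tool.name} requires explicit approval")
    result = tool.run(args)
    journal.append({"tool": tool.name, "status": "ok"})
    return result
```

The point of the shape is that the gate lives in one place, so a tool cannot skip approval by being called from a different surface.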

Primitive tools

HTTP, shell, filesystem, code execution, skill execution, catalog search, and domain tools compose into agent behaviors. File tools are path-clamped to a sandbox root.

Read-only by default where possible · Approval broker · Tool schema inventory
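Path clamping for file tools can be illustrated with a short Python sketch. The function name and error type are assumptions; the idea is simply to resolve the requested path and reject anything that lands outside the sandbox root.

```python
from pathlib import Path


def clamp_to_sandbox(sandbox_root: Path, requested: str) -> Path:
    """Resolve a tool-supplied path and refuse anything outside the sandbox root."""
    root = sandbox_root.resolve()
    # Joining an absolute `requested` discards `root`, so absolute paths are
    # caught by the containment check below rather than silently re-rooted.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"path escapes sandbox: {requested}")
    return candidate
```

Resolving before checking matters: a purely textual prefix check would accept `notes/../../x`, while the resolved path is clearly outside the root.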

Supervised helpers

Helpers can research, summarize, draft, monitor matters, prepare weekly briefs, and propose inbox rules. They do not need permission to think; they do need permission to act.

Visible activity · Role templates · Draft-first automation

Generative tooling

Agents author deterministic tools; the model becomes a compiler, not a dependency.

The most recent addition to the runtime inverts the usual agent economics. Rather than spending a model call on every repetition of a job, the agent writes a small, typed C# tool once — and from then on the job runs as ordinary deterministic code.

The design principle is "type the data, code the behavior." Structure (fields, schemas, records) is declared so the system can leverage it. Behavior (fetching, transforming, formatting) is authored as sandboxed C# rather than accreted as typed product features, which keeps the behavior space unbounded without growing an inner platform.

Authored tools carry a declared contract: named, typed parameters, a return description, and a sample input. They compile through Roslyn inside a shared sandbox with a safety gate — raw HTTP clients and filesystem access are rejected at compile time, and all web egress flows through a single instrumented channel. Compilations are content-hash cached; execution is time-bounded. Every invocation stamps a run record (time, status, output) onto the tool, and a workbench surface exposes the code, the contract, and the last run for inspection, hand-testing, or edit.
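The compile-time gate and content-hash cache can be illustrated with a Python analogue. This is not the Roslyn pipeline described above: it uses Python's `ast` and `compile` as stand-ins, the forbidden-module list is an arbitrary example, and an import check alone is not a real sandbox.

```python
import ast
import hashlib

# Illustrative deny list; a real gate would be far more thorough.
FORBIDDEN_MODULES = {"socket", "urllib", "http", "os", "pathlib", "shutil"}
_cache: dict[str, object] = {}


def compile_tool(source: str):
    """Compile tool source once per content hash, rejecting forbidden imports first."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if key in _cache:
        return _cache[key]  # identical source text reuses the earlier compilation
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name.split(".")[0] in FORBIDDEN_MODULES:
                raise ValueError(f"forbidden import in authored tool: {name}")
    code = compile(tree, filename=f"<tool {key[:8]}>", mode="exec")
    _cache[key] = code
    return code
```

Keying the cache on a hash of the source means an edited tool recompiles automatically, while an unchanged tool never pays the compile cost twice.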

The payoff is a zero-LLM fast path. When an agent registers its bulk-refresh tool, repeated refreshes execute the compiled tool directly — a portfolio applet that previously spent roughly two model calls per symbol per refresh now spends none. Failures self-heal: when a tool errors, the system falls back to the model to diagnose and re-author it.
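The fast path and its fallback can be sketched as a small registry. The class and callback names are hypothetical, and the fallback here only stands in for the model call that would diagnose and re-author a failing tool.

```python
from typing import Callable, Optional


class ToolRegistry:
    """Prefer a registered deterministic tool; call the model only on miss or failure."""

    def __init__(self, model_fallback: Callable[[str, dict, Optional[Exception]], str]):
        self._tools: dict[str, Callable[[dict], str]] = {}
        self._fallback = model_fallback
        self.model_calls = 0

    def register(self, job: str, tool: Callable[[dict], str]) -> None:
        self._tools[job] = tool

    def run(self, job: str, args: dict) -> str:
        tool = self._tools.get(job)
        error: Optional[Exception] = None
        if tool is not None:
            try:
                return tool(args)  # zero-LLM fast path
            except Exception as exc:
                error = exc  # hand the failure to the model for diagnosis
        self.model_calls += 1
        return self._fallback(job, args, error)
```

Counting `model_calls` makes the economics visible: once a job has a working tool, the counter stops moving for that job.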

The same substrate serves three surfaces today: tools inside Vibes applets, per-field display formatting (authored snippets that return structured display values, never raw HTML), and the Today dashboard's generated panels — all sharing one sandbox, one safety gate, and one run-record contract.

Workers & the device swarm

An agent workforce as operating-system processes, not a hosted gateway.

SwarmMarshal treats "hire an AI worker" as a systems problem: process isolation, supervised lifecycles, explicit provisioning, and a same-owner device swarm that shares work and models without ever exposing a public endpoint.

A worker is a full profile of its own — data root, email account, role configuration — living under the hiring profile and executed by a dedicated child host process with a curated service set. The supervisor tracks a per-worker heartbeat, restarts crashed workers with backoff, adopts orphans after an abnormal exit, staggers autostart, and stops every worker when the app closes. One worker wedging cannot take down its siblings or the host.
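Two supervisor decisions, when a worker counts as unresponsive and how long to wait before restarting it, can be sketched as pure functions. The timeout, base delay, and cap below are illustrative assumptions; the paper does not state the real values or the exact backoff curve.

```python
def is_stale(last_heartbeat: float, now: float, timeout: float = 30.0) -> bool:
    """A worker is considered unresponsive once its heartbeat is older than `timeout` seconds."""
    return now - last_heartbeat > timeout


def restart_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, capped so retries never stall for long."""
    return min(cap, base * (2 ** attempt))
```

Keeping these as pure functions makes the supervision policy easy to test without spawning any processes.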

Provisioning is conversational but gated: the assistant stages the hire — profile, account (connection-tested first), role — and activation waits on an explicit human approval task. Credentials never travel through approval payloads, and secrets pasted into chat are registered with a redaction service before validation so they cannot persist in transcripts or logs. Workers are bound to the device that hired them and never replicate to peers.
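The redaction step can be pictured as a registry of known secrets that is applied before any text is persisted. The class below is a hypothetical Python sketch of that idea, not the product's redaction service.

```python
class RedactionService:
    """Remember secrets seen in chat and scrub them from text before it is stored."""

    def __init__(self) -> None:
        self._secrets: set[str] = set()

    def register(self, secret: str) -> None:
        if secret:
            self._secrets.add(secret)

    def scrub(self, text: str) -> str:
        # Replace longer secrets first so a secret containing another is fully masked.
        for secret in sorted(self._secrets, key=len, reverse=True):
            text = text.replace(secret, "[REDACTED]")
        return text
```

Registering the secret before validation means that even a failed connection test cannot leave the raw value in a transcript or log line that passes through `scrub`.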

Paired same-owner devices form the swarm: signed LAN envelopes, journaled sync with per-origin high-water marks, last-writer-wins conflict resolution, snapshot bootstrap for fresh peers, and automatic gap healing for machines that slept through compaction. On top of that fabric sit the distribution services: peers advertise which local models they can serve, a remote-peer LLM provider forwards completions to the machine with the hardware (a PC using a Mac's Ollama install — or a worker asking its boss to broker a cloud model without holding the key), and a load-scored work allocator hands scheduled tasks, research jobs, and background work to the least-busy node.
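Per-origin high-water marks combined with last-writer-wins can be sketched as follows. The `Change` fields, the tie-break by origin id, and the in-memory `state` dictionary are assumptions for illustration; gap detection and snapshot bootstrap are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    origin: str        # device that authored the change
    seq: int           # per-origin, monotonically increasing sequence number
    key: str
    value: str
    written_at: float  # writer's timestamp, used for last-writer-wins


def apply_changes(state: dict, marks: dict, changes: list[Change]) -> None:
    """Apply unseen changes; newer writes win, with ties broken by origin id."""
    for ch in sorted(changes, key=lambda c: (c.origin, c.seq)):
        if ch.seq <= marks.get(ch.origin, 0):
            continue  # already applied from this origin
        marks[ch.origin] = ch.seq
        current = state.get(ch.key)
        if current is None or (ch.written_at, ch.origin) > (current[1], current[2]):
            state[ch.key] = (ch.value, ch.written_at, ch.origin)
```

The high-water mark makes replay idempotent, and the timestamp comparison makes the outcome independent of the order in which peers deliver their journals.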

Duplicate work is prevented economically as well as logically: short-TTL advisory claims ensure only one peer pays to enrich a given message, results replicate through sync, and cloud calls are deduplicated rather than forwarded peer-to-peer. There is deliberately no LLM heartbeat — the always-on layer is deterministic schedulers and sync, and a model is invoked only when there is real work.
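A short-TTL advisory claim can be sketched as an expiring entry keyed by message id. This single-process Python version is only an illustration: the class name and default TTL are assumptions, and in a real swarm the claims would themselves have to travel between peers.

```python
import time


class ClaimTable:
    """Advisory, expiring claims: the first live claim on a key wins."""

    def __init__(self, ttl_seconds: float = 120.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._claims: dict[str, tuple[str, float]] = {}

    def try_claim(self, message_id: str, peer: str) -> bool:
        now = self._clock()
        holder = self._claims.get(message_id)
        if holder is not None and holder[1] > now and holder[0] != peer:
            return False  # another peer holds a live claim
        # New claim, renewal by the same peer, or takeover of an expired claim.
        self._claims[message_id] = (peer, now + self._ttl)
        return True
```

Because the claim expires on its own, a peer that crashes mid-enrichment cannot block the message forever; another peer simply claims it after the TTL.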

Model Scout

Model choice is an evidence pipeline, not a preference dropdown.

Model Scout studies cloud models, subscription CLI routes, installed local runtimes, hardware fit, public benchmark bundles, and function-specific golden prompt suites. It recommends only routes that have enough proof for the function they will run.

General model rankings are useful only as priors. SwarmMarshal stores capability, cost, context, latency, and value metadata for candidate models, then tests important functions against task-specific suites. The message pipeline is the strictest example: a model can be famous, cheap, or locally installed and still be rejected if it fails threat decisions, routing, grounded facts, observations, parse stability, or knowledge-worthiness cases.

The "golden prompts" are synthetic, protected calibration cases tied to a prompt hash and a suite hash. The same suite can be run against OpenAI, Anthropic, Gemini, DeepSeek, subscription CLI providers, and local Ollama models. Results are reduced to aggregate metadata such as weighted score, pass rate, critical failures, parse failures, latency, hardware bucket, and required runtime settings. Raw prompts, raw responses, headers, messages, facts, embeddings, and personal data are not published.

Model Scout lifecycle

1. Discover candidates. Candidate lists come from public benchmark rows, external model catalogs, subscription CLI providers, installed Ollama inventory, and user-managed candidate catalogs.
2. Fingerprint the runtime. For local models, Scout records installed model names, Ollama version, hardware bucket, RAM, VRAM, Apple Silicon status, and effective model memory.
3. Probe compatibility. Local candidates must prove they can serve the expected format, JSON contract, non-streaming settings, and function-specific execution profile.
4. Run the golden suite. Passing candidates are replayed across the function's calibration prompts. The scoring system tracks correctness, grounding, parse failures, threat misses, and latency.
5. Write an allow or deny row. Results are stored by provider, model, function type, prompt hash, and hardware key so stale proof cannot leak across prompt or machine changes.
6. Apply through routing policy. Model Scout updates function preferences and route advisors. Runtime calls consume those preferences; user-facing work does not benchmark inline.

Function lane	Promotion evidence	Failure behavior
Message pipeline	Prompt SHA, suite SHA, score, pass rate, zero critical failures, parse stability, hardware key, and latency.	Route guard rejects unproven routes, chooses a proven free fallback when available, and pauses before surprise metered spend.
Agent chat	Published agent-chat benchmark rows, installed local match, subscription availability, health, and quality/cost tier.	Subscription-backed routes can win for user-waiting calls; local models are retained as fallback when benchmark-backed and healthy.
Embeddings	Embedding function rows, model class guardrails, local runtime inventory, and vector-search quality gates.	Vision-only or text-only models are blocked from the wrong lane rather than silently becoming defaults.
High-stakes synthesis	Frontier route selection, context quality, source citation availability, and function preference policy.	Drafts surface evidence and uncertainty; consequential sends or external actions remain approval-gated.
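The message-pipeline row above can be read as a small decision procedure over proof rows keyed by provider, model, function type, prompt hash, and hardware key. The Python sketch below is an interpretation under stated assumptions: `ProofKey`, `Route`, and `choose_route` are hypothetical names, and returning `None` stands in for "pause and ask the user".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProofKey:
    provider: str
    model: str
    function: str
    prompt_hash: str
    hardware_key: str


@dataclass(frozen=True)
class Route:
    provider: str
    model: str
    metered: bool


def choose_route(candidates: list[Route], proofs: dict[ProofKey, bool], function: str,
                 prompt_hash: str, hardware_key: str, allow_metered: bool) -> Optional[Route]:
    """Return a proven route, preferring free ones; None means pause before spending."""
    proven = [
        r for r in candidates
        if proofs.get(ProofKey(r.provider, r.model, function, prompt_hash, hardware_key), False)
    ]
    free = [r for r in proven if not r.metered]
    if free:
        return free[0]
    if allow_metered and proven:
        return proven[0]
    return None
```

Because the prompt hash and hardware key are part of the lookup key, editing the prompt or moving to a different machine invalidates old proof without any explicit cleanup step.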

Context engine

The model receives an answer pack, not an unbounded mailbox.

SwarmMarshal treats context as a product subsystem. It retrieves, ranks, compresses, redacts, and source-binds evidence before the model is asked to synthesize an answer.

Hybrid retrieval

Exact local search catches identifiers, order numbers, dates, and names. Semantic search catches intent when the user remembers the meaning but not the words.
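The paper does not say how exact and semantic results are merged. One common technique for combining ranked lists is reciprocal rank fusion, sketched here in Python as an illustration only; the constant `k = 60` is a conventional default for that technique, not a SwarmMarshal setting.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked id lists; each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to id order for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

A message that appears in both the exact and the semantic list outranks one that tops only a single list, which matches the intuition that agreement between retrievers is strong evidence.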

Graph neighborhoods

People, organizations, projects, topics, promises, decisions, and relationships are pulled as graph context with supporting source references.

Timeline and narrative

Timelines, matter summaries, meeting prep, legal packets, and dispute drafts are constructed from scoped source sets with claims mapped to evidence.

What a context pack carries

Element	Purpose	Why agentic programmers care
Purpose and audience	Defines whether the pack is for the assistant, an internal helper, a peer device, or an external MCP tool.	Prevents one giant prompt path from leaking internal facts to the wrong consumer.
Items	Ranked facts, observations, graph summaries, search hits, timelines, and candidate evidence.	Keeps context compression deterministic and inspectable before synthesis.
Sources	Message, thread, attachment, contact, calendar, task, note, or artifact references with titles, excerpts, timestamps, and identifiers.	Lets the final answer cite exact records and lets the UI reopen the evidence.
Policy	Privacy class, redaction level, sharing scope, and side-effect permissions.	Turns "the prompt should be careful" into a data contract the caller can enforce.
Token budget	Estimated size and compression pressure before the model call.	Makes long-history questions possible without dumping a mailbox into context.
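The elements in the table can be pictured as a typed record plus a budget-fitting step. The Python sketch below is illustrative only: the field names, the greedy selection, and the four-characters-per-token estimate are assumptions rather than the product's actual contract.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRef:
    kind: str        # e.g. "message", "thread", "attachment"
    identifier: str
    excerpt: str


@dataclass
class PackItem:
    text: str
    score: float
    source: SourceRef


@dataclass
class ContextPack:
    purpose: str
    audience: str
    redaction_level: str
    token_budget: int
    items: list[PackItem] = field(default_factory=list)


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about four characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


def fit_to_budget(pack: ContextPack, candidates: list[PackItem]) -> ContextPack:
    """Greedily keep the highest-scoring items that still fit the pack's token budget."""
    used = 0
    for item in sorted(candidates, key=lambda i: i.score, reverse=True):
        cost = estimate_tokens(item.text)
        if used + cost <= pack.token_budget:
            pack.items.append(item)
            used += cost
    return pack
```

Every kept item still carries its `SourceRef`, so trimming for size never separates a claim from the record that supports it.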

MCP and composability

SwarmMarshal is both an agent runtime and an evidence server for other agents.

The built-in MCP server gives Codex, Claude Code, desktop agents, and custom agent systems a typed way to query the same local evidence layer. The app also consumes approved MCP connectors as tools.

Expose source-grounded data

External agents can search messages, fetch thread context, inspect contacts, query knowledge graph facts, draft replies, and request action tools through a local stdio MCP server.

Search tools · Context tools · Action tools · Redaction levels
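On the wire, an MCP client invokes a server tool with a JSON-RPC 2.0 `tools/call` request, and the stdio transport carries one JSON message per line. The Python sketch below only builds such a request; the tool name `search_messages` and its `query` argument are hypothetical, and a real client would first complete the protocol's `initialize` handshake.

```python
import json
from itertools import count

_ids = count(1)  # simple per-process request id generator


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as one newline-terminated JSON-RPC line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # Compact separators keep the message on a single line for stdio framing.
    return json.dumps(message, separators=(",", ":")) + "\n"
```

For example, `tools_call_request("search_messages", {"query": "invoice 4471"})` yields a single line that could be written to the server's standard input.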

Consume external tools

Approved MCP servers appear in the same tool inventory as built-ins. The permission broker applies the same read-only and side-effect gates regardless of where a tool came from.

GitHub · Filesystem · Browser · Custom tools

Trust model

Prompt discipline is necessary, but data-layer guardrails do the hard work.

SwarmMarshal assumes models are probabilistic and tools are consequential. The architecture therefore puts durable checks around knowledge, model routing, sync, peer cooperation, and user-visible actions.

Source references travel with facts

Facts, memories, observations, and summaries are not free-floating prose. They preserve enough source metadata to reopen the underlying message or record.

Knowledge promotion is gated

A single inbound message can be evidence, but durable memory requires grounding, trust, corroboration, or user review. Ungrounded quotes are dropped.

Actions are permissioned

Agents may research and draft freely inside their allowed scope. Sending, deleting, moving broadly, external calls, and local writes cross approval gates.

Model routes fail closed

Important functions prefer calibrated routes. If no proven route exists, SwarmMarshal warns or pauses instead of silently choosing a convenient paid or low-quality fallback.

Peer cooperation is explicit

Same-owner devices can share work through sync, leases, and RPC. Message-pipeline delegation sends identifiers to a peer that already has the local data, not a second raw-mail transport.

Local operations are inspectable

Database repair, backup, telemetry retention, sync diagnostics, LLM call history, and debug logs are user-visible enough to support real troubleshooting.

Capability map

What the technology enables.

The user-facing product is deliberately broader than inbox automation. The inbox is the trust surface; the source-grounded assistant is the workbench.

Capability	Feature behavior	Underlying technology
Ask your history	Ask plain-English questions and get answers cited back to messages, threads, files, tasks, or calendar records.	Hybrid search, context packs, semantic memory, knowledge graph neighborhoods, and source refs.
Legal and evidence mode	Build timelines, claim tables, contradiction lists, evidence packets, and dispute drafts with uncertainty called out.	Scoped source sets, citation-preserving synthesis, chronological normalization, and strict source coverage checks.
Unified inbox	Read, search, teach, classify, draft, and organize across email accounts and chat channels without losing account-native behavior.	Per-account stores, provider adapters, local search, sender policy, message enrichment, and safe rendering.
Today dashboard	Show what changed overnight, who is waiting, calendar pressure, drafts to review, and source-grounded facts that need attention.	Background sync, overnight activity aggregation, assistant context, task promotion, and agent journals.
Supervised helpers	Run role-based agents that research, draft, monitor, and propose repeatable workflows without silently acting.	Profiles, tool inventories, context injectors, tool-use streaming, permission broker, SpendGuard, and activity logs.
Model-aware routing	Choose local, subscription, or cloud models by function instead of using one global model for every task.	Model Scout, function preferences, route advisors, provider health, benchmark bundles, and budget policy.
External agent access	Let Codex, Claude Code, or a custom agent query and act across the user's local communication graph.	Local MCP server, typed tool catalog, redaction policy, source-grounded context tools, and approval gates.
Vibes and mini-apps	Describe a tracker or lightweight workflow and get a local database-backed applet with forms and records.	User app schemas, local data surfaces, source refs, agent-authored artifacts, and embedded Blazor pages.
Generative tooling	Agents author small deterministic C# tools — API fetchers, refreshers, formatters — that run repeatedly without further model calls.	Roslyn-compiled sandbox, compile-time safety gate, single instrumented HTTP channel, typed tool contracts, run records, and a tool workbench.
AI workers & device swarm	Hire role-based workers that run as supervised processes on owned machines, with work and models shared across paired devices.	Per-worker child processes with heartbeats and restart supervision, approval-gated provisioning, machine binding, signed LAN sync, peer LLM forwarding, and load-scored work allocation.

SwarmMarshal: local-first communication intelligence for agentic programmers.

Use SwarmMarshal as the evidence layer your agents were missing.