Bake It In: Building Agent Runtimes on Zero Trust from Day One

What harness builders save by avoiding a retrofit


If you're building an agent harness, whether it's a product you sell or an internal runtime for your organization, architectural decisions you make early can be expensive to change later.

Most teams focus on the application layer first, then bolt on agent authentication, key management, proxies for model and tool access, and basic auditing, and punt in-depth security to a separate infrastructure team.

This builds early momentum, but the gaps it leaves get harder and more expensive to close as usage scales.

The bolt-on problem

At NetFoundry, the anti-pattern we see is that a team builds a capable agent harness, ships it internally or to customers, and then spends the next quarter retrofitting security controls that should have been foundational.

The recurring symptoms include:

  • API keys shared across agents with no per-identity attribution

  • MCP servers reachable by anything on the network that knows the address or holds easily discovered credentials

  • No way to answer "which agent accessed which model, when, and under what policy?"

  • A blast radius of "everything on the network" if any single agent is compromised

All of these are predictable when security is the last layer added.
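
The third symptom is really a query over an audit log that a bolt-on design never recorded. As a minimal sketch of what per-identity attribution makes answerable, assuming a hypothetical `AccessEvent` record and an in-memory `AuditLog` (illustrative names, not a NetFoundry API):

```go
package main

import (
	"fmt"
	"time"
)

// AccessEvent is a hypothetical audit record: one entry per model call,
// keyed by the agent's own identity rather than by a shared API key.
type AccessEvent struct {
	AgentID string    // per-agent identity (assumed field name)
	Model   string    // model that was called
	Policy  string    // policy that authorized the call
	At      time.Time // when the call happened
}

// AuditLog is an illustrative in-memory store, not a production design.
type AuditLog struct {
	events []AccessEvent
}

// Record appends one event to the log.
func (l *AuditLog) Record(e AccessEvent) {
	l.events = append(l.events, e)
}

// ByAgent returns the events attributed to one agent, in insertion order.
func (l *AuditLog) ByAgent(agentID string) []AccessEvent {
	var out []AccessEvent
	for _, e := range l.events {
		if e.AgentID == agentID {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	audit := &AuditLog{}
	now := time.Now()
	audit.Record(AccessEvent{AgentID: "agent-billing", Model: "model-a", Policy: "finance-default", At: now})
	audit.Record(AccessEvent{AgentID: "agent-support", Model: "model-b", Policy: "support-default", At: now})
	audit.Record(AccessEvent{AgentID: "agent-billing", Model: "model-b", Policy: "finance-default", At: now})

	for _, e := range audit.ByAgent("agent-billing") {
		fmt.Printf("%s called %s under %s at %s\n", e.AgentID, e.Model, e.Policy, e.At.Format(time.RFC3339))
	}
}
```

With a shared API key, the `AgentID` field has nothing trustworthy to hold, which is why the question cannot be answered after the fact.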

Two audiences, same problem

We see two types of teams building harnesses, and they run into the same architectural wall from different directions.

Partners building harness products. Your product is the agent runtime. Your customers deploy it, run their agents on it, and expect it to be secure. If you're building the networking, identity, and governance layer yourself, you're maintaining infrastructure that isn't your core value. If you're skipping it, your customers inherit the security gaps, and savvy prospects will spot the gap and walk.

Enterprise platform teams building internal runtimes. You're building a harness tailored to your organization that directly incorporates your business rules, data access patterns, and compliance requirements. You need to enforce which teams can access which models, which agents can use which tools, and where data flows. Doing this entirely at the application layer means every policy is a code change.

The building blocks

We've built three products that compose into the infrastructure layer for agent harnesses. Each works standalone, but together, they share a single identity model - one cryptographic identity per agent that governs connectivity, model access, and tool access.

1. Agora SDK - connectivity and governance for the harness itself

The Agora SDK is how your harness gets zero-trust networking. It provides four construction paths depending on how your runtime is structured:

  • Daemon mode - standalone background service

  • Embedded mode - in-process, for runtimes that want full control

  • App/Run mode - handles scaffolding for A2A governance (catalogs, discovery, workgroups)

  • Standalone mode - for services that already own their lifecycle

A minimal agent is roughly 20 lines of Go. The SDK handles identity enrollment, heartbeating, tunnel lifecycle, and clean shutdown. A2A operations - catalog publish, session propose/accept - are SDK calls.

For harness builders, this means your agents get cryptographic identity (X.509, not API keys), mutual authentication on every connection, end-to-end encryption (libsodium for the data path, mTLS for auth), and dark-by-default connectivity (no open listening ports, invisible services), all without your harness having to implement any of it.
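
To give a sense of what "implementing it yourself" involves, here is a rough sketch of only the mutual-authentication piece, written against Go's standard `crypto/tls` package rather than the Agora SDK. The helper names (`mutualTLSConfig`, `agentIdentity`, `selfCheck`) and the choice of the certificate Common Name as the agent identifier are assumptions for illustration. Note that a plain TLS server built this way still needs a listening port, so it does not provide the dark-by-default property described above.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// mutualTLSConfig returns a server-side TLS config that rejects any peer
// that does not present a certificate chaining to the trusted pool.
func mutualTLSConfig(serverCert tls.Certificate, trustedCAs *x509.CertPool) *tls.Config {
	return &tls.Config{
		Certificates: []tls.Certificate{serverCert},
		ClientCAs:    trustedCAs,
		ClientAuth:   tls.RequireAndVerifyClientCert,
		MinVersion:   tls.VersionTLS13,
	}
}

// agentIdentity derives an agent name from a completed handshake.
// Reading the Common Name is an assumption made for this sketch.
func agentIdentity(state tls.ConnectionState) (string, error) {
	if len(state.PeerCertificates) == 0 {
		return "", errors.New("no client certificate presented")
	}
	return state.PeerCertificates[0].Subject.CommonName, nil
}

// selfCheck exercises both helpers without opening a socket.
func selfCheck() error {
	cfg := mutualTLSConfig(tls.Certificate{}, x509.NewCertPool())
	if cfg.ClientAuth != tls.RequireAndVerifyClientCert {
		return errors.New("client certificates are not required")
	}
	if _, err := agentIdentity(tls.ConnectionState{}); err == nil {
		return errors.New("anonymous peer was accepted")
	}
	return nil
}

func main() {
	if err := selfCheck(); err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Println("client certificates required; anonymous peers rejected")
}
```

Certificate issuance, rotation, and revocation are all missing from this sketch, and those are typically where hand-rolled setups drift.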

2. LLM Gateway - governed model access

Agents in your harness need to call LLMs. The LLM Gateway provides:

  • Identity-based virtual API keys - each agent authenticates with its Ziti identity (the per-agent cryptographic identity described above), not a shared API key. Per-identity budgets, model restrictions, and audit trails flow from the identity.

  • Semantic routing - a 3-layer cascade (heuristics, embeddings, LLM classifier) routes requests to the right model. Reasoning tasks go to one model, coding to another, fast lookups to a third.

  • Multi-provider support - OpenAI, Anthropic, Azure, Bedrock, Vertex AI, Ollama, any OpenAI-compatible endpoint.

  • Cost governance - per-identity budgets with enforcement. Your finance team can see exactly which agent, on which team, spent what.

3. MCP Gateway - governed tool access

Agents need tools. The MCP Gateway provides:

  • Aggregation - multiple MCP server backends behind a single gateway.

  • Permission filtering - structural enforcement, not runtime checks. Filtered tools don't exist in the registry. An agent that isn't authorized for a tool can't discover it.

  • Per-client isolation - each client gets a dedicated session. No cross-contamination.

  • Namespacing - tools prefixed by backend ID to avoid collisions across backends.
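
The difference between structural filtering and a runtime check is easiest to see in code. In the sketch below, unauthorized tools are never inserted into the client's registry, so there is nothing to discover and nothing to deny later. The `Tool` type, the `buildRegistry` helper, and the `.` separator between backend ID and tool name are assumptions for illustration, not the MCP Gateway's actual data model.

```go
package main

import (
	"fmt"
	"sort"
)

// Tool is a hypothetical descriptor for one tool on one backend.
type Tool struct {
	Backend string
	Name    string
}

// QualifiedName prefixes the tool with its backend ID so two backends can
// both expose "search" without colliding. The "." separator is assumed.
func (t Tool) QualifiedName() string {
	return t.Backend + "." + t.Name
}

// buildRegistry creates the per-client registry. Only allowed tools are
// inserted, so a lookup for anything else simply misses.
func buildRegistry(all []Tool, allowed map[string]bool) map[string]Tool {
	registry := make(map[string]Tool)
	for _, t := range all {
		if allowed[t.QualifiedName()] {
			registry[t.QualifiedName()] = t
		}
	}
	return registry
}

// listTools returns the sorted tool names a client can discover.
func listTools(registry map[string]Tool) []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	all := []Tool{
		{Backend: "github", Name: "search"},
		{Backend: "jira", Name: "search"},
		{Backend: "jira", Name: "create_issue"},
	}
	// Each client gets its own registry built from its own allow-list.
	readOnly := buildRegistry(all, map[string]bool{
		"github.search": true,
		"jira.search":   true,
	})
	fmt.Println(listTools(readOnly))
	_, visible := readOnly["jira.create_issue"]
	fmt.Println("create_issue visible:", visible)
}
```

Building one registry per client also gives the per-client isolation described above: no client's view is derived from another's.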

All three share the same Ziti identity. The identity that authenticates an agent to the network also controls which LLMs it can access and which MCP tools it can use.
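
One way to picture a shared identity model is a single policy record, keyed by agent identity, that both gateways consult. The following is a sketch under stated assumptions: the `Policy` and `Authorizer` types, the sentinel errors, and the cents-based budget are hypothetical and not the products' real schema. The sketch is also not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors for the three ways a model call can be refused.
var (
	ErrUnknownIdentity = errors.New("unknown identity")
	ErrModelDenied     = errors.New("model not allowed for identity")
	ErrOverBudget      = errors.New("budget exceeded")
)

// Policy is a hypothetical per-identity record covering models, tools,
// and spend. Amounts are in cents to avoid floating-point drift.
type Policy struct {
	Models      map[string]bool
	Tools       map[string]bool
	BudgetCents int64
	SpentCents  int64
}

// Authorizer maps an agent identity to its single policy.
type Authorizer struct {
	policies map[string]*Policy
}

// CanUseTool reports whether the identity may use the named tool.
func (a *Authorizer) CanUseTool(identity, tool string) bool {
	p, ok := a.policies[identity]
	return ok && p.Tools[tool]
}

// ChargeModelCall checks model access and budget, then records the spend.
// Nothing is recorded when the call is refused.
func (a *Authorizer) ChargeModelCall(identity, model string, costCents int64) error {
	p, ok := a.policies[identity]
	if !ok {
		return ErrUnknownIdentity
	}
	if !p.Models[model] {
		return ErrModelDenied
	}
	if p.SpentCents+costCents > p.BudgetCents {
		return ErrOverBudget
	}
	p.SpentCents += costCents
	return nil
}

func main() {
	auth := &Authorizer{policies: map[string]*Policy{
		"agent-billing": {
			Models:      map[string]bool{"fast-model": true},
			Tools:       map[string]bool{"jira.search": true},
			BudgetCents: 100,
		},
	}}
	fmt.Println(auth.ChargeModelCall("agent-billing", "fast-model", 60))
	fmt.Println(auth.ChargeModelCall("agent-billing", "fast-model", 60))
	fmt.Println(auth.ChargeModelCall("agent-billing", "reasoning-model", 1))
	fmt.Println(auth.CanUseTool("agent-billing", "jira.search"))
}
```

Because every decision hangs off the same key, changing what an agent may do is a data change to one record rather than a code change in several systems.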

Bolted on vs. baked in

Bolted on: Your harness manages its own API keys for LLM providers. You build a permission system for tool access. You add TLS between components. You write audit logging code. You build a dashboard to track costs. You configure firewall rules. Each is a separate system you maintain, generally with disparate identity models.

Baked in: Your harness embeds Agora for connectivity. Agents get cryptographic identity from the network. They access LLMs through the LLM Gateway using that identity. They access tools through the MCP Gateway using the same identity. Cost controls, access policies, audit trails, and encryption are properties of the infrastructure, not features you maintain.

For a harness product, the second approach means a customer's security team asks "how are agent communications secured?" and the answer comes from the infrastructure - cryptographic identity per agent, mutual authentication, end-to-end encryption, zero listening ports, full audit trail - not from documentation about how to configure TLS correctly.

The second approach means less code, fewer systems to operate, and a security posture that doesn't degrade as the harness grows.

Where to start

The open source components are all Apache 2.0:

For teams that are building harnesses and want to work closely with us on architecture and integration, NetFoundry is running an AI Accelerator design partner program: a small cohort, direct access to the engineering team, and input into the roadmap. If you're building an agent runtime and the problems in this post sound familiar, we'd like to hear about it.

Zero Trust for AI Infrastructure

Part 6 of 6

A seven-part series on what AI SecOps actually requires at the network layer. Each post takes one piece of the problem - the network blind spot in today's AI gateways, agent isolation, cross-org collaboration, correlated observability, private model routing, building harnesses on zero trust, and the open-source-to-commercial upgrade path - and shows what changes when zero trust is foundational instead of bolted on.
