Network-Level Isolation for AI Agents: Beyond Containers

Application-level isolation constrains what the agent can do on its host. Network-level isolation constrains what the agent can reach beyond it.

Host isolation gets most of the sandboxing attention. Network isolation often gets none.

What "sandboxed" usually means today

When people talk about agent sandboxing, they typically mean some combination of:

Containers or VMs to isolate the runtime environment
Filesystem restrictions to limit what the agent can read and write
Process-level capabilities to restrict system calls
Execution timeouts to kill runaway agents

These are important controls, but they all operate at the host level. Once the agent has a network socket, it's constrained only by whatever network-level controls happen to exist. In most environments, those controls are coarse at best.

Why traditional network controls fall short

The usual suspects don't fit well here.

Firewalls operate on IP addresses and ports. They weren't designed for per-agent granularity. You can restrict what subnet an agent can reach, but you can't easily say "this agent can access the billing API but not the user database" when both are on the same host.
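To make the granularity gap concrete, here is a minimal Python sketch. Every name, address, and rule in it is hypothetical, and it assumes the billing API and the user database sit behind the same IP and port (for example, behind a shared proxy). The firewall check sees only address and port, so it cannot tell the two flows apart; the service-level check can.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    agent: str     # hypothetical agent name
    dst_ip: str    # destination address, visible to a firewall
    dst_port: int  # destination port, visible to a firewall
    service: str   # logical service name, not visible at layer 3/4


# Hypothetical firewall rule: traffic to this address and port is allowed.
ALLOWED_IP_PORT = {("10.0.5.20", 443)}

# Hypothetical service-level rule: which agent may reach which named service.
ALLOWED_SERVICE = {("billing-agent", "billing-api")}


def firewall_allows(flow: Flow) -> bool:
    """Decide using only what an IP/port firewall can see."""
    return (flow.dst_ip, flow.dst_port) in ALLOWED_IP_PORT


def service_policy_allows(flow: Flow) -> bool:
    """Decide using the agent's identity and the named service."""
    return (flow.agent, flow.service) in ALLOWED_SERVICE


# Two flows that look identical to the firewall but target different services.
billing = Flow("billing-agent", "10.0.5.20", 443, "billing-api")
userdb = Flow("billing-agent", "10.0.5.20", 443, "user-database")

print(firewall_allows(billing), firewall_allows(userdb))
print(service_policy_allows(billing), service_policy_allows(userdb))
```

The firewall admits both flows because they share a destination address and port. Only the identity-and-service rule separates them.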

VPNs solve the wrong problem. They give authenticated users broad network access and then try to restrict it. The model is "connect first, then constrain." For autonomous agents that might misbehave, you want the opposite.

Service meshes (Istio, Linkerd) were built for microservice-to-microservice traffic. They provide mTLS and traffic policy between services, but they have no concept of agent identity, no per-agent policy, and no way to make services invisible to unauthorized callers. Services are discoverable by default. Policy restricts access after discovery.

Tailscale-style mesh networking is closer, and Tailscale specifically has invested heavily in ACLs and tag-based policies. WireGuard-based overlays provide network-level isolation from the public internet, but the underlying model is still "join the network, then restrict." Every device on a tailnet can reach every other device until ACLs say otherwise, and the default posture is open within the mesh. For agent containment, you want the inverse: nothing reachable until policy explicitly creates a path.

The missing primitive: dark by default

OpenZiti takes a fundamentally different approach. Services on an OpenZiti network have zero listening ports. They're not on the public internet, they're not on the internal network, and they don't respond to port scans. From the perspective of any unauthorized caller, they don't exist.

This isn't a firewall rule blocking traffic. It's the absence of any network path until the overlay explicitly creates one. We call this "dark by default," and it changes the containment model for agents in a meaningful way.

When an agent runs on an OpenZiti overlay:

No discovery - the agent cannot scan for or discover services it isn't authorized to access. There's nothing to scan. No ports, no endpoints, no DNS records to enumerate.
No lateral movement - a compromised agent can't pivot to adjacent services. The overlay only creates paths for explicitly authorized connections. Everything else is void.
Per-agent identity - every agent gets its own cryptographic identity that is individually verifiable, auditable, and revocable.
Service-level microsegmentation - policy is defined at the service level, not the network level. Agent A can access Service X but not Service Y, even if both services run on the same host. This is finer-grained than anything IP-based controls can provide.
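The properties above can be illustrated with a simplified policy model. The Python sketch below is not the OpenZiti API. The classes, role names, and matching logic are assumptions made for illustration, loosely inspired by the idea of matching identities to services through role attributes. The point it shows is that the set of visible services is computed from policy, starting from empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    name: str
    roles: frozenset  # hypothetical role attributes attached to the agent


@dataclass(frozen=True)
class Service:
    name: str
    host: str         # included only to show that host placement is irrelevant
    roles: frozenset  # hypothetical role attributes attached to the service


@dataclass(frozen=True)
class DialPolicy:
    identity_role: str  # agents carrying this role...
    service_role: str   # ...may dial services carrying this role


def visible_services(identity, services, policies):
    """Return the names of services this identity can reach. Default: none."""
    visible = set()
    for policy in policies:
        if policy.identity_role not in identity.roles:
            continue
        for service in services:
            if policy.service_role in service.roles:
                visible.add(service.name)
    return sorted(visible)


# Two services on the same (hypothetical) host, with different role attributes.
services = [
    Service("service-x", "host-1", frozenset({"reporting"})),
    Service("service-y", "host-1", frozenset({"billing"})),
]
policies = [DialPolicy(identity_role="analysis", service_role="reporting")]

agent_a = Identity("agent-a", frozenset({"analysis"}))
agent_b = Identity("agent-b", frozenset())

print(visible_services(agent_a, services, policies))
print(visible_services(agent_b, services, policies))
```

Agent A sees Service X but not Service Y, even though both share a host. An agent that no policy matches sees nothing.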

What we're building with this

At NetFoundry, we currently use OpenZiti configurations in conjunction with containers for agent sandboxing. The container handles runtime isolation and is configured with no default network egress. OpenZiti is the agent's only path off the host, and it governs every connection by identity. Together, they provide both dimensions of containment.

Each agent gets its own OpenZiti identity at enrollment time. That identity determines exactly which services the agent can reach - which LLM endpoints, which MCP tool servers, which other agents, which data sources. Everything else is dark. The agent doesn't know those other services exist, because from its network perspective, they genuinely don't.

Agora, our agent-to-agent networking layer built on OpenZiti, extends this further with workgroup-scoped visibility. Agents are organized into workgroups, and agents outside a workgroup can't discover, see, or interact with agents inside it. This isn't a filtered view - the agents, their capabilities, and their existence aren't observable from outside the workgroup.

Direct Agora support for sandboxing primitives is planned - purpose-built APIs for defining isolation boundaries, containment policies, and automated response to misbehavior. Today, you configure the same result using OpenZiti's service policies and Agora's workgroup model. The intent is the same; the ergonomics will improve.

Scenario: what happens when an agent misbehaves

Say you have a data analysis agent that's authorized to access a reporting database and an LLM endpoint. It runs in a container with an OpenZiti identity scoped to exactly those two services.

The agent gets compromised - prompt injection, a dependency vulnerability, whatever the cause. The attacker now controls the agent process.

On a traditional network, the attacker can start probing. Port scan the local subnet. Look for the Kubernetes API server. Try to reach other databases. Check if there's an internal admin panel somewhere. The container limits what they can do locally, but the network is wide open for discovery.

On an OpenZiti overlay, there's nothing to probe. The agent's identity grants access to two services. Everything else is dark. The attacker can't scan for services because there are no listening ports to find. They can't resolve DNS for internal services because the overlay doesn't expose them. They can't reach the Kubernetes API, other databases, or admin panels. On the OpenZiti overlay, those destinations don't exist - no listening ports, no service identity advertised, no path to construct.

The blast radius is two services instead of the entire network.
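One way to reason about that difference is as graph reachability. The sketch below is a hedged illustration in Python. The host names are hypothetical, the flat network is modeled as "every host can connect to every other host," and the overlay is modeled as only the two authorized paths. It also assumes the two authorized services do not themselves initiate onward connections.

```python
from collections import deque


def blast_radius(start, edges):
    """Return every node reachable from start by following allowed connections."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# Hypothetical environment for the scenario above.
HOSTS = ["agent", "reporting-db", "llm-endpoint", "k8s-api", "user-db", "admin-panel"]

# Flat network: any host can open a connection to any other host.
flat_network = {h: [o for o in HOSTS if o != h] for h in HOSTS}

# Overlay: the only paths that exist are the two the agent's identity grants.
overlay = {"agent": ["reporting-db", "llm-endpoint"]}

print(sorted(blast_radius("agent", flat_network)))
print(sorted(blast_radius("agent", overlay)))
```

On the flat graph, the compromised agent reaches all five other hosts. On the overlay graph, it reaches the two services it was granted.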

And revocation is instant. Revoke the agent's identity and it immediately loses the ability to authenticate to anything. No key rotation race. No hoping nobody copied the API key. The cryptographic identity is gone, and with it, all network access.
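A toy model shows why per-identity revocation avoids the shared-secret problem. This Python sketch is an assumption-laden illustration, not OpenZiti's controller: the `Controller` class and its methods are hypothetical, and it models only the authorization check made when a new connection is requested.

```python
class Controller:
    """Hypothetical stand-in for the component that authorizes connections."""

    def __init__(self):
        # Maps each enrolled identity to the set of services it may dial.
        self._grants = {}

    def enroll(self, identity, services):
        self._grants[identity] = set(services)

    def revoke(self, identity):
        # Removing the identity removes every grant attached to it.
        self._grants.pop(identity, None)

    def can_dial(self, identity, service):
        return service in self._grants.get(identity, set())


controller = Controller()
controller.enroll("analysis-agent", ["reporting-db", "llm-endpoint"])
controller.enroll("support-agent", ["llm-endpoint"])

before = controller.can_dial("analysis-agent", "reporting-db")
controller.revoke("analysis-agent")
after = controller.can_dial("analysis-agent", "reporting-db")

print(before, after)
print(controller.can_dial("support-agent", "llm-endpoint"))
```

Because each agent holds its own identity, revoking one does not disturb any other agent, and there is no shared key left to rotate.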

Identity controls what agents can reach - including models and tools

Network isolation is the foundation, but we layer additional controls on top through LLM Gateway and MCP Gateway.

When an agent connects to an LLM through LLM Gateway, it authenticates with its OpenZiti identity. That identity determines which models the agent can access and its token budget. A data analysis agent might be restricted to a specific model with a daily token budget. A customer-facing agent might get access to a different model. The identity is the policy anchor for model access and cost governance.
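The identity-as-policy-anchor idea can be sketched as a lookup keyed by identity. The Python below is illustrative only. It is not the LLM Gateway's implementation or configuration format, and the model names, budget figures, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelPolicy:
    allowed_models: frozenset  # models this identity may call
    daily_token_budget: int    # hypothetical per-day ceiling
    tokens_used_today: int = 0


def authorize_request(policies, identity, model, requested_tokens):
    """Allow the request only if identity, model, and budget all check out."""
    policy = policies.get(identity)
    if policy is None:
        return False  # unknown identity: nothing is permitted
    if model not in policy.allowed_models:
        return False  # model not granted to this identity
    if policy.tokens_used_today + requested_tokens > policy.daily_token_budget:
        return False  # request would exceed the daily budget
    policy.tokens_used_today += requested_tokens
    return True


policies = {
    "analysis-agent": ModelPolicy(frozenset({"model-a"}), daily_token_budget=1000),
}

results = [
    authorize_request(policies, "analysis-agent", "model-a", 600),  # within budget
    authorize_request(policies, "analysis-agent", "model-a", 600),  # would exceed
    authorize_request(policies, "analysis-agent", "model-b", 10),   # wrong model
    authorize_request(policies, "unknown-agent", "model-a", 10),    # no identity
]
print(results)
```

A denied request consumes no budget, so the 400 tokens left after the first call remain available.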

The same model applies to MCP tools. When an agent connects to MCP Gateway, its identity determines which tool servers are visible and which specific tools within those servers are available. Tools the agent isn't authorized for don't appear in the registry at all - they're structurally absent, not checked at runtime.
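The difference between "structurally absent" and "checked at runtime" is easier to see in code. The following Python sketch is a hypothetical model, not the MCP Gateway's API, and the server names, tool names, and grants are invented. It builds the registry view per identity so that unauthorized tools never appear in the result at all.

```python
# Hypothetical catalog of tool servers and the tools each one exposes.
TOOL_SERVERS = {
    "reporting": ["run_query", "export_csv"],
    "admin": ["delete_user"],
}

# Hypothetical grants: identity -> server -> set of permitted tool names.
GRANTS = {
    "analysis-agent": {"reporting": {"run_query"}},
}


def registry_for(identity):
    """Build the tool registry as this identity sees it."""
    granted = GRANTS.get(identity, {})
    view = {}
    for server, tools in TOOL_SERVERS.items():
        allowed = [t for t in tools if t in granted.get(server, set())]
        if allowed:
            # Servers with no permitted tools are omitted entirely.
            view[server] = allowed
    return view


print(registry_for("analysis-agent"))
print(registry_for("unknown-agent"))
```

The analysis agent's view contains one server with one tool. The admin server and the `export_csv` tool are not flagged as forbidden; they are missing from the view.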

So a sandboxed agent's identity controls three things simultaneously: what network paths are available, what models it can use, and what tools it can invoke. One identity, three enforcement surfaces.

This is a network problem

The industry conversation around agent sandboxing is heavily weighted toward runtime isolation. That's the right starting point, but it's not the whole picture.

An agent that is perfectly sandboxed at the process level but has unrestricted network access is not contained. It's isolated from the host and exposed to the network. For autonomous agents that make their own decisions about what to connect to, that's a significant gap.

Network-level isolation closes that gap. It's the difference between a locked door and a wall. Firewall rules are locked doors; someone sufficiently motivated can find ways around them. Dark-by-default overlay networking is a wall; there's nothing on the other side to find.

Try it

The open source components are all on GitHub:

OpenZiti - the zero-trust overlay network
Agora - agent-to-agent networking with workgroup isolation
LLM Gateway - governed LLM access with identity-based controls
MCP Gateway - governed MCP tool access with structural permission filtering

All Apache 2.0.

If you're building agent infrastructure and the containment question is on your mind, we at NetFoundry are running an AI Accelerator design partner program with teams working through exactly these problems. We'd like to hear about your use case.
