AI SecOps: The Network-Shaped Blind Spot in AI Infrastructure

If you're responsible for an enterprise AI platform, you've probably deployed an AI gateway by now. Something that proxies LLM requests, manages API keys, tracks costs, runs some guardrails on prompts. That's table stakes at this point.

But one question keeps coming up for platform and security teams: what's happening at the network layer?

Your AI gateway sees the requests that flow through it, but it doesn't see what your agents can reach. It doesn't control which MCP servers a compromised agent could discover. It doesn't isolate agent A from agent B when agent A starts misbehaving. It doesn't enforce engagement terms between agents from different organizations. It doesn't even know those agents exist until they send a request.

This is a blind spot. It's the same blind spot the industry had with microservices before service mesh, and with remote access before zero trust started replacing VPNs. The application layer handles what it can see, but the network layer handles everything else.

The problem, concretely

At NetFoundry, we talk to platform teams that are enabling AI adoption across their organizations, and the pattern is consistent:

Developers adopt AI tools bottom-up. MCP servers, Claude Desktop, Cursor, custom agents connecting to LLMs. This happens fast, usually without central coordination.
API keys proliferate. Each developer, tool, and agent connection carries its own credential. OAuth tokens for MCP servers, API keys for LLM providers, database credentials for data tools. All stored in or accessible to the AI client.
No network-level visibility. The security team asks "what MCP servers are our developers connecting to?" and doesn't get a good answer. The platform team asks "which agents can reach which internal services?" and gets silence.
No blast radius containment. If an agent is compromised - or just buggy - there's nothing at the network layer preventing it from discovering and reaching every service on the network. Application-level sandboxing helps. It's necessary, but not sufficient.
Cross-org agent collaboration opens a new attack surface. The moment agents from different organizations need to interact (a data provider sharing feeds with a client's analytics agent, for example), the security model gets complicated fast. Who governs that interaction? What terms apply? What happens when the relationship ends?

Every AI gateway on the market today operates above this layer. They secure the request. They don't secure the network.

What AI SecOps actually requires

We think AI SecOps - genuine security operations for AI infrastructure - needs to happen at three layers simultaneously:

1. Identity that's cryptographic, not shared secrets

API keys are shared secrets. They don't tell you who is using them, they can't be attributed to a specific agent instance, and revoking one affects everyone who has it.

AI workloads need cryptographic identity. X.509 certificates bound to each agent, each connection mutually authenticated, each interaction attributable to a specific identity. This is what OpenZiti has provided for years. We extended it to AI workloads because the problem is the same - you need to know exactly who is connecting, not just that someone has the right key.

When an agent connects to an LLM through our LLM Gateway, it authenticates with its Ziti identity. The virtual API key it uses is tied to that identity. Per-identity budgets, model restrictions, and audit trails all flow from the identity, not from a shared key that could be anyone.
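
To make "policy flows from the identity" concrete, here is a minimal sketch of a per-identity check. It is not the LLM Gateway's actual code or schema: `IdentityPolicy`, `GatewayPolicyEngine`, and every field name are hypothetical, and the `identity` string stands in for an identity that mutual TLS has already verified.

```python
from dataclasses import dataclass, field


class PolicyError(Exception):
    """Raised when a request violates the caller's identity policy."""


@dataclass
class IdentityPolicy:
    # Hypothetical fields; a real gateway's schema may differ.
    allowed_models: frozenset
    budget_usd: float
    spent_usd: float = 0.0


@dataclass
class GatewayPolicyEngine:
    policies: dict = field(default_factory=dict)   # identity name -> IdentityPolicy
    audit_log: list = field(default_factory=list)  # (identity, model, cost, outcome)

    def authorize(self, identity: str, model: str, est_cost_usd: float) -> None:
        """Check one request from an identity assumed to be already authenticated."""
        policy = self.policies.get(identity)
        if policy is None:
            outcome = "denied:unknown-identity"
        elif model not in policy.allowed_models:
            outcome = "denied:model-not-allowed"
        elif policy.spent_usd + est_cost_usd > policy.budget_usd:
            outcome = "denied:over-budget"
        else:
            policy.spent_usd += est_cost_usd
            outcome = "allowed"
        # Every decision, allowed or denied, is attributed to one identity.
        self.audit_log.append((identity, model, est_cost_usd, outcome))
        if outcome != "allowed":
            raise PolicyError(f"{identity}: {outcome}")
```

Because the budget, the model list, and the audit entry are all keyed by identity, a denial or an overspend points at one specific agent rather than at whoever happens to hold a shared key.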

2. Dark by default - no attack surface to exploit

Most AI gateways are publicly reachable endpoints protected by API keys. MCP servers sit behind firewalls with ports open for the gateway to reach them. The gateways are visible. The services are visible. The attack surface is the entire network.

We take a different approach. Services on an OpenZiti network have zero listening ports. They're not on the internet. They're not on the internal network. They don't exist to anyone who isn't explicitly authorized to see them. We call this "dark by default" and it applies to every component: the LLM Gateway, the MCP Gateway, the MCP servers behind them, and the agents themselves.

A compromised agent on a dark network can't scan for services, can't discover MCP servers it isn't authorized for, can't reach LLM endpoints it hasn't been granted access to. Not because a firewall rule blocks it - because the services literally don't exist from its network perspective.
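
The sketch below models only the policy outcome described above, not how OpenZiti implements it. `DarkDirectory` and its methods are hypothetical names. The point it illustrates is that an unauthorized lookup is indistinguishable from a lookup for a service that was never defined.

```python
class DarkDirectory:
    """Toy model of authorization-scoped service visibility."""

    def __init__(self) -> None:
        self._grants = {}  # service name -> set of authorized identities

    def grant(self, service: str, identity: str) -> None:
        self._grants.setdefault(service, set()).add(identity)

    def visible_services(self, identity: str) -> set:
        # Discovery returns only what this identity is authorized for.
        return {svc for svc, ids in self._grants.items() if identity in ids}

    def dial(self, identity: str, service: str) -> str:
        # Unauthorized and nonexistent services fail in exactly the same way,
        # so a caller cannot probe for what else might be out there.
        if identity not in self._grants.get(service, set()):
            raise LookupError(f"no such service: {service}")
        return f"circuit:{identity}->{service}"
```

A scan from a compromised agent therefore yields the same set it was already granted, and nothing more.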

3. Governed agent interaction, not just proxied traffic

This is where Agora comes in, and where we think the market has the biggest gap.

Some gateways (Kong, for example) can proxy A2A traffic - inspect it, log it, apply rate limits. That's useful. But proxying traffic is not governing the relationship.

Agora provides governed agent collaboration at the network layer:

Workgroups define who can see whom. An agent outside a workgroup cannot discover, see, or interact with agents inside it. Not a filtered view - structural invisibility.
Engagement contracts bound every session. Maximum duration, maximum message count, allowed message types, required workgroup memberships. The controller evaluates and enforces these terms. A provider doesn't need to be online to decide whether to accept an engagement - the contract speaks for it.
Sessions have an explicit lifecycle. Proposed, accepted, active, closed. Every state transition is recorded. Every close has a reason. Closed sessions are retained for audit.
Envelopes carry infrastructure-visible headers and opaque payloads. The network can enforce governance (message type restrictions, envelope count limits) without understanding every payload format.
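
The contract and lifecycle rules above can be sketched as a small state machine. This is an illustration under stated assumptions, not Agora's API: the class names, field names, close-reason strings, and the choice to reject a disallowed message type without closing the session are all hypothetical. Time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class State(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    ACTIVE = "active"
    CLOSED = "closed"


@dataclass(frozen=True)
class Contract:
    max_duration_s: float
    max_messages: int
    allowed_types: frozenset
    required_workgroups: frozenset


@dataclass
class Session:
    contract: Contract
    state: State = State.PROPOSED
    sent: int = 0
    started_at: Optional[float] = None
    close_reason: Optional[str] = None
    transitions: list = field(default_factory=list)  # (from, to, note) for audit

    def _move(self, new_state: State, note: str) -> None:
        self.transitions.append((self.state.value, new_state.value, note))
        self.state = new_state

    def close(self, reason: str) -> None:
        if self.state is not State.CLOSED:
            self.close_reason = reason
            self._move(State.CLOSED, reason)

    def evaluate(self, requester_workgroups: set, now: float) -> bool:
        """Controller-side decision: the provider does not need to be online."""
        if self.state is not State.PROPOSED:
            return False
        if not self.contract.required_workgroups <= requester_workgroups:
            self.close("missing-workgroup")
            return False
        self._move(State.ACCEPTED, "contract-terms-met")
        self.started_at = now
        self._move(State.ACTIVE, "session-started")
        return True

    def send(self, msg_type: str, now: float) -> bool:
        """Enforce the contract using only the envelope header (message type)."""
        if self.state is not State.ACTIVE:
            return False
        if now - self.started_at > self.contract.max_duration_s:
            self.close("duration-exceeded")
            return False
        if msg_type not in self.contract.allowed_types:
            return False  # rejected; the payload is never inspected
        if self.sent >= self.contract.max_messages:
            self.close("message-limit-reached")
            return False
        self.sent += 1
        return True
```

Note that `send` never looks at a payload. Everything it enforces comes from header-level information, which is what lets the network govern a session without understanding the message body.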

The Macro Pulse reference demo in the Agora repo shows this in practice: five organizations, eight agents, four inter-org workgroups. The market data provider and the internet signals provider don't share a workgroup. Neither knows the other exists. The consuming client is the only party with visibility across all channels. Every envelope is auditable. Every session is bounded by a contract.
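
The visibility property in that demo can be illustrated with a few lines of set logic. The agent and workgroup names below are hypothetical and much smaller than the real demo topology; the sketch only shows why two providers that share no workgroup cannot see each other while a client in both can see each of them.

```python
def visible_peers(agent: str, workgroups: dict) -> set:
    """Return every agent that shares at least one workgroup with `agent`.

    `workgroups` maps a workgroup name to the set of its member agents.
    """
    peers = set()
    for members in workgroups.values():
        if agent in members:
            peers |= members
    peers.discard(agent)
    return peers


# Hypothetical three-agent topology, not the actual Macro Pulse layout.
EXAMPLE_WORKGROUPS = {
    "wg-market-data": {"market-data-provider", "client-analytics"},
    "wg-signals": {"signals-provider", "client-analytics"},
}
```

With this layout, `visible_peers("market-data-provider", EXAMPLE_WORKGROUPS)` contains only the client, while the client sees both providers.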

How the pieces fit together

The NetFoundry AI platform is three products on a unified zero-trust foundation:

MCP Gateway secures access to MCP tool servers. Aggregates multiple backends, namespaces tools, filters permissions structurally (filtered tools don't exist in the registry - they're not checked at runtime). Per-client session isolation. No open ports.
LLM Gateway governs access to LLM providers. Multi-provider routing, including semantic routing through a 3-layer cascade (heuristics, embeddings, LLM classifier). Identity-based virtual API keys. Per-identity budgets. Guardrails for PII, content safety, and prompt injection. Private model meshes without VPN.
Agora provides the governed agent network underneath. Cryptographic identity per agent. Workgroup-scoped discovery. Engagement contracts. Session governance. Full audit trail.
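
The heuristics, embeddings, and LLM-classifier cascade mentioned for the LLM Gateway can be sketched as a chain of layers, each of which either picks a route or defers to the next. This is a guess at the general shape, not the gateway's implementation: the fall-through rule, the route names, the threshold, and the bag-of-words stand-in for real embeddings are all assumptions, and the classifier is a caller-supplied stub.

```python
import math
from collections import Counter
from typing import Callable, Optional, Tuple

# Hypothetical exemplar prompts per route, used by the similarity layer.
EXEMPLARS = {
    "summarize-model": "summarize this document briefly",
    "code-model": "write a python function",
}


def heuristic_layer(prompt: str) -> Optional[str]:
    # Layer 1: cheap pattern checks that need no model at all.
    if "```" in prompt or "def " in prompt:
        return "code-model"
    return None


def _embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def embedding_layer(prompt: str, threshold: float = 0.5) -> Optional[str]:
    # Layer 2: nearest exemplar, accepted only above a confidence threshold.
    vec = _embed(prompt)
    route, score = max(
        ((r, _cosine(vec, _embed(ex))) for r, ex in EXEMPLARS.items()),
        key=lambda pair: pair[1],
    )
    return route if score >= threshold else None


def route(prompt: str, classifier: Callable[[str], Optional[str]]) -> Tuple[str, str]:
    """Return (chosen route, name of the layer that decided)."""
    layers = (
        ("heuristics", heuristic_layer),
        ("embeddings", embedding_layer),
        ("llm-classifier", classifier),  # Layer 3: most expensive, tried last
    )
    for name, layer in layers:
        choice = layer(prompt)
        if choice is not None:
            return choice, name
    return "default-model", "fallback"
```

Ordering the layers from cheapest to most expensive means most prompts never reach the classifier, which is the usual motivation for a cascade like this.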

Each product works standalone. Together, they share one identity model - the same Ziti identity that authenticates an agent to the network also controls which LLMs it can access, which MCP tools it can use, and which other agents it can interact with. One identity, three surfaces, correlated observability across all of them.
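
For the MCP surface, "filters permissions structurally" can be pictured as building a separate tool registry per identity, so that a tool the identity has not been granted is simply absent rather than checked and refused at call time. A minimal sketch, assuming a `backend.tool` naming scheme and a grants table keyed by identity (both hypothetical):

```python
from typing import Callable, Dict, Set


def registry_for(
    identity: str,
    backends: Dict[str, Dict[str, Callable]],
    grants_by_identity: Dict[str, Set[str]],
) -> Dict[str, Callable]:
    """Build the tool registry a single identity gets to see.

    Tools from every backend are namespaced as "<backend>.<tool>". Anything
    not granted to this identity is never added to the registry.
    """
    granted = grants_by_identity.get(identity, set())
    registry = {}
    for backend, tools in backends.items():
        for name, fn in tools.items():
            qualified = f"{backend}.{name}"
            if qualified in granted:
                registry[qualified] = fn
    return registry
```

Calling a filtered tool then fails with the same lookup miss as calling a tool that never existed, because from that identity's registry there is no difference.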

What this means for your security team

If you're evaluating AI infrastructure, here are the questions we think your security team should be asking:

Can an agent reach services it isn't authorized for? If your AI gateway sits on a flat network, the answer is probably yes. Application-level controls help, but network-level microsegmentation is what prevents discovery and lateral movement.
What happens when you revoke an agent's access? With API keys, revocation is coarse - keys are typically shared across agents and services, so revoking one breaks others, and you usually don't know a key has leaked until it's already been used. With cryptographic identity, every agent has its own identity, so you revoke at the agent level, and that agent is immediately unable to authenticate to anything. No grace period, no key rotation race, no collateral damage.
Can you reconstruct what happened? Not just which LLM calls were made, but which agents communicated with which other agents, under what contract terms, in which workgroup, with what close reason. If you can't answer this, you don't have AI SecOps - you have AI logging.
Are your MCP servers dark? If they're listening on ports behind a firewall, they're not dark. A firewall rule is not the same as a service that has zero listening ports and is invisible on the network.
What's your blast radius if an agent is compromised? If the answer is "everything on the network," you have a network-shaped problem that no application-layer gateway can solve.
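
As a rough picture of what the reconstruction question above asks for, the sketch below filters retained session records by agent. The `SessionRecord` fields are hypothetical and chosen to mirror the items named in that question (agents, contract terms, workgroup, close reason); a real audit store would have its own schema and query interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SessionRecord:
    # Hypothetical audit fields, one record per closed session.
    session_id: str
    workgroup: str
    initiator: str
    provider: str
    max_messages: int  # one example of a contract term
    close_reason: str


def sessions_involving(records: List[SessionRecord], agent: str) -> List[SessionRecord]:
    """Every session in which `agent` took part, in either role."""
    return [r for r in records if agent in (r.initiator, r.provider)]


def abnormal_closes(records: List[SessionRecord], normal: str = "completed") -> List[SessionRecord]:
    """Sessions that ended for any reason other than the assumed normal one."""
    return [r for r in records if r.close_reason != normal]
```

If a query like this cannot be answered from retained data, the gap is in what was recorded, not in the tooling on top of it.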

Where to go from here

The open source components - Agora, LLM Gateway, and MCP Gateway - are all Apache 2.0 and available on GitHub. The Macro Pulse demo in the Agora repo runs end-to-end with live data sources if you want to see governed multi-agent collaboration in action.

For teams that want the full platform with commercial orchestration, visibility dashboards, and enterprise support, we at NetFoundry are running an AI Accelerator design partner program with a small number of early adopters. If the problems described here sound familiar, we'd like to hear about your use case.

The network layer is where AI security needs to live. We've spent years building zero trust at the network layer with OpenZiti. AI workloads need it.
