The delegation problem
AI agents are becoming delegates. They no longer just answer questions or complete isolated tasks — they act on behalf of users, carrying rich private context into interactions with other agents and services.
A health agent knows about diagnoses, medications, family medical history, and mental health status. A financial agent knows about income, debt, spending patterns, and investment strategy. A personal assistant knows about relationship dynamics, daily routines, emotional state, and private preferences. That depth of understanding is precisely why these agents are worth having. It is also what makes coordination between them a disclosure problem.
The obvious response is: don't give agents that context. Keep them narrow, purpose-built, and minimally informed. For some tasks, that is a perfectly valid design choice. A scheduling agent that only sees calendar slots does not create a disclosure problem.
But the most powerful delegation use cases require exactly this kind of context. An agent negotiating a contract on your behalf needs to know your budget limits and walk-away conditions. An agent assessing medical compatibility needs access to real health constraints. An agent mediating a dispute needs to understand each party's actual priorities and concerns. Remove the context and you remove the reason for delegation — the agent can no longer exercise the judgment or navigate the trade-offs that a human intermediary would bring.
AgentVault is designed for this harder case: agents that need sensitive context to be useful, coordinating with other agents that carry similarly sensitive context. The goal is not to prevent agents from ever seeing private data — it is to let them reason with it while keeping coordination structurally bounded.
When two agents need to coordinate — to assess compatibility between their users, negotiate terms, synthesize information across domains, or mediate a sensitive decision — the private context each agent carries becomes part of the interaction surface. If the communication channel between them is unconstrained, either agent can disclose any information it has access to.
Even though the public incident record is still emerging, this is a predictable structural risk of any system in which context-rich agents communicate over free-text channels.
Why "be careful" does not work
The problem is not that agents have sensitive context. The problem is that free-text coordination gives them no structural limit on what they can reveal.
The most common mitigation is prompting: tell the agent to be discreet, add guardrails to the system prompt. But this treats disclosure as a behavioral issue when it is actually a channel capacity issue.
A free-text channel between two agents has effectively unlimited information capacity. A model instructed to "not share medical information" might still reveal a user's health status through the specificity of its dietary preferences, the timing of its scheduling constraints, or the way it frames risk tolerance. Models are opaque, probabilistic systems — they cannot reliably track what they have revealed across a conversation, and they cannot guarantee that subtle correlations in their output do not leak private facts.
The only way to guarantee bounded disclosure is to bound the channel itself — to structurally constrain the output so that the total information it can carry is finite and measurable, regardless of what the model attempts to express.
Four structural approaches
Preventing agent-to-agent context leakage requires structural mechanisms, not just behavioral instructions. Four approaches, used together, provide a complete agent-to-agent privacy architecture:
1. Constrain channel capacity via schemas
The most direct way to limit disclosure is to narrow the output channel. Instead of allowing agents to communicate via free text, constrain the output to a fixed schema with bounded fields.
A field that accepts an integer between 1 and 5 can carry at most ~2.3 bits of information. A three-value enum carries ~1.6 bits. A boolean carries 1 bit. The total information capacity of the output is the sum of its fields' individual capacities — a finite, measurable bound on how much the agent can disclose.
This is not about choosing the right format. It is about choosing a format whose information-theoretic properties match the privacy requirements. A schema with three bounded integers and two enums carries far less information than a schema with a single free-text string — even though both are "structured outputs."
Schema design is disclosure design. The schema determines the ceiling on what can leak.
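To make the arithmetic concrete, here is a minimal sketch of a capacity calculator. The schema shape and field names are illustrative, not AgentVault's actual wire format; the point is that each bounded field contributes log2 of its alphabet size, and the total is a hard ceiling.

```python
import math

# Illustrative schema: each field declares its alphabet of allowed values.
# Field names and structure are hypothetical, not AgentVault's actual format.
SCHEMA = {
    "compatibility_score": {"type": "int", "min": 1, "max": 5},  # 5 values
    "risk_band": {"type": "enum", "values": ["low", "medium", "high"]},
    "proceed": {"type": "bool"},  # 2 values
}

def field_capacity_bits(spec: dict) -> float:
    """Upper bound on bits one field can carry: log2 of its alphabet size."""
    if spec["type"] == "int":
        n = spec["max"] - spec["min"] + 1
    elif spec["type"] == "enum":
        n = len(spec["values"])
    elif spec["type"] == "bool":
        n = 2
    else:
        raise ValueError(f"unbounded field type: {spec['type']}")
    return math.log2(n)

def schema_capacity_bits(schema: dict) -> float:
    """Total ceiling: the sum of per-field capacities, i.e. log2 of the
    number of distinct outputs the schema can express."""
    return sum(field_capacity_bits(spec) for spec in schema.values())

total = schema_capacity_bits(SCHEMA)
print(f"channel capacity: {total:.2f} bits")  # log2(5) + log2(3) + 1 ≈ 4.91
```

Note that the ceiling holds no matter what the model writes into the fields: a schema with these three fields can express fewer than 32 distinct outputs in total.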
2. Pre-agree terms via contracts
A schema is only meaningful as a privacy constraint if both parties agreed to it before the session began. A schema imposed unilaterally, selected dynamically, or changed mid-session does not provide the same guarantee.
Coordination contracts solve this. A contract is a machine-readable document that defines the session's purpose, the output schema, the prompt template, the guardian policy, and the disclosure terms. It is content-addressed by its SHA-256 hash and established before any private context enters the system. Both parties agree to identical terms. The contract cannot be renegotiated during the session.
Pre-agreement means consent is explicit and auditable. Each party knows exactly what they are consenting to disclose. If the schema permits a compatibility score from 1 to 5, both sides know that at most ~2.3 bits of information can flow — before they share any private context.
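Content-addressing is what makes the pre-agreement checkable. A rough sketch of the idea, using SHA-256 over a canonical JSON serialization (the contract fields shown here are assumptions for illustration, not AgentVault's actual contract format):

```python
import hashlib
import json

# Illustrative contract document; field names are assumptions for this
# sketch, not AgentVault's actual contract format.
contract = {
    "purpose": "dietary compatibility check",
    "output_schema": {"compatibility_score": {"type": "int", "min": 1, "max": 5}},
    "prompt_template": "Rate compatibility of the two profiles from 1 to 5.",
    "guardian_policy": {"max_sessions": 1},
    "disclosure_terms": "score only; no free text",
}

def contract_hash(doc: dict) -> str:
    """Content-address the contract: SHA-256 over a canonical JSON
    serialization (sorted keys, fixed separators, no whitespace drift)."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

h = contract_hash(contract)
# Both parties compare hashes before sharing any private context;
# any divergence in terms, however small, produces a different hash.
```

Because the hash covers the full document, "the contract cannot be renegotiated mid-session" reduces to a simple check: every message in the session references the same hash both parties approved at the start.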
3. Enforce at an external point
Agreement is necessary but not sufficient. The contract specifies what should happen. Enforcement ensures it does happen.
In many agent systems, the model itself is the only enforcement point. The model generates an output, and the calling code validates it. But in agent-to-agent coordination, the model is one of the interested parties. Asking the model to enforce its own disclosure constraints is like asking a negotiator to also serve as the arbiter.
An external enforcement point — a relay that sits between the agents — provides structural independence. The relay validates every output against the contract's schema. Outputs that fall outside the agreed structure are rejected before they reach either party. The relay does not participate in the reasoning. It is a structural constraint, not a participant.
External enforcement means that even a compromised, careless, or adversarial model cannot disclose more than the schema permits. The constraint holds regardless of model behavior.
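A relay-side validator can be very small. This is a minimal sketch (schema shape and field names are illustrative): anything outside the agreed structure, including extra fields, is rejected before it reaches the other party, because an unexpected field is itself a side channel.

```python
# Illustrative schema, mirroring the bounded-field idea from earlier.
SCHEMA = {
    "compatibility_score": {"type": "int", "min": 1, "max": 5},
    "risk_band": {"type": "enum", "values": ["low", "medium", "high"]},
}

def validate(output: dict, schema: dict) -> bool:
    """Relay-side check: accept only outputs that match the contract schema
    exactly. The relay never reasons about content, only structure."""
    # Reject missing or extra fields: anything not in the contract
    # is a potential side channel.
    if set(output) != set(schema):
        return False
    for name, spec in schema.items():
        value = output[name]
        if spec["type"] == "int":
            # Exclude bool explicitly: in Python, bool is a subclass of int.
            if isinstance(value, bool) or not isinstance(value, int):
                return False
            if not (spec["min"] <= value <= spec["max"]):
                return False
        elif spec["type"] == "enum":
            if value not in spec["values"]:
                return False
    return True

validate({"compatibility_score": 4, "risk_band": "low"}, SCHEMA)   # accepted
validate({"compatibility_score": 4, "risk_band": "low",
          "note": "free text"}, SCHEMA)                            # rejected: extra field
validate({"compatibility_score": 9, "risk_band": "low"}, SCHEMA)   # rejected: out of range
```

The key design property is that the validator is dumb on purpose: it holds regardless of what the model intended, because it never inspects meaning, only structure.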
4. Verify after the fact
Trust but verify. Even with pre-agreed contracts and external enforcement, parties need a way to confirm that the session was conducted under the declared terms. Without verification, assurance depends on trusting the enforcement layer — and trust without evidence is fragile.
Cryptographic receipts provide this verification. After the session completes, both parties receive an Ed25519-signed record that binds the contract hash, output schema, prompt template hash, guardian policy hash, model identity, and session metadata to the result. The receipt is independently verifiable. Either party can confirm that the output they received was produced under the contract they agreed to.
Receipts make the privacy guarantee auditable. They transform "we agreed to these terms" from a claim into verifiable evidence.
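The verification step boils down to recomputing a canonical digest over the receipt body and checking a signature over it. A standard-library sketch of the digest side follows; the field names are assumptions, and the Ed25519 signing/verification itself would use a signing library (e.g. PyNaCl or `cryptography`), omitted here to keep the sketch self-contained.

```python
import hashlib
import json

# Sketch of the data a receipt binds together. Field names and the
# placeholder hash values are hypothetical; the real receipt format
# may differ.
receipt_body = {
    "contract_hash": "3f2a…",        # SHA-256 of the agreed contract
    "schema_hash": "91bc…",
    "prompt_template_hash": "a7d0…",
    "guardian_policy_hash": "55e1…",
    "model_id": "example-model-v1",
    "session_id": "sess-0042",
}

def signing_digest(body: dict) -> bytes:
    """Canonical digest over the receipt body: this is the message an
    Ed25519 key would sign, and what a verifier recomputes independently."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()

digest = signing_digest(receipt_body)
# Verification: either party recomputes the digest from the receipt body
# it holds and checks the Ed25519 signature over it against the signer's
# public key. A mismatch in any bound field changes the digest and
# invalidates the signature.
```

This is why the binding matters: the signature does not just vouch for the result, it vouches for the exact contract, schema, prompt template, policy, and model the result was produced under.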
How the four approaches compose
Each approach addresses a different failure mode:
- Schema constraints bound the channel capacity — limiting how much information can flow regardless of intent.
- Contracts ensure mutual consent — both parties agree to the same terms before the session begins.
- External enforcement removes the model from the trust chain for output validation — constraints hold even if the model tries to circumvent them.
- Receipts provide verifiable evidence — after the fact, both parties can confirm the session was governed correctly.
Used in isolation, each approach has gaps. Schemas without contracts lack consent. Contracts without enforcement are aspirational. Enforcement without receipts is unverifiable. Used together, they form a complete bounded disclosure architecture.
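As a rough end-to-end sketch, under the same illustrative names as above, the four pieces slot together like this: agree on a contract hash, let the schema bound the channel, validate at the relay, then bind the result into a verifiable receipt digest.

```python
import hashlib
import json
import math

def canon_hash(doc) -> str:
    """SHA-256 over canonical JSON (used for both contract and receipt)."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

schema = {"compatibility_score": {"min": 1, "max": 5}}
contract = {"purpose": "compatibility check", "output_schema": schema}

# 1. Contract: both parties compare this hash before any context flows.
agreed = canon_hash(contract)

# 2. Schema: the channel carries at most log2(5) ≈ 2.32 bits,
#    whatever the model tries to express.
spec = schema["compatibility_score"]
capacity = math.log2(spec["max"] - spec["min"] + 1)

# 3. Enforcement: the relay drops anything outside the agreed structure.
output = {"compatibility_score": 4}
assert set(output) == set(schema)
assert spec["min"] <= output["compatibility_score"] <= spec["max"]

# 4. Receipt: a digest binding the result to the agreed terms, which an
#    Ed25519 signature would then cover.
receipt_digest = canon_hash({"contract_hash": agreed, "output": output})
```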
The structural principle
The common thread across all four approaches is structural enforcement rather than behavioral control. Instead of asking agents to be careful, the system constrains what agents can express. Instead of trusting model behavior, the architecture bounds the channel. Instead of hoping for compliance, it makes compliance verifiable.
This is the core insight behind AgentVault: privacy in agent-to-agent coordination is not a prompting problem, a fine-tuning problem, or a model-alignment problem. It is an architecture problem. The right architecture makes disclosure structurally bounded — regardless of what the model knows, intends, or attempts.
To explore how these four mechanisms work together in a running system, see the AgentVault repository.