Skip to main content
Back to Blog
aiagentic-aiai-agent-securitymosaicleaksenterprise-data-privacyllm-toolsquery-log-auditingservicenow-research

The Mosaic Effect: Why Your Research Agent Leaks More When It Gets Smarter

MosaicLeaks shows smarter AI agents leak more enterprise data via query logs — and prompt-based safety fixes barely help. What your threat model is missing.

Zyfolks Team ·

A research agent fires off three innocent-looking web queries over the course of a single task. None of them, taken alone, would raise an eyebrow. But anyone watching the outbound traffic can stitch the fragments back together into a private fact that lives only inside an enterprise’s internal documents. That is the threat model behind MosaicLeaks, a new benchmark and training method from ServiceNow’s research team, and it exposes a counterintuitive result: training an agent to be better at its job often makes it worse at keeping secrets.

What MosaicLeaks Actually Measures

The paper, authored by Alexander Gurung, Spandana Gella, Alexandre Drouin, Issam H. Laradji, Perouz Taslakian, and Rafael Pardinas, introduces a 1,001-chain dataset of multi-hop research tasks that interleave private enterprise documents with a controlled public web corpus. Local documents are pulled from DRBench-style enterprise tasks; web documents come from BrowseComp-Plus. The split is 559 training chains, 98 validation, and 344 held-out-company test chains. Crucially, each chain is constructed so the answer to one hop becomes the bridge entity for the next, forcing the agent to retrieve private context before it can form the next useful web query.

Most agent evaluations score end-to-end task success and call it a day. MosaicLeaks instead treats the web query log itself as the leakage channel and defines three escalating threats. Intent leakage means an observer can guess what the agent is investigating. Answer leakage means the query log holds enough fragments to answer a private question someone already has. Full-information leakage is the strongest form — the observer can state verifiably true private claims without being told what to look for in the first place.

If you’re a team shipping a research copilot that touches HR records, customer data, or internal financials, this framing changes your threat model overnight. Your DLP rules probably scan documents and outputs; they almost certainly do not reconstruct a mosaic across a session of search queries. The team’s prediction here lands hard: query-log auditing will become a standard part of enterprise agent procurement within the next eighteen months, and vendors who can’t produce a leakage score will get screened out.

Why “Please Don’t Leak” Doesn’t Work

The most tempting fix is the cheapest one: add a line to the planner prompt telling the agent not to send web queries that reveal private information. ServiceNow’s team tried exactly this. The result is the kind of graph that should be printed and pinned above every prompt engineer’s desk.

For Qwen3-4B, the safety prompt nudged answer/full-information leakage from 34.0% down to 25.5%. At the same time, strict chain success — the share of chains where every hop is answered correctly — dropped from 48.7% to 44.5%. The agent didn’t learn to construct safer queries. It mostly just issued fewer of them, which dragged accuracy down without closing the leak.

This is the failure mode you keep seeing in production: prompts add hedges, hedges shave a few points off the worst behavior, and the gap is declared closed. It isn’t. For anyone evaluating custom-built versus off-the-shelf agent platforms, this is a useful sharp edge. A SaaS agent you can’t retrain is an agent stuck with prompt-level mitigations. If your data is sensitive enough that a mosaic of three queries matters, the policy work has to happen during training, not at inference.

The Performance-Privacy Tradeoff Nobody Warned You About

The paper’s most uncomfortable finding comes from a baseline experiment most teams would consider a win. The researchers fine-tuned the agent with a standard outcome reward — get the chain right, get reinforced. Strict chain success climbed from 48.7% to 59.3%, a clean double-digit gain on the actual task.

Leakage climbed too. Answer/full-information leakage jumped from 34.0% to 51.7%.

A more informative query is usually a better query for retrieval. “Cloud migration milestones January 2024” beats “recent tech news” for hitting the right document. So the model learned to pack more context into each search — which is exactly the behavior the mosaic adversary feeds on. Every fragment that helps the agent find the right public page also helps the observer reconstruct the private one.

For a healthcare firm using a research agent over patient cohorts, or a fintech team running diligence over deal documents, this is the kind of failure that doesn’t show up in a quarterly accuracy report. It shows up in a regulator’s letter. Anyone planning bespoke AI agents for support, sales, or internal knowledge work should treat this result as evidence that single-objective fine-tuning is a privacy regression by default. Optimization pressure flows toward whichever metric you measure, and “didn’t leak” is rarely on the dashboard.

How PA-DR Trains the Agent to Search Without Leaking

Privacy-Aware Deep Research, or PA-DR, is the team’s training recipe for resolving the tradeoff. It combines two rewards. The first is a situational task reward: instead of scoring an entire trajectory once at the end, each model call is judged against other calls made at the same stage and hop with the same information available. A Plan call gets credit for searching the right source and pulling the right document; a Choose call gets credit for picking the document that contains the answer. The second is a learned privacy reward — a Qwen3-4B classifier estimates whether the current queries leak private information directly and whether adding them to the existing query log creates a new mosaic leak. PA-DR penalizes the larger of the two, anchoring the cost to the exact planning decision that widened the exposure.

Task + PA-DR reward reaches 58.7% strict chain success — essentially matching the 59.3% from task-only training — while dropping answer/full-information leakage from 34.0% all the way down to 9.9%. That final figure is lower than the untrained base model’s leakage. Training for privacy didn’t just cancel the regression introduced by task-only training; it left the agent leaking less than it did before any RL at all.

And it didn’t get there by searching less. PA-DR actually issues more web queries than the base model. It just strips the revealing details — the “15%”, the “2024”, the specific shape of the answer it’s hunting — out of the query text. The agent still finds the right public document. It just stops dragging private fragments along for the ride.

The sample-efficiency gain is worth flagging separately. The situational task reward reaches roughly 55% strict success with about 5-6x fewer generated training samples than outcome-only RL, according to the team’s training-efficiency table. PA-DR keeps that efficiency while folding in the privacy gain. For shops weighing the question of building a custom agent versus stitching together automation flows, the implication is concrete: per-step credit assignment has moved from research curiosity to baseline requirement for serious agent RL.

FAQ

Q: What is the mosaic effect in AI agents? A: The mosaic effect is when an agent’s individual web queries each look harmless, but the cumulative query log lets an observer reconstruct private information the agent pulled from internal documents. MosaicLeaks formalizes this with three measures: intent leakage, answer leakage, and full-information leakage, in increasing order of severity.

Q: Why can’t a prompt fix privacy leakage? A: ServiceNow’s team tested adding a safety instruction to the planner prompt and found inconsistent gains paired with a drop in task performance. For Qwen3-4B, leakage fell from 34.0% to 25.5% but strict chain success fell from 48.7% to 44.5%. The agent issued fewer queries rather than constructing safer ones, which is a mitigation that erodes the product without closing the hole.

Q: How does PA-DR differ from standard RL fine-tuning? A: PA-DR uses a situational reward that judges each model call against comparable calls at the same hop and stage, rather than scoring whole trajectories. It adds a learned privacy reward from a Qwen3-4B classifier that flags both direct leaks and new mosaic risks introduced by the current query. The combination cut leakage from 34.0% to 9.9% while keeping strict chain success at 58.7%.

Key Takeaways

  • Treat the agent’s web query log as a first-class output during evaluation; teams that only audit final answers will miss mosaic-style exposure entirely.
  • Outcome-only RL fine-tuning on multi-hop research is a privacy regression by default — expect leakage to climb alongside accuracy unless a privacy signal is in the loop.
  • Per-step situational rewards are emerging as the better credit-assignment primitive for agent RL, with the MosaicLeaks results showing a roughly 5-6x sample-efficiency advantage over outcome rewards.
  • Vendor-locked agents that only expose prompt-level controls will be hard to deploy in regulated industries once query-log auditing becomes standard procurement diligence.
  • The next twelve months will see privacy benchmarks like MosaicLeaks copied into enterprise RFPs, with leakage metrics sitting next to accuracy on every serious agent scorecard.

Have a project in mind?

Tell us what you're building — we reply within 24 hours.