Few engineering organisations can match the scale of resources available to Meta. Despite having virtually unlimited access to computing power and internal AI tools, the company has opted to restrict the use of Claude Code and Codex in its applied AI division, highlighting concerns over potential security and development risks. The concern, as reported, is that heavy usage could inadvertently distill proprietary knowledge into rival AI systems through normal employee workflows. No espionage required, no data breach, no rogue employee – engineers doing their jobs, nothing more.
This bears reflection: the risk Meta is navigating here doesn’t stem from malicious parties or rogue staff, but from the nature of the tools themselves. It’s the ordinary, unremarkable act of using an AI tool to do work, the same thing millions of people at thousands of companies are doing every day, with little visibility into where the outputs go once they leave the screen.
What Inadvertent Distillation Actually Means
Classically, distillation is an intentional technique: developers train a smaller, leaner model on the results of a larger one, aiming to replicate those high-end capabilities more affordably. What Meta’s concern describes is something messier and tougher to detect. Call it inadvertent distillation: the process by which a company’s proprietary knowledge leaks into an external model’s training data through the normal use of that model by employees.
In practice, it happens across every department: developers feed private code into AI assistants for suggestions, product managers use them to polish strategy documents, and support teams build chatbots directly on top of internal knowledge bases. In each case, the input may be logged, retained or used to improve the model, and the output may reproduce memorised training content. Meanwhile, the terms of service governing what happens to that data are rarely read as carefully as they should be.
Evidence from research on code-specific language models suggests the risk is real. Recent studies report that code LLMs can leak data between 42% and 64% of the time. Even more concerning, these models reportedly produce exact, verbatim memorisation in about 13% of standard suggestions. This isn’t a statistical outlier – it’s a pervasive risk woven into everyday work.
The Hidden Risks Of AI-Driven Workflows
Meta may be focused on the high-stakes battle between big AI labs, but the security risk it is navigating is a reality for any company whose value depends on its internal data: source code, proprietary datasets, pricing logic, customer data, internal workflows and roadmap material. Any of these that passes through a third-party AI tool is, depending on the vendor’s data policies, potentially no longer purely internal.
The hard truth: companies are treating AI vendor agreements like routine paperwork, overlooking the fact that they are actually managing the fate of their intellectual property. They’re asking whether the tool works, not what happens to the prompts once they’re sent. The Aura data breach earlier this year was a reminder that data exposure often happens through ordinary processes rather than targeted attacks. The AI distillation risk follows the same pattern: low visibility, no obvious trigger, potential for harm.
The most exposed organisations are those that have scaled AI adoption too quickly, neglecting to implement guardrails on acceptable employee usage. In today’s market, this lack of oversight is worryingly common. The Deloitte AI infrastructure survey from 2026 found that a majority of businesses lack clear visibility into where their AI-generated outputs are stored or how they’re used downstream.
Building A Defensive Strategy For AI Adoption
The answer isn’t to ban AI tools – that ship has sailed and the productivity loss would be felt. The answer is to treat AI vendor terms with the same seriousness you’d treat a data processing agreement, because that’s effectively what they are.
There are three things to build policy around: what your employees are permitted to paste into external AI tools; which categories of internal material should never touch a third-party model, regardless of the task; and what the vendor’s terms actually say about data retention and training rights. Most AI tool vendors offer enterprise tiers with stronger data isolation, zero-retention policies and contractual commitments that the consumer or standard tiers don’t provide. If your business is using the consumer tier, you’re operating on the default data policy, which is almost certainly not designed with your IP protection in mind.
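For teams that want to turn the first two policy points into something enforceable, one option is a simple screening step that runs before a prompt leaves the company. The sketch below is illustrative only: the category names, the patterns and the `screen_prompt` function are all hypothetical, not taken from Meta’s guidelines or any vendor’s tooling, and a handful of regular expressions will miss plenty of sensitive material.

```python
import re
from dataclasses import dataclass

# Hypothetical "never send" categories; a real policy would define its own
# list and would need far more than a few regular expressions.
BLOCKED_PATTERNS = {
    "credential": re.compile(
        r"\b(api[_-]?key|secret|password)\b\s*[:=]\s*\S+", re.IGNORECASE
    ),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "internal_marker": re.compile(r"\b(confidential|internal only)\b", re.IGNORECASE),
}


@dataclass
class ScreeningResult:
    allowed: bool      # True when no blocked category matched
    violations: list   # sorted names of the categories that matched


def screen_prompt(text: str) -> ScreeningResult:
    """Check a prompt against the blocked categories before it is sent out."""
    violations = sorted(
        name for name, pattern in BLOCKED_PATTERNS.items() if pattern.search(text)
    )
    return ScreeningResult(allowed=not violations, violations=violations)


if __name__ == "__main__":
    for prompt in (
        "How do I reverse a list in Python?",
        "Why does this fail? api_key = abc123",
        "CONFIDENTIAL: summarise our Q3 pricing logic",
    ):
        print(screen_prompt(prompt))
```

A check like this only catches the obvious cases. It complements, rather than replaces, reading the vendor’s retention and training terms and choosing a tier with contractual data isolation.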
Meta noticed this risk early enough to write internal guidelines around it. Most businesses are still in the phase of being excited that the tools work. The distance between those two positions is where competitive intelligence quietly moves in the wrong direction, and by the time it shows up as a problem, it’s already been happening for a while.
