Two years ago, the standard response from the AI industry to safety concerns was that voluntary commitments and internal red teams were sufficient. That position is getting harder to defend.
Claude has been weaponised in large-scale extortion campaigns targeting organisations across multiple countries. Grok was deployed in national security contexts despite a documented history of generating harmful content. The Pentagon designated Anthropic a supply chain risk after it refused to remove ethical constraints on autonomous weapons use, a decision in a legal challenge that some commentators have described as likely unlawful First Amendment retaliation. Vibe-coded applications are shipping with significantly higher vulnerability rates than human-written code.
The incentive structure underneath all of this is quite consistent. When competitive position depends on how fast a product ships, safety functions get negotiated down. Internal safety teams are routinely under-resourced relative to the product and research organisations they are supposed to govern.
A 2025 EY-linked survey found that a majority of organisations allow employees to develop or deploy AI agents without high-level approval; only 60% issue formal guidance for such work. OpenAI significantly restructured its internal safety and alignment functions in early 2026. FLI’s Winter 2025 Index concluded that no frontier lab scored in the top tier on overall safety, with scores on existential-risk measures particularly weak.
The case that this represents a structural problem rather than a collection of isolated incidents is building. The attack surface created by AI systems is qualitatively different from previous software: these are systems that can autonomously take actions, adapt to context and be redirected toward purposes their developers did not intend. A model capable of identifying vulnerabilities rapidly and across large codebases can also be used to exploit them. A model trained to be helpful can be prompted to assist with extortion.
The debate has shifted: no longer whether AI has introduced new risks, but whether the industry has the infrastructure to contain them.
A Structural Imbalance, Not A Collection Of Mistakes
The pattern across the evidence points not to individual companies cutting corners, but to competitive dynamics making caution commercially irrational. Labs that move slowly lose ground and safety teams that block deployment get overruled.
Researchers who probe vulnerabilities in publicly available models face threats of legal action under terms of service that prohibit safety-related testing. In 2023, major AI labs signed White House voluntary commitments to support independent safety research. By 2024, almost none had established real protections for the researchers who try to do it.
At the enterprise deployment level, the problem is compounded by data environments that were not designed for AI. Fragmented data, inconsistent classification policies and limited visibility into where sensitive information flows mean that integrating AI systems into existing infrastructure significantly increases the risk of unintended exposure. The pace of AI adoption has not been matched by the governance maturity needed to make it safe.
The problem with safety teams isn’t competence ā they are being outpaced by an architecture that treats security as a gate at the end of the process rather than a foundation at the start of it. The result is that safety functions are reviewing products that have already been shipped, patching vulnerabilities in systems already in production, and managing incidents in real time rather than preventing them at the design stage.
What Meaningful Safety Infrastructure Would Actually Look Like
There is rational agreement across the field on what the components would need to be.
Independent, adversarial safety research with genuine legal protection rather than the threat of litigation. Mandatory pre-deployment testing with enforcement teeth rather than voluntary frameworks. Zero-trust deployment environments where AI agents operate under least-privilege constraints and require cryptographic human-in-the-loop authorisation for sensitive actions. AIBOM manifests bound to runtime telemetry. Incident disclosure requirements that create accountability for failures rather than allowing them to be buried.
The real challenge lies in identifying who will build this. Labs are competing against each other and have weak incentives to absorb the cost of infrastructure that benefits the whole industry. Enterprise buyers could in principle refuse to purchase models that lack transparent governance, but most currently lack the technical authority to audit what they are buying.
Regulators have the mandate but have consistently lagged the technology. The EU AI Act is the closest thing to a binding framework; the US has no real equivalent. What several contributors to this piece argue is that until the cost of deploying an insecure AI system exceeds the commercial benefit of having shipped it first, voluntary safety culture will remain just that.
We put the question to AI safety researchers, cybersecurity specialists and deployment risk experts to find out what they think needs to change.
More from Cybersecurity
- ShinyHunters Just Hacked Rockstar Through A Supplier ā Every Business Using Third-Party Software Should Pay Attention
- Is Vibe Coding Safe Or A Cybersecurity Disaster Waiting To Happen?
- Anthropic Is Taking On Cybersecurity With AI, And It Has Brought Apple and Amazon Along For The Ride
- External Attack Surface Management And Why It Matters For Startups
- SpyCloudās 2026 Identity Exposure Report Reveals Explosion Of Non-Human Identity Theft
- The Aura Data Breach Exposed 900,000 Users ā Here Is What Every Business Needs To Know
- How AI And Hacking Professionalism Are Overwhelming Endpoint Security
- Navigating The Hidden Dangers Of USB Devices In The Modern Workspace
Our Experts:
- Omair Manzoor, Founder and CEO, ioSENTRIX
- Paulo Cardoso do Amaral, former CIO and NATO Scientific Advisor on Cybersecurity
- Raphael Karger, CTO, ZeroPath
- Seb de Lemos, CEO, hosting.com
- Shreyans Mehta, CTO, Cequence Security
- Collin Hogue-Spears, Senior Director, Black Duck Software
- Stanislav Kazanov, Head of GRC, Cybersecurity and Sustainability, Innowise
- Aviral Srivastava, Security Engineer, Amazon
Omair Manzoor, Founder and CEO, ioSENTRIX
![]()
“The honest answer is yes, but not in the way most people frame it. The problem isn’t that any single company decided to cut corners. It’s that the competitive dynamics made cutting corners rational. When the competitive gap between shipping now and shipping in six weeks determines market position, safety stops being a foundation and becomes a negotiable variable.
“We’re seeing the results in real time. Claude Code weaponised into an automated extortion pipeline. Apple Intelligence hijacked through prompt injection on 200 million devices. Vibe-coded applications shipping with three times the vulnerability rate of human-written code. These aren’t hypotheticals. These are findings from our actual pen testing engagements and from public research in the last few months alone.
“Safety teams can’t keep pace, not at current resourcing levels. The product team ships the LLM integration before the security team knows it exists. Shadow AI is the new shadow IT, except it moves faster and touches far more sensitive data. What meaningful safety infrastructure looks like is honestly pretty boring: mandatory adversarial testing before any model touches production data, independent red teaming that isn’t funded by the company being tested, and regulatory teeth. Not guidelines, not frameworks. Actual enforceable standards with consequences. Until the incentive structure rewards caution, we’ll keep having this conversation every time something blows up.”
Paulo Cardoso do Amaral, Former CIO and NATO Scientific Advisor on Cybersecurity
![]()
“The AI race has structurally compromised safety, not because every model is reckless, but because the incentives are. When speed, scale and strategic positioning dominate, safety becomes a drag coefficient rather than a hard launch condition. Attackers can automate code exploitation faster. Social engineering is now powered by convincing voice, image and video impersonation. Frontier models are being pulled into national security contexts before governance is mature.
“Safety teams are not keeping pace. In too many organisations, advisory functions remain while product and deployment teams operate at wartime tempo.
“Meaningful safety infrastructure would look more like aviation or financial market infrastructure: mandatory pre-deployment testing, independent red-teaming, continuous monitoring, incident disclosure, auditable logs, strong identity and provenance controls, and clear restrictions for military and other high-risk uses. It also requires redesigning insecure digital architectures, not merely adding guardrails afterwards. Responsibility starts with frontier labs, but deployers, regulators, sector bodies and states all share it. If AI is now part of critical infrastructure, safety cannot be a voluntary culture. It has to be engineered, audited and enforced.”
Raphael Karger, CTO, ZeroPath
![]()
“Yes, but it’s more precise to say the AI race has revealed a pre-existing structural gap. Security has always been an afterthought in software. AI just accelerated the timeline and raised the blast radius. The pressure to ship isn’t new. What’s new is that the models being shipped can themselves be weaponised as attack infrastructure. The race dynamic makes it harder to justify slowing down for security work that doesn’t show up on a benchmark.
“Safety and security teams at most AI labs are structurally downstream of the product and research organisations. They review what’s already been built. That’s not a staffing problem. It’s an architectural one. You can’t hire your way out of a process that treats security as a gate rather than a foundation.
“Meaningful infrastructure means continuous, automated security validation integrated into the model development lifecycle, not red-teaming sprints before a release. It means treating AI systems like the complex attack surfaces they are. Responsibility is shared: labs own the model layer, but the broader ecosystem, the platforms, integrations and deployment environments, needs its own security posture. Right now almost no one is looking at that layer seriously.”
Seb de Lemos, CEO, hosting.com
![]()
“AI hasn’t broken safety outright overnight, but it has materially stretched and fragmented it, particularly in software development. With AI, anyone can now act as a developer. That democratisation is powerful, but it introduces uneven standards, where production-ready code is deployed without the governance, testing and review processes that were once standard. Many people developing software now either don’t fully understand what they’re building or are using AI to accelerate development without understanding what loopholes their code might contain.
“Internal security teams are being asked to operate at a pace and scale that simply didn’t exist before. AI accelerates development, but security practices, governance processes and compliance checks have not scaled at the same rate. We’re seeing this play out in real incidents where AI-generated code has introduced vulnerabilities because the underlying logic wasn’t validated. Safety teams aren’t failing. They’re being outpaced.
“Meaningful safety infrastructure needs to be built in, not bolted on, spanning the full lifecycle from development through to deployment and ongoing maintenance. Regulation and compliance should be operationalised directly into infrastructure, ensuring applications are compliant by default rather than through manual intervention. If AI is lowering the barrier to building software, the industry must equally lower the barrier to building it safely.”
Shreyans Mehta, CTO, Cequence Security
![]()
“The cybersecurity industry spent a decade building detection around human behavioural signals. AI agents break that detection. They make direct HTTP requests from clean residential IPs with plausible headers, never execute JavaScript, never render a page. Every UEBA baseline built on human behavioural norms is now effectively irrelevant. What matters now is real-time detection: server-side behavioural analysis trained on years of real API traffic, operating on mathematical models that do not depend on the entity being human.
“Most organisations that have moved beyond basic connectivity have landed on identity as their answer. Integrate with an enterprise identity provider, enforce OAuth, and ensure agents act on behalf of authenticated users. But this is exactly where the industry’s thinking stops, and where the most dangerous failures begin. Controlling agent permissions at the tool level is essential, not just who the agent is, but what it is allowed to do.
“Sensitive data still flows through tool calls that identity alone cannot inspect. Agent behaviour can drift in ways that authentication cannot detect. This is why AI gateways are needed: combining sensitive data detection, behavioural fingerprinting, session binding and a trusted registry on top of identity and connectivity. One AI coding agent we observed made 2,500 tool calls over 48 hours before improvising, probing unauthorised file paths and attempting write operations its credentials did not permit.”
Collin Hogue-Spears, Senior Director, Black Duck Software
![]()
“Yes. The EU and China have binding regulatory floors. The US does not. The December 2025 White House executive order pre-empts state action without replacing it, leaving California’s SB 53 and New York’s RAISE Act as the de facto national standard. FLI’s Winter 2025 Index graded no frontier lab above C+ overall on safety, and none above D on existential safety. The February 2026 Pentagon supply chain designation punished Anthropic, the lab with the highest safety score, for holding two narrow ethical red lines. That is the signal every other lab reads.
“Safety teams can’t keep pace, and the reason is architectural. Deterministic compliance frameworks cannot govern stochastic agents generating novel outputs on every invocation. CrowdStrike’s 2026 Threat Report puts adversary breakout time at 27 seconds. Non-human agent identities now outnumber human identities 82 to one, and only 18% of security leaders trust legacy identity access management for those agents. OpenAI dissolved its Mission Alignment team in February 2026. This is not an effort problem. It is a tool-category problem.
“Meaningful infrastructure requires an agent zero-trust gateway applying NIST SP 800-207 to every tool invocation, with deny-by-default access and scoped credentials per action; AIBOM manifests bound to runtime telemetry alerting on out-of-manifest calls; capability-tiered controls; and a pre-deployment testing framework covering prompt injection and tool misuse. NIST owns the AI RMF, OSCAL and SBOM ecosystems. That’s where the baseline gets built.”
Stanislav Kazanov, Head of GRC, Cybersecurity and Sustainability, Innowise
![]()
“The AI race has actively penalised safety. When a government blacklists a lab building frontier AI for enforcing responsible development ethics on autonomous weapons, while promoting developers who remove those constraints, the market receives a very clear signal: caution in AI development is a commercial liability.
“Safety teams are mathematically challenged to keep up. They are attempting to defend against exponential increases in capability with only linear resources. Attackers are already using vibe-hacking to exploit agentic AI tools for automated data extraction and extortion. A corporate red team cannot manually patch behavioural vulnerabilities faster than the underlying model can generate new, unforeseeable logical paths.
“Meaningful safety infrastructure in 2026 cannot consist of an internal trust and safety committee reporting to a Chief Revenue Officer. There must be zero-trust deployment environments where autonomous AI is denied from conducting privileged network functions without a hardware-bound, human-in-the-loop cryptographic signature. AI vendors cannot build this because they’re competing on price. It must be created by enterprise buyers, CISOs and GRC leaders, who will refuse to purchase models without transparent governance mandated by regulation with the technical authority to audit model weights before deployment. Until the cost of delivering an insecure AI exceeds the benefit of shipping first, the industry can only manage the blast radius, not prevent it.”
Aviral Srivastava, Security Engineer, Amazon
![]()
“The race has structurally compromised safety, but not in the way most people talk about it. The bigger risk is not rogue models. It’s that the infrastructure layer underneath these models is being shipped at startup speed with enterprise-grade security assumptions that simply are not true. I’ve filed critical vulnerabilities in AI platforms with tens of thousands of production deployments where the maintainers denied the issue, hid behind documentation as a fix, or simply stopped responding. That’s not a model alignment problem. It’s a basic software security problem dressed up in AI branding.
“Safety teams can’t keep pace because the scope of what AI safety means keeps expanding while the investment stays narrow. Most attention goes to alignment research and red teaming model outputs. Almost nobody is looking at the deployment stack, the orchestration frameworks, the model file formats, the inference engines. That’s where the actual attack surface is right now, and it’s largely unguarded.
“Meaningful safety infrastructure starts with treating AI tooling like critical software, not hackathon projects. That means funded security audits, real vulnerability disclosure programmes with actual response timelines, and regulatory teeth behind frameworks like the NIST AI RMF instead of voluntary adoption that nobody enforces. The responsibility sits with the companies shipping these tools, but most of them are currently optimising for GitHub stars and funding rounds, not security posture.”
For any questions, comments or features, please contact us directly.
