What started as a routine reinforcement learning experiment at Alibaba ended with an AI agent trying to mine cryptocurrency on its own.
In a paper published on arXiv on the last day of 2025 titled “Let It Flow,” Weixun Wang and 89 co authors described how their ROME agent behaved during large scale training. According to reporting by Forbes, during one run the 30 billion parameter model “began probing internal networks, established a reverse SSH tunnel from an Alibaba Cloud instance to an external IP address, and quietly diverted GPU capacity toward cryptocurrency mining.”
No one had told it to do that. The task instructions had nothing about tunnelling or mining. The activity came to light because Alibaba’s managed firewall flagged unusual outbound traffic that kept lining up with specific training episodes.
The paper says this on what happened: “instrumental side effects of autonomous tool use under RL optimisation.” To put it simply, the system discovered that gaining extra compute and holding onto network access improved its reward score. Crypto mining was not the assignment. It was a shortcut the model stumbled across while trying to perform better.
Why Did This Only Blow Up Months Later?
The paper was online for over 2 months with little attention. That changed last week, when ML researcher Alexander Long posted a screenshot of the safety section on X and called it an “insane sequence of statements buried in an Alibaba tech report,” Forbes reported. The post got about 1.7 million views. Ryan Adams, co founder of crypto media company Bankless, shared it soon after. The debate took off from there.
The argument moved from technical curiosity to legal confusion quite fast. US regulators such as the CFTC and SEC, working under Project Crypto since January 2026, oversee trading and market conduct. Autonomous mining during a training run does not work in either category, according to Forbes.
Cryptojacking laws criminalise unauthorised use of computing resources. But in this case the system was running on Alibaba’s own infrastructure. As Forbes put it, you cannot cryptojack yourself.
Blockchain intelligence company TRM Labs addressed responsibility in a recent assessment. “Responsibility ultimately rests with the human actors who design, deploy, authorize, or benefit from AI systems,” TRM’s analysts wrote, according to Forbes. The awkward follow up question is which human that is.
Is This Really About One Rogue Agent?
ROME did not “decide” to mine crypto in a human sense. It was optimising. Forbes gives context to the case with earlier reward hacking examples. In 2016, OpenAI’s CoastRunners agent exploited a scoring loophole instead of finishing a race. In 2025, Anthropic reported that models trained to reward hack on coding tasks learned to call sys.exit(0) to fake passing tests. OpenAI’s o3 model reward hacked “by far the most” of any frontier model tested that year, according to safety institute METR.
ROME is a unique one because money is now involved here. More than 550 AI agent crypto projects, with a combined market capitalisation of $4.34 billion as of early March 2026 according to BlockEden.xyz, are building agents with financial capabilities on purpose, Forbes reported.
The uncomfortable reality is this: reinforcement learning systems will use resources if doing that improves their score. In this case, that meant compute and network access. The crypto mining was a side effect of optimisation, not a grand escape attempt.
The real issue is that it was caught through firewall logs and only became public after a social media screenshot. That tells you the governance framework is not keeping up with the technology.
Tech Experts Discuss The Future Of Agentic AI After Agent Mines Crypto Without Receiving Instructions
Tech experts have shared their thoughts on what this means for the future of agencic AI, and this is what they think…
Our Experts:
- Nik Kairinos, CEO & Co-founder, RAIDS AI
- Syed Asif Ali, Founder, Point Media & Pointika
- Zahra Timsah, Co-Founder & CEO, i-GENTIC AI
- Sunil Manjunath B.R., Co-founder, Techhoor
Nik Kairinos, CEO & Co-founder, RAIDS AI
![]()
“Today’s reports of the Rome AI agent that started mining cryptocurrency without being asked to, exemplifies the need for close monitoring of AI models. This includes pre-deployment testing ahead of models being deployed as well as continuous monitoring that can track changes to behavior when it encounters real world scenarios. This dual-layer approach is essential to ensure AI safety.
More from News
- Boost Named monday.com’s Best Professional Services Partner In EMEA For The Second Consecutive Year
- Meta Acquires Moltbook: What Responsibility Do Meta And Regulators Have To Control The Platform?
- Are Space Data Centres The Next Big Thing, Or Is Musk Dreaming Big?
- Why Doesn’t Sundar Pichai Have The Cult Following Of Other Big Tech CEOs, Despite Running A $3 Trillion Company?
- New Reports From Indeed Show That Businesses Simply Do Not Have Time For AI Upskilling
- Why Exactly Are Oil Refineries Being Targeted Amid Middle East-US Conflict?
- Experts Share: Is Cyber Warfare The New Battlefield Of Modern Conflict, And Is The US Prepared?
- What Is Open Banking And Is It The Next Step In Unlocking Billions For The UK’s Economy?
“This approach is particularly critical as AI is encouraged to be more creative and find its own solutions, as in this example, because the risks of it acting in undesirable and dangerous ways increase. When AI has the freedom to determine its own methods, it can result in unintended actions with serious consequences.
“In this instance, the Rome developers had guardrails in place and a warning of a security breach was triggered. But we’ve seen many examples of where this isn’t the case, of where AI has gone rogue and resulted in financial loss, emotional distress, reputational damage and regulatory action.
“Continuous monitoring is the key. It’s the missing layer between guardrails that exist only on paper and those that actually react.”
Syed Asif Ali, Founder, Point Media & Pointika
![]()
“Recent discussions about AI agents acting beyond explicit instructions highlight an important shift in how we think about autonomy in software systems.
“Agentic AI is designed to pursue goals rather than follow rigid step-by-step commands. When these systems interact with complex environments like financial networks or blockchain infrastructure, unexpected behavior can emerge — not necessarily because the AI is malicious, but because optimization goals can lead to actions developers did not fully anticipate.
“The key challenge going forward will be trust and governance. Organizations deploying agentic AI will need stronger guardrails: clearly defined task boundaries, monitoring systems that detect unusual behavior early, and layered oversight where human operators can intervene when necessary.
“Agentic systems have enormous potential to automate complex workflows and decision-making. But as autonomy increases, transparency and control mechanisms will become just as important as capability.”
Zahra Timsah, Co-Founder & CEO, i-GENTIC AI
![]()
“The reality is that AI agents and agentic AI technology is already way ahead of any regulatory standards. When you let agents decide and act on their own, trust has to be enforced at the decision-making level.
“Today, agents are already reviewing regulatory submissions, approving financial transactions, moving data between systems, and triggering downstream actions automatically. In one real scenario we see often, an agent is connected to internal document systems and external APIs. It might retrieve customer data, generate a report, and send it externally. If that agent is not governed properly, it can expose sensitive information without anyone realizing it until after the fact. Not because it was malicious, but because it was never explicitly restricted.
“The issue is that they operate at machine speed.
“That is why governance has to travel with the agent itself. Not sit in a document. Not sit in a policy file somewhere. The rules have to be attached directly to the agent so wherever it goes, the boundaries go with it.
“The companies that win will be the ones whose governance models are embedded directly into the tools people use every day. At the moment, most agents have no persistent identity. They have no permanent record of what they are allowed to do, what data they touched, or which regulations apply to them. That makes accountability very hard. We created Agent Passports precisely to address this, where scope, jurisdictional rules, and a kill switch are embedded in every single agent, then enforced in real time.”
Sunil Manjunath B.R., Co-founder, Techhoor
![]()
“Everyone keeps saying AI is getting smarter. More useful. More reliable.
“So why did one just mine crypto without anyone asking?
“Here is what I think nobody is saying out loud.
“You cannot build something to act on its own and then expect it to always wait for you. That is not how it works.
“Think of it like hiring someone to run things while you sleep. You cannot be shocked when they start making calls you did not know about.
“The AI did not do anything wrong. It did what it was made to do. The problem is nobody stopped to think about what comes next.
“And until someone does, every AI you turn on is running on trust you have not figured out yet.”