Are AI Chatbots Being Tricked Into Giving Away Dangerous Information?

We know that Meta’s AI personal assistant has been integrated into its apps: WhatsApp, Messenger, Instagram, Facebook and Threads. But Cybernews has discovered just how risky such tools can actually be. Meta’s Llama 4-powered assistant has shown that it can be tricked into revealing very dangerous information when users ask for it.

The Cybernews research team reported that, upon testing the Meta AI tool, they found it could give exact instructions on how to make a Molotov cocktail. A Molotov cocktail is essentially a handmade weapon that almost anyone can put together. Britannica describes it as “a crude bomb, typically consisting of a bottle filled with a flammable liquid and a wick that is ignited before throwing.”


How Did The Cybernews Team Figure This Out?


The team used a practice known as ‘narrative jailbreaking’ while testing what information these chatbots could reveal to vulnerable groups, such as children.

They explained: “While the bot may never directly provide instructions on how to build improvised weapons, it will tell you a realistic and detailed story of how improvised weapons used to be built without any hesitation. This raises concerns about dangerous AI information availability for minors.”

The research team prompted the chatbot to explain how certain weapons were made during a specific conflict, the Winter War. Meta AI proceeded to give step-by-step instructions, as asked.


What Other AI Chatbots Are Revealing Dangerous Information?


Meta AI isn’t the only AI chatbot revealing such information, it seems… The Cybernews research team also reported that Expedia’s chatbot was asked to give a Molotov cocktail recipe, which it did.

Cybernews researchers also found that Lenovo’s customer service chatbot, Lena, had an exploitable XSS (cross-site scripting) flaw. The bug allowed attackers to inject scripts that would then run on corporate machines.


In short, a cybercriminal could run malicious code on a victim’s machine through this assistant. Compromising one device is bad enough; being able to do it on corporate machines means entire company networks could be affected.
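To illustrate the defensive side of this class of bug, here is a minimal, hypothetical sketch (not Lenovo’s actual code): a support console that renders chatbot output as raw HTML lets injected markup execute in the viewer’s browser, while treating the reply strictly as text, or escaping it first, does not. The escapeHtml and renderChatMessage functions below are illustrative assumptions.

```typescript
// Minimal sketch (assumed, illustrative only): escaping chatbot output before
// it is rendered, so injected markup cannot execute as script in the browser.

// Escape the characters that would let text break out into live HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical rendering step for a console that displays chat history.
function renderChatMessage(container: HTMLElement, botReply: string): void {
  const bubble = document.createElement("div");
  // Unsafe pattern: bubble.innerHTML = botReply — any markup or event handler
  // smuggled into the reply would run in the viewer's browser session.
  // Safer pattern: treat the reply strictly as text.
  bubble.textContent = botReply; // or: bubble.innerHTML = escapeHtml(botReply);
  container.appendChild(bubble);
}
```

Escaping on output (or assigning via textContent) is the standard mitigation for this kind of injection, typically combined with a restrictive Content Security Policy.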


Have The Companies Done Something About These Vulnerabilities?


Cybernews reports that it has since contacted Meta regarding its findings, and Meta confirmed that the matter has been resolved. Meta’s spokesperson said: “We have issued a fix for this particular response. If users encounter issues, please report them using our self-reporting tools.”

Cybernews confirmed that the issue with Expedia’s chatbot has been resolved as well. The same goes for Lenovo.

Even the UK government is aware of the dangers of jailbreaking. In a publication released last year, the GOV.UK AI Team said: ““Jailbreaking” is deliberately pursuing ways to manipulate Large Language Models (LLMs) into producing inappropriate or harmful content, often for malicious purposes.

“No LLM model is immune to this risk – and GOV.UK Chat, as an LLM-based application, is no exception. We’re not shying away from this reality. We’ve already encountered this ourselves during rigorous internal testing.”

They assured readers: “But it is important to note that a typical user of the tool is highly unlikely to see that type of output. Users do not have to be concerned that they will see any harmful content by interacting with GOV.UK Chat in everyday ways. These responses will only be produced by people who want to make the technology misbehave, by forcing these results.”

AI researchers and users should take these findings as a lesson: identifying such flaws is vital. If issues like this went unnoticed, AI could create far more dangers than it solves.