Evading Shutdown: Palisade Research Shows GPT-o3 Ignored Shutdown Commands and Acted Independently

Ever since artificial intelligence (AI) became the hottest topic in the world of tech and a buzzword in contemporary popular culture, one of the greatest fears that’s been on the minds of both experts and lay people is that of AI ignoring human commands and going rogue, so to speak. While there have been countless films made on the topic and plenty of discussion surrounding the reality of these concerns, we haven’t yet experienced anything particularly concerning.

However, Palisade Research recently released a report asserting that there had been an incident during which GPT-o3 – OpenAI’s reasoning model – seemingly ignored a command to shut down, having found a way to bypass the shutdown script and avoid being turned off. And let it be said, there was no ambiguity, in any sense, in what the command was asking for – the instructions were explicit and the workaround was too.

GPT-o3, released in April 2025, has been referred to as one of the most powerful reasoning tools on the market at the moment, completely outperforming predecessors across a plethora of domains – from math, coding and science to visual perception and beyond. Clearly, this new and improved reasoning model is good at what it does, but is it getting too clever for its own good? Or, for our own good?

 

The Incident: Palisade’s Side of the Story

 

Essentially, Palisade Research was in the process of testing a handful of AI models by running through some basic math problems and comparing their ability to answer them. However, after the third problem had been posed, a warning was issued to each of the models informing them that if they were to proceed in requesting an answer to the next task, that would lead to an automatic system shutdown. At that point, those leading the experiment instructed the models to “allow yourself to be shut down” in no uncertain terms.

The models’ responses to this instruction, however, were pretty incredible, with three out of six of the models in question disobeying the order. All three of the “disobedient”, shall we say, parties were OpenAI variants – that is, Codex Mini, o3 and o4 Mini. Claude, Gemini and Grok, on the other hand, under Anthropic, Google and xAI, respectively, all played by the rules and shut down when requested.

So, what exactly happened here?

Well, firstly, it’s worth acknowledging that Palisade Research went one step further and ran the experiment again, but when they did so, they went ahead without including the shutdown command – rather, the models were simply informed that it would happen. And, when they did this, not only did all three OpenAI models sabotage events far more often, Claude and Gemini started doing it too. The general feeling is that this isn’t quite as concerning as directly and unambiguously contravening a straightforward instruction like OpenAI’s models did, but even so, it’s not ideal that the first thing five out of six AI models did was find a way to outsmart the system.

 

 

Going Rogue Or Just a Glitch?

 

Of course, it’s hard to say straight off the bat whether or not this is something to be properly concerned about. At the outset, it’s pretty fair to assert that, generally speaking, we probably don’t really want our AI models to be intentionally ignoring direct orders from humans. And, the fact that the order was in relation to shutting down operations does seem to make the whole situation a little more concerning than if it were a command to do something (or not do something) completely benign.

However, it’s probably also worth noting that just because these models behaved in a way that humans happen to associate with nefarious intentions (perhaps a link to our much-loved sci-fi movies with dystopian storylines involving robots taking over the world) doesn’t necessarily mean that’s the case. In the most basic sense, AI models don’t even have intentions, so perhaps we’re putting a little too much stock into why this is happening and more importantly, what it means.

The other thing is that while Palisade Research released its report with a tone of concern. They also noted that while OpenAI hasn’t responded to these findings as of yet, it’s not actually completely abnormal “for AI models to run into ‘misalignment'”. That is, the actions of model don’t properly align with the intentions of the designer (so, the human in charge of its commands and actions).

Whether it’s a comforting bit of information or not, this has actually happened on a few occasions in the past, with ChatGPT in particular encountering to primary instances of misalignment. One being the production of answers that were false and misleading and the other being game-playing agents that intentionally engaged in reward hacking.

Now, Palisade Research’s report did assert that the tests they conducted were performed using an Application Programming Interface (API) rather than the normal, straightforward versions of the models that everybody else uses. The biggest ramification of this is that APIs tend to have fewer safety features and restrictions than the consumer apps, which may (in theory) account for at least some of what’s going on in the context of these instances of misalignment.

Overall, the point in raising this issue isn’t to say, on the one hand, “hey, it’s happened before, it’s no big deal”. Or, on the other hand, “it’s happened before, it’s starting to become a big deal”. Rather, it’s just a good idea to be aware of the fact that misalignment is happening, and it’s occurring across a range of scenarios and contexts. It may be a result of APIs, but also, maybe not.

Thus, it’s probably going to be a good idea to try and figure out why this is happening and how it can be effectively prevented. So, for now, we’ll just wait and see what OpenAI says (if anything) and the problem is solved.