Will Elon Musk’s Next Grok Model Reach AGI?

Grok 4 has set a new bar in recent months. According to Greg Kamradt, President of the ARC Prize Foundation, two new submissions on the ARC-AGI benchmark used Grok 4 and scored higher than any other entry so far. One version scored 79.6% at $8.42 per task, while another version reached 29.4% at $30.40 per task. Both were open-source submissions that paired Grok 4 with program synthesis methods and test-time adaptation.

The foundation asked the authors why they chose Grok 4. Their response was straightforward: “It was the best model I used in testing.” Elon Musk quickly pointed out that these results came from Grok 4 alone, hinting that Grok 5 could stretch the gap even further once training begins.

The strength of Grok 4 also showed in ARC-AGI V2, where it hit 15.9%, nearly double its closest rival. On competitive coding tests, it reached scores above 79%, and on maths Olympiad problems, Grok 4 Heavy was the first model to score over 60%. Musk called the progress “crushingly good,” showing his pride in how far the model has come.

What Makes Grok Different From Its Rivals?

Grok 4 was built using Colossus, a supercomputer with 200,000 GPUs. The model was trained with reinforcement learning scaled up to levels never seen before. According to xAI, this training was six times more efficient than past efforts and used far larger pools of verified data, from maths and coding to a wide mix of other areas.

The model also learned to use tools natively. Grok can run code, search the web in real time, and even dig through posts on X to find knowledge. In voice mode, Grok can see through a camera and describe what it sees. This tool-use training has helped the model answer complex questions with more precision.

Musk has placed Grok alongside Gemini and Claude as a contender, but Grok’s ARC-AGI performance has put it ahead. On the benchmark dubbed “Humanity’s Last Exam,” Grok 4 Heavy scored 50.7%, a first for any model. Other tests such as USAMO 2025 and AIME’25 placed Grok well above rivals, with maths proof scores as high as 61.9% and competition maths scores of 96.7%.

What Has Musk Said About Grok 5?

Musk has made it clear that training for Grok 5 will begin soon. On X, he wrote, “Grok 5 starts training in a few weeks.” He later added, “I now think @xAI has a chance of reaching AGI with @Grok 5. Never thought that before.”

His comments mark a change in tone. Musk had spoken before about AI progress but had not directly said that Grok could reach human-level intelligence. His new posts show he now sees that as possible. He also promised the update before the end of the year, calling it “crushingly good.”

The optimism stems from how Grok 4 has already been used to lead benchmarks without major tweaks. Musk believes the new version will go even further, with training runs set to start before the end of 2025.

Could Grok 5 Cross Into AGI?

ARC Prize’s own numbers show how far Grok 4 has gone, but whether Grok 5 crosses into AGI depends on how it performs in reasoning and real-world tasks. At present, Grok 4 ranks higher on AGI-oriented benchmarks such as ARC-AGI than any other model. As X Freeze wrote, “No other model even comes close and has not passed Grok 4 previous raw performance. Currently Grok is more closer to AGI than any other AI models.”

Musk’s belief in Grok 5 gives weight to the idea. But the question of AGI still depends on performance outside benchmarks. AGI is commonly defined as matching or beating human intelligence across a wide spread of domains, not just coding or maths. Whether Grok 5 can do that will only be known once it is trained and tested.