OpenAI Announces CriticGPT, A Tool That Finds ChatGPT Errors

OpenAI has announced CriticGPT, a new model designed to identify errors in code. Based on GPT-4, it analyses responses generated by ChatGPT and helps people spot mistakes, mainly during the process used to train the AI.

GPT models such as ChatGPT are trained using “Reinforcement Learning from Human Feedback,” or RLHF for short. This method uses human judgements to steer the AI towards responses that people find helpful and accurate.

How RLHF Works In Training

The humans in this process are referred to as “AI trainers.” They review the different responses the AI gives to their requests and rate them, giving feedback on whether each response was good and accurate or contained hallucinated information.

If you’ve ever used ChatGPT, you may have seen a version of this yourself. Sometimes the AI asks whether its response was helpful, accurate, incomplete, or inaccurate, and that feedback is then collected.

By constantly receiving this feedback, the AI learns over time to repeat the patterns and behaviours that earn positive feedback and to avoid those that earn negative feedback.
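To make this feedback loop more concrete, here is a toy sketch in Python. It is not OpenAI’s implementation. It assumes a common RLHF setup in which trainers compare two responses and pick the better one, and those comparisons are used to fit a “reward model” with a pairwise (Bradley-Terry style) loss. The feature names and the functions `score` and `train_reward_model` are hypothetical, chosen purely for illustration.

```python
import math

# Hypothetical hand-made features describing a response:
# [is_accurate, is_helpful, contains_hallucination]

def score(weights, features):
    """Linear reward: a higher score means the response is judged better."""
    return sum(w * x for w, x in zip(weights, features))

def train_reward_model(comparisons, n_features, lr=0.1, epochs=200):
    """Fit weights so that preferred responses score higher than rejected ones.

    comparisons: list of (preferred_features, rejected_features) pairs,
    standing in for AI trainers' judgements.
    Minimises the pairwise loss -log(sigmoid(r_preferred - r_rejected))
    with plain gradient descent.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            margin = score(weights, preferred) - score(weights, rejected)
            # sigmoid(margin): the model's probability that the trainer's choice is right
            p = 1.0 / (1.0 + math.exp(-margin))
            # The loss gradient w.r.t. margin is -(1 - p), so step towards the preferred response
            for i in range(n_features):
                weights[i] += lr * (1.0 - p) * (preferred[i] - rejected[i])
    return weights

# Two hypothetical trainer judgements
comparisons = [
    ([1, 1, 0], [0, 1, 1]),  # accurate answer preferred over a hallucinated one
    ([1, 1, 0], [1, 0, 0]),  # helpful answer preferred over an unhelpful one
]
weights = train_reward_model(comparisons, n_features=3)
print(score(weights, [1, 1, 0]) > score(weights, [0, 1, 1]))  # True
```

After training, the weight on the hallucination feature is negative, so hallucinated answers score lower. In a full RLHF pipeline, a reward model like this would then be used to fine-tune the chatbot itself; that step is omitted here.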

What Is CriticGPT’s Purpose?

The tool is intended to make AI more reliable, as experience has shown the technology cannot always be trusted to be accurate. “As we make advances in reasoning and model behaviour, ChatGPT becomes more accurate and its mistakes become more subtle. This can make it hard for AI trainers to spot inaccuracies when they do occur, making the comparison task that powers RLHF much harder.

“This is a fundamental limitation of RLHF, and it may make it increasingly difficult to align models as they gradually become more knowledgeable than any person that could provide feedback,” explained OpenAI in its announcement.

How Useful Is CriticGPT?

“We found that when people get help from CriticGPT to review ChatGPT code they outperform those without help 60% of the time. We are beginning the work to integrate CriticGPT-like models into our RLHF labeling pipeline, providing our trainers with explicit AI assistance,” said OpenAI.

Trainers also preferred CriticGPT’s critiques over those generated by ChatGPT in 63% of cases. When CriticGPT finds an error, it highlights it and gives a critique alongside explaining why it is an error.

The tool is trained to give precise, brief critiques, and it is already proving very useful. It still has limitations that OpenAI intends to work on over time, but for now it is an efficient way to catch the small errors ChatGPT is known for making.

“Sometimes real-world mistakes can be spread across many parts of an answer. Our work focuses on errors that can be pointed out in one place, but in the future, we need to tackle dispersed errors as well,” added OpenAI.

This means that, for now, CriticGPT is best suited to errors confined to a single place in an answer rather than mistakes spread across it. So even though the tool isn’t fully reliable yet, it could be a useful step towards tackling the hallucination issues that have recently drawn attention.