Why Is AI Overly Confident In Giving Incorrect Responses?

A TikTok user recently shared a very strange interaction with the famous ChatGPT. In the chat, she asks the AI how many “R”s are in the word “strawberry”, and with full confidence, the AI answers “2”. The user goes as far as asking the bot to spell out the word and count the letters, but with no success: the AI counts three while spelling the word out, then goes back to its original answer, even apologising for its “mistake” of saying “3”.
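
For the record, a quick one-liner in Python settles the count:

    print("strawberry".count("r"))  # counts occurrences of "r" -> prints 3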

Although the interaction was humorous, it raises a bigger concern: how much other information is answered just as confidently but is completely incorrect? The University of Maryland explains it perfectly in a guide, saying, “As of 2023, a typical AI model isn’t assessing whether the information it provides is correct. Its goal when it receives a prompt is to generate what it thinks is the most likely string of words to answer that prompt. Sometimes this results in a correct answer, but sometimes it doesn’t – and the AI cannot interpret or distinguish between the two. It’s up to you to make the distinction.”

Why Does AI Do This?

The question of why AI answers so confidently when it is wrong has been discussed across the internet. On one Reddit post, users debate the reasoning, with one saying, “ChatGPT is just a machine without consciousness or confidence, and when it gives a wrong answer, it’s not being confidently wrong; it’s just wrong.” Another user argued that even though ChatGPT builds its answers from patterns learned across large amounts of data, its responses can still sound confident, which can mislead users into accepting incorrect information.

Meet the “Thermometer” Calibration Tool

To tackle this problem, researchers at MIT and the MIT-IBM Watson AI Lab have developed a new tool called “Thermometer.” It addresses the tendency of large language models to be too confident in their wrong answers, or too doubtful of their right ones. Thermometer stands out because it works across many different tasks, giving it a wide range of applications.

Better Or Worse Than Other Methods?

Thermometer uses less computing power than other methods. It runs a smaller model alongside the main LLM to adjust the LLM’s confidence. This approach makes sure the LLM keeps its accuracy while staying flexible enough to handle tasks it has never seen before. Under the hood it uses temperature scaling, a standard technique for matching a model’s confidence to how often its answers are actually correct.
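
To make the idea concrete, here is a minimal sketch of temperature scaling itself, not MIT’s actual implementation, with made-up logits for illustration. Dividing the logits by a temperature above 1 softens the resulting probabilities, and because the ordering of the logits is preserved, the model’s chosen answer, and therefore its accuracy, does not change:

    import numpy as np

    def softmax(logits):
        # Shift by the max for numerical stability before exponentiating
        z = logits - np.max(logits)
        exp = np.exp(z)
        return exp / exp.sum()

    def temperature_scale(logits, temperature):
        # T > 1 flattens the distribution (less confident);
        # T < 1 sharpens it; T = 1 leaves it unchanged.
        return softmax(logits / temperature)

    # Hypothetical logits for three candidate answers
    logits = np.array([4.0, 2.5, 0.5])

    print(temperature_scale(logits, 1.0))  # ~[0.80, 0.18, 0.02] -- overconfident
    print(temperature_scale(logits, 2.0))  # ~[0.61, 0.29, 0.11] -- softened

The hard part in practice is choosing the right temperature for a given task, and automating that choice is what Thermometer is for.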

Thermometer’s side model is initially trained on a few datasets but can adapt to new scenarios without needing new data. So if it is trained on algebra and medical question datasets, it can later tune an LLM to respond with well-calibrated confidence to queries in geometry or biology.
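
As a rough, hypothetical sketch of what that tuning could look like (a toy linear layer standing in for the side model, not MIT’s actual code), the auxiliary model reads features from the LLM and predicts a temperature to apply to its logits:

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-ins for the main LLM's outputs on one query:
    hidden_state = rng.normal(size=64)   # hidden features for the query
    logits = np.array([4.0, 2.5, 0.5])   # scores for candidate answers

    # A toy auxiliary "thermometer": one linear layer whose weights would be
    # learned during training; softplus keeps the predicted temperature positive.
    weights = rng.normal(scale=0.1, size=64)
    bias = 0.5
    temperature = np.log1p(np.exp(hidden_state @ weights + bias))

    # Rescale the logits: the top-scoring answer stays the same,
    # only the confidence attached to it moves.
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    print(temperature, probs)

In the real system, it is those learned weights that carry over from the initial training datasets, which is why no new labelled data is needed for an unseen task.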

What Are Social Media Users Saying?

On Threads, koumouz said, “I find it pretty funny that we’re all super concerned that large language models, when posed with a question they don’t know the answer to, make things up rather than state they don’t know.
And yet this is precisely the behaviour of many (most?) social media influencers and presumed experts.”

This point gets at the main lesson when it comes to AI: fact-checking is essential when interacting with these technologies and tools, to head off misleading responses. Combining that habit with tools such as the one MIT has developed will surely help ease the misinformation concerns many have about AI.