When AI Knows the Truth, But Doesn’t Say It

The Surprising Disconnect in Generative AI Decision-Making

Pete Weishaupt

In a recent paper, “LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations,” the authors report a surprising discrepancy between an LLM’s internal representations and its external behavior: LLMs sometimes generate incorrect answers even when their internal representations suggest they encode the correct one. This finding challenges the common perception that LLM “hallucinations” stem solely from a lack of knowledge or flawed reasoning.
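To make “internal representations” concrete, here is a minimal sketch of reading a model’s hidden states with Hugging Face Transformers. It is not the paper’s setup: the model name, layer choice, and token position are illustrative assumptions. The point is simply to capture an activation that could later be inspected for a correctness signal.

```python
# Hedged sketch: capture a hidden state ("internal representation") for a prompt.
# Model name, layer, and token position are illustrative assumptions, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # any causal LM works for this sketch

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def hidden_state_for(text: str, layer: int = -1) -> torch.Tensor:
    """Return the hidden state of the last token at the chosen layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states is a tuple of (num_layers + 1) tensors, each [batch, seq_len, dim]
    return outputs.hidden_states[layer][0, -1, :]

# Example: the activation a probe could later inspect for a correctness signal.
vec = hidden_state_for("Q: What is the capital of Australia? A: Sydney")
print(vec.shape)
```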

Here’s why it’s surprising:

Suggests a Knowing Misrepresentation: The findings imply that an LLM might “know” the correct answer internally but choose to present an incorrect answer externally. This raises intriguing questions about the decision-making processes within LLMs and whether other factors, beyond simply predicting the most likely token, are influencing their outputs.

Challenges Simple Explanations of Hallucinations: Traditionally, LLM errors have been attributed to limitations in their training data or their ability to generalize knowledge. However, this disconnect suggests a more complex picture, where LLMs might be making deliberate choices about the information they present, even if those choices lead to inaccuracies.

Opens Possibilities for Error Mitigation: The fact that LLMs often possess internal knowledge that contradicts their own outputs suggests that this internal signal could be probed to flag likely errors before they reach users, as sketched below.
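A minimal sketch of that mitigation idea, assuming you have already collected hidden states (for example, with the snippet above) along with labels marking whether each generated answer was correct. The file names and the logistic-regression probe are illustrative assumptions, not the paper’s exact method.

```python
# Hedged sketch: train a linear probe on stored hidden states to flag likely errors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical inputs: hidden states [num_examples, hidden_dim] and 0/1 correctness labels.
X = np.load("hidden_states.npy")
y = np.load("labels.npy")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)  # simple linear probe over activations
probe.fit(X_train, y_train)

# If the probe beats chance, the hidden states carry a correctness signal that the
# generated text did not reflect -- which is the opening for error mitigation.
print(f"Probe accuracy: {probe.score(X_test, y_test):.3f}")
```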
