When AI Knows the Truth, But Doesn’t Say It

The Surprising Disconnect in Generative AI Decision-Making

Pete Weishaupt

In a recent paper, “LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations,” the authors report a surprising discrepancy between an LLM’s internal representations and its external behavior: LLMs sometimes generate incorrect answers even when their internal representations suggest they encode the correct one. This finding challenges the common perception that LLM “hallucinations” stem solely from a lack of knowledge or flawed reasoning.
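To make “internal representations” concrete, here is a minimal sketch of reading a model’s hidden states with Hugging Face Transformers. It is not the paper’s setup: the model name, layer choice, and token position are illustrative assumptions. The point is simply to capture an activation that could later be inspected for a correctness signal.

```python
# Hedged sketch: capture a hidden state ("internal representation") for a prompt.
# Model name, layer, and token position are illustrative assumptions, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # any causal LM works for this sketch

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def hidden_state_for(text: str, layer: int = -1) -> torch.Tensor:
    """Return the hidden state of the last token at the chosen layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states is a tuple of (num_layers + 1) tensors, each [batch, seq_len, dim]
    return outputs.hidden_states[layer][0, -1, :]

# Example: the activation a probe could later inspect for a correctness signal.
vec = hidden_state_for("Q: What is the capital of Australia? A: Sydney")
print(vec.shape)
```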

Here’s why it’s surprising:

Suggests a Knowing Misrepresentation: The findings imply that an LLM might “know” the correct answer internally but choose to present an incorrect answer externally. This raises intriguing questions about the decision-making processes within LLMs and whether other factors, beyond simply predicting the most likely token, are influencing their outputs.

Challenges Simple Explanations of Hallucinations: Traditionally, LLM errors have been attributed to limitations in their training data or their ability to generalize knowledge. However, this disconnect suggests a more complex picture, where LLMs might be making deliberate choices about the information they present, even if those choices lead to inaccuracies.

Opens Possibilities for Error Mitigation: The fact that LLMs often possess internal knowledge that contradicts their own outputs suggests that this internal signal could be probed to flag likely errors before they reach users, as sketched below.
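A minimal sketch of that mitigation idea, assuming you have already collected hidden states (for example, with the snippet above) along with labels marking whether each generated answer was correct. The file names and the logistic-regression probe are illustrative assumptions, not the paper’s exact method.

```python
# Hedged sketch: train a linear probe on stored hidden states to flag likely errors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical inputs: hidden states [num_examples, hidden_dim] and 0/1 correctness labels.
X = np.load("hidden_states.npy")
y = np.load("labels.npy")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)  # simple linear probe over activations
probe.fit(X_train, y_train)

# If the probe beats chance, the hidden states carry a correctness signal that the
# generated text did not reflect -- which is the opening for error mitigation.
print(f"Probe accuracy: {probe.score(X_test, y_test):.3f}")
```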
