arxiv:2407.07071

Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

Published on Jul 9
· Submitted by voidism on Jul 10

Abstract

When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector -- Lookback Lens -- is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
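For illustration, here is a minimal sketch of how per-head lookback ratio features could be computed from attention weights during decoding. The tensor layout and names (lookback_ratios, context_len) are assumptions for this example, not the paper's exact implementation; the core idea is the ratio of attention mass on the context span versus the already generated span.

```python
import torch

def lookback_ratios(attentions, context_len):
    """
    Illustrative per-head lookback ratios for one decoding step.

    attentions: list (one per layer) of tensors of shape
        [batch, num_heads, query_len, seq_len] -- e.g. from a Hugging Face
        Transformers forward pass with output_attentions=True.
    context_len: number of tokens belonging to the provided context/prompt.

    Returns a tensor of shape [num_layers, num_heads] whose entries are
    A_context / (A_context + A_generated), where A_* are the mean attention
    weights of the current token over context vs. generated tokens.
    """
    ratios = []
    for layer_attn in attentions:
        # attention of the most recent token over all previous positions
        attn = layer_attn[0, :, -1, :]                     # [num_heads, seq_len]
        seq_len = attn.shape[-1]
        on_context = attn[:, :context_len].mean(dim=-1)
        if seq_len > context_len:
            on_generated = attn[:, context_len:].mean(dim=-1)
        else:
            # no generated tokens yet: all attention is on the context
            on_generated = torch.zeros_like(on_context)
        ratios.append(on_context / (on_context + on_generated + 1e-8))
    return torch.stack(ratios)                             # [num_layers, num_heads]
```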

Community

Paper author Paper submitter
  • A simple approach leverages only the attention maps (weights) in LLaMA to detect whether the generated content contains contextual hallucinations -- cases where LLMs generate facts that do not exist in the provided documents.
  • Using the detector to guide the LLM's text generation can help reduce hallucinations, and the detector transfers across tasks and models (a selection sketch follows the code link below).

Code & Trained Classifier & Data: https://github.com/voidism/Lookback-Lens

Wow, this is a really interesting idea for the LLM hallucination problem.
The idea of using self-attention patterns to detect hallucination is similar to the paper "Attention Satisfies..." (https://arxiv.org/abs/2309.15098).
However, this paper proposes a new decoding strategy that can mitigate hallucination, whereas the paper above only provides an analysis of the cause of hallucination.

Paper author

Thanks for sharing! This paper is super interesting!
But I found that this paper still focuses on closed-book hallucination settings -- they make LLMs answer questions without any given documents. Our paper focuses on the setting where the correct facts exist in the document, but the LLM still hallucinates. We believe that in such cases the attention patterns on the context are more meaningful, as they record how the LLM looks at the context information.
We will include this paper in our related work in the next version!

