arxiv:2407.07071

Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

Published on Jul 9
· Submitted by voidism on Jul 10

Abstract

When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector -- Lookback Lens -- is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
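For illustration, here is a minimal sketch of how per-head lookback ratio features could be computed from attention weights during decoding. The tensor layout and names (lookback_ratios, context_len) are assumptions for this example, not the paper's exact implementation; the core idea is the ratio of attention mass on the context span versus the already generated span.

```python
import torch

def lookback_ratios(attentions, context_len):
    """
    Illustrative per-head lookback ratios for one decoding step.

    attentions: list (one per layer) of tensors of shape
        [batch, num_heads, query_len, seq_len] -- e.g. from a Hugging Face
        Transformers forward pass with output_attentions=True.
    context_len: number of tokens belonging to the provided context/prompt.

    Returns a tensor of shape [num_layers, num_heads] whose entries are
    A_context / (A_context + A_generated), where A_* are the mean attention
    weights of the current token over context vs. generated tokens.
    """
    ratios = []
    for layer_attn in attentions:
        # attention of the most recent token over all previous positions
        attn = layer_attn[0, :, -1, :]                     # [num_heads, seq_len]
        seq_len = attn.shape[-1]
        on_context = attn[:, :context_len].mean(dim=-1)
        if seq_len > context_len:
            on_generated = attn[:, context_len:].mean(dim=-1)
        else:
            # no generated tokens yet: all attention is on the context
            on_generated = torch.zeros_like(on_context)
        ratios.append(on_context / (on_context + on_generated + 1e-8))
    return torch.stack(ratios)                             # [num_layers, num_heads]
```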

Community

Paper author Paper submitter
  • A simple approach leverages only the attention maps (weights) in LLaMA to detect whether the generated content contains contextual hallucinations -- cases where LLMs generate facts that do not exist in the provided documents.
  • Using the detector to guide the LLM's text generation can help reduce hallucinations, and the detector transfers across tasks and models (a selection sketch follows the code link below).

Code & Trained Classifier & Data: https://github.com/voidism/Lookback-Lens

Wow, this is a really interesting idea for the LLM hallucination problem.
The idea of using self-attention patterns to detect hallucination is similar to the paper "Attention Satisfies..." (https://arxiv.org/abs/2309.15098).
However, this paper proposes a new decoding strategy that can mitigate hallucination, whereas the paper above only provides an analysis of the cause of hallucination.

Paper author

Thanks for sharing! This paper is super interesting!
But I found that this paper still focuses on closed-book hallucination settings -- they make LLMs answer questions without any given documents. Our paper focuses on the setting where the correct facts exist in the document, but the LLM still hallucinates. We believe that in such cases the attention patterns on the context are more meaningful, as they record how the LLM looks at the context information.
We will include this paper in our related work in the next version!

