arXiv:2407.06071

From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty

Published on Jul 8 · Submitted by Mivg on Jul 10

Abstract

Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations and sequence repetitions. We propose to view these behaviors as fallbacks that models exhibit under uncertainty, and investigate the connection between them. We categorize fallback behaviors -- sequence repetitions, degenerate text, and hallucinations -- and extensively analyze them in models from the same family that differ by the amount of pretraining tokens, parameter count, or the inclusion of instruction-following training. Our experiments reveal a clear and consistent ordering of fallback behaviors across all these axes: as an LLM becomes more advanced (i.e., trained on more tokens, with more parameters, or instruction-tuned), its fallback behavior shifts from sequence repetitions to degenerate text, and then to hallucinations. Moreover, the same ordering is observed throughout a single generation, even for the best-performing models: as uncertainty increases, models shift from generating hallucinations to producing degenerate text and then sequence repetitions. Lastly, we demonstrate that while common decoding techniques, such as random sampling, can alleviate some unwanted behaviors like sequence repetitions, they increase harder-to-detect hallucinations.

Community

Paper author · Paper submitter

What do LLMs do when they are uncertain? We found that the stronger the LLM, the more it hallucinates and the less it loops! This pattern extends to sampling methods and instruction tuning.

We categorize fallback behaviors into three types: sequence repetitions, degenerate text, and hallucinations. By pushing models towards uncertainty, we analyze their emergence across different model sizes, architectures, pretraining token counts, and instruction-following training.
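As a rough illustration (not the paper's actual methodology), the first two categories can be flagged with simple surface statistics; hallucinations cannot, since they require grounding generated claims in an external knowledge source. A minimal sketch with toy thresholds:

```python
# Hypothetical heuristics for two of the three fallback categories.
# Hallucinations are deliberately omitted: they cannot be detected from
# surface statistics alone and need external fact-checking.

def repeats_sequence(tokens: list[str], n: int = 5, min_repeats: int = 3) -> bool:
    """Flag looping text: some n-gram occurs at least `min_repeats` times."""
    counts: dict[tuple[str, ...], int] = {}
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        counts[ngram] = counts.get(ngram, 0) + 1
        if counts[ngram] >= min_repeats:
            return True
    return False

def is_degenerate(tokens: list[str], max_unique_ratio: float = 0.3) -> bool:
    """Flag collapsed vocabulary: too few unique tokens for the length."""
    return bool(tokens) and len(set(tokens)) / len(tokens) < max_unique_ratio

looping = ("the cat sat on the mat " * 4).split()
print(repeats_sequence(looping))  # True: the same 5-gram recurs in each loop
print(is_degenerate(looping))     # True: 6 unique tokens out of 24
```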


We find that the more advanced an LLM is (more parameters, longer pretraining, or instruction-tuning), the more complex its fallback behaviors, shifting from sequence repetitions to degenerate text and then to hallucinations.


Even the best-performing models show this order within a single generation. As they try to recall more facts about a topic, they move from generating hallucinations to degenerate text, then to sequence repetitions.


Interestingly, common decoding techniques like random temperature sampling can reduce some behaviors (like sequence repetitions) but increase harder-to-detect hallucinations.
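For context, here is a minimal sketch of the two decoding strategies being contrasted, using toy logits rather than any particular model: greedy decoding deterministically follows the argmax path (and so can get stuck in loops), while temperature sampling draws from the softened distribution, which breaks loops but sometimes commits to low-probability, potentially hallucinated continuations.

```python
import torch

def greedy(logits: torch.Tensor) -> int:
    """Always pick the most likely next token."""
    return int(torch.argmax(logits))

def sample_with_temperature(logits: torch.Tensor, temperature: float = 1.0) -> int:
    """Sample from the temperature-scaled distribution over next tokens."""
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

logits = torch.tensor([2.0, 1.5, 0.1])        # toy next-token logits
print(greedy(logits))                         # always 0
print(sample_with_temperature(logits, 0.7))   # usually 0, sometimes 1 or 2
```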


We also find evidence that this shift is continuous: models become more degenerate as generation length grows, as measured by the proportion of unique tokens in the sequence, compared to a human baseline on the same topics.
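A hypothetical sketch of that measurement, with toy token sequences standing in for real model generations and human text: track the unique-token ratio over growing prefixes, and a looping generation shows the ratio collapsing with length while varied text stays comparatively flat.

```python
import random

random.seed(0)
# Toy stand-ins: a looping model generation vs. a more varied human-like text.
model_tokens = ("the tower was built in 1889 and " * 25).split()
human_tokens = [f"w{random.randrange(400)}" for _ in range(200)]

def unique_ratio_curve(tokens: list[str], step: int = 50) -> list[float]:
    """Proportion of unique tokens at each prefix length (every `step` tokens)."""
    return [len(set(tokens[:i])) / i for i in range(step, len(tokens) + 1, step)]

print(unique_ratio_curve(model_tokens))  # falls fast: the vocabulary stops growing
print(unique_ratio_curve(human_tokens))  # stays comparatively high
```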



Congrats @Mivg on this work! Are you planning to upload the dataset to the Hub?

If so, here's how to link it to this paper: https://huggingface.co/docs/hub/en/datasets-cards#linking-a-paper

