arxiv:2404.07979

LLoCO: Learning Long Contexts Offline

Published on Apr 11 · Submitted by akhaliq on Apr 12

Abstract

Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose a novel approach to address this problem by learning contexts offline through context compression and in-domain parameter-efficient finetuning. Our method enables an LLM to create a concise representation of the original context and efficiently retrieve relevant information to answer questions accurately. We introduce LLoCO, a technique that combines context compression, retrieval, and parameter-efficient finetuning using LoRA. Our approach extends the effective context window of a 4k token LLaMA2-7B model to handle up to 128k tokens. We evaluate our approach on several long-context question-answering datasets, demonstrating that LLoCO significantly outperforms in-context learning while using 30× fewer tokens during inference. LLoCO achieves up to 7.62× speed-up and substantially reduces the cost of long document question answering, making it a promising solution for efficient long context processing. Our code is publicly available at https://github.com/jeffreysijuntan/lloco.
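For a rough sense of the parameter-efficient finetuning ingredient, here is a minimal sketch that attaches LoRA adapters to LLaMA2-7B with Hugging Face peft. The rank, alpha, dropout, and target modules below are illustrative assumptions rather than the paper's hyperparameters, and the context-compression and retrieval stages are omitted; the linked repo has the actual implementation.

```python
# Minimal LoRA setup with Hugging Face transformers + peft.
# Hyperparameters below are illustrative assumptions, not LLoCO's;
# the context-compression stage (summary embeddings) is omitted.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # gated; requires HF access approval
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Attach low-rank adapters to the attention projections; only these
# small adapter matrices are trained during in-domain finetuning.
lora_config = LoraConfig(
    r=16,                # low-rank dimension (assumed)
    lora_alpha=32,       # LoRA scaling factor (assumed)
    lora_dropout=0.05,   # (assumed)
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a tiny fraction of the 7B weights
```

At inference time, the abstract's numbers come from feeding the adapted model a compressed representation of the document instead of the raw text: roughly 30× fewer tokens pass through attention, which is also where the reported 7.62× speed-up originates.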

Community

So to do QA on a book:

  1. Summarise/Compress the book using a separate LLM
  2. Store it in a vector database
  3. Generate the answers to all the questions that you want to ask
  4. Finetune it
  5. Voila. You can now ask it questions... (see the toy sketch below)

It's a bit cumbersome, and for the use case described, it defeats its own purpose (you have to generate the QA pairs! In the real world, these don't exist yet, which is the whole reason for doing QA in the first place).

I'm sure what you've built works great in certain circumstances (books like the Bible), but for real-world, on-the-fly use cases (newly released books, legal texts, confidential data, etc.) this is cracking a nut with a sledgehammer, only to find you already had a pocketful of cracked nuts.
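To make the five steps concrete, below is a toy, self-contained sketch of the pipeline as described above. Every helper body is a placeholder assumption (a real setup would call a summarizer LLM, an embedding model, and a vector database); none of it is code from the paper or its repo.

```python
# Toy sketch of the five-step pipeline from the comment above.
# All helpers are hypothetical placeholders, not LLoCO code.

def chunk(text: str, size: int = 4000) -> list[str]:
    # Split the book into roughly fixed-size chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize(chunk_text: str) -> str:
    # Step 1: stand-in for a separate summarizer/compressor LLM.
    return chunk_text[:200]

def embed(text: str) -> list[float]:
    # Step 2: stand-in embedding; a real system would use an embedding
    # model and persist the vectors in a vector database.
    return [float(ord(c)) for c in text[:32]]

def generate_qa_pairs(summary: str) -> list[tuple[str, str]]:
    # Step 3: stand-in for LLM-generated QA pairs -- the sticking point,
    # since real questions don't exist yet for a new document.
    return [("What does this passage say?", summary)]

def build_qa_pipeline(book_text: str):
    summaries = [summarize(c) for c in chunk(book_text)]             # step 1
    index = [(embed(s), s) for s in summaries]                       # step 2
    qa_pairs = [p for s in summaries for p in generate_qa_pairs(s)]  # step 3
    # Step 4 would finetune a LoRA adapter on qa_pairs (see the sketch
    # under the abstract); step 5 answers new questions by retrieving
    # the nearest summaries and querying the adapted model.
    return index, qa_pairs
```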
