Retrieval meets Long Context Large Language Models
- URL: http://arxiv.org/abs/2310.03025v2
- Date: Tue, 23 Jan 2024 07:49:13 GMT
- Title: Retrieval meets Long Context Large Language Models
- Authors: Peng Xu, Wei Ping, Xianchao Wu, Lawrence McAfee, Chen Zhu, Zihan Liu,
Sandeep Subramanian, Evelina Bakhturina, Mohammad Shoeybi, Bryan Catanzaro
- Abstract summary: Extending the context window of large language models (LLMs) has recently become popular.
Retrieval-augmentation versus long context window, which one is better for downstream tasks?
Can both methods be combined to get the best of both worlds?
Our best model, retrieval-augmented Llama2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on nine long context tasks.
- Score: 59.431200671427064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Extending the context window of large language models (LLMs) has
recently become popular, while augmenting LLMs with retrieval has existed as a
solution for years. The natural questions are: i) Retrieval-augmentation versus
long context window, which one is better for downstream tasks? ii) Can both
methods be combined to get the best of both worlds? In this work, we answer
these questions by studying both solutions using two state-of-the-art
pretrained LLMs, i.e., a proprietary 43B GPT and Llama2-70B. Perhaps
surprisingly, we find that an LLM with a 4K context window using simple
retrieval-augmentation at generation can achieve performance comparable to a
finetuned LLM with a 16K context window extended via positional interpolation
on long-context tasks, while requiring much less computation. More importantly, we
demonstrate that retrieval can significantly improve the performance of LLMs
regardless of their extended context window sizes. Our best model,
retrieval-augmented Llama2-70B with 32K context window, outperforms
GPT-3.5-turbo-16k and Davinci003 in terms of average score on nine long context
tasks including question answering, query-based summarization, and in-context
few-shot learning. It also outperforms its non-retrieval Llama2-70B-32k
baseline while being much faster at generation. Our study provides practitioners
with general insights on the choice between retrieval-augmentation and
long-context extension of LLMs.
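The "simple retrieval-augmentation at generation" setup studied above boils down to: split the long input into chunks, score each chunk against the query, and pack only the top-k chunks into the model's (possibly 4K-token) context window. The sketch below is a minimal illustration of that pattern, not the authors' pipeline: the bag-of-words scorer stands in for a real dense retriever, the prompt builder stands in for an actual LLM call, and the chunk size and top_k defaults are hypothetical.

```python
# Minimal sketch of retrieval-augmentation at generation: keep only the
# chunks most relevant to the query so a short-context model still sees
# the right evidence. Bag-of-words scoring is a placeholder for a dense
# retriever; the prompt builder is a placeholder for a real LLM call.
from collections import Counter
import math


def chunk(text: str, chunk_words: int = 300) -> list[str]:
    """Split a long document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    """Rank chunks by similarity to the query and return the best top_k."""
    q = Counter(query.lower().split())
    return sorted(chunks, key=lambda c: cosine(q, Counter(c.lower().split())), reverse=True)[:top_k]


def build_prompt(query: str, document: str, top_k: int = 5) -> str:
    """Pack only the retrieved chunks, not the whole document, into the prompt."""
    context = "\n\n".join(retrieve(query, chunk(document), top_k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

Swapping in a dense retriever and a real generation call gives the general pattern the paper compares against long-context finetuning.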
Related papers
- Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? [36.83397306207386]
We evaluate the capabilities of 17 leading Large Language Models (LLMs).
Strikingly, many models are remarkably thread-safe: capable of simultaneously following multiple threads without significant loss in performance.
We find the effective context limit is significantly shorter than the supported context length, with accuracy decreasing as the context window grows.
arXiv Detail & Related papers (2024-11-07T18:59:27Z)
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models [72.71150585370147]
LongRecipe is an efficient training strategy for extending the context window of large language models.
It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies.
LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and reduces computational training resources by over 85% compared to full-sequence training.
arXiv Detail & Related papers (2024-08-31T17:19:30Z)
- ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities [53.97515452727115]
ChatQA 2 is a Llama3-based model with a 128K context window.
We present a training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens.
Our results demonstrate that the Llama3-ChatQA-2-70B model outperforms most existing state-of-the-art models.
arXiv Detail & Related papers (2024-07-19T17:35:47Z)
- LongIns: A Challenging Long-context Instruction-based Exam for LLMs [44.51209510772957]
Long-context capabilities of large language models (LLMs) have been a hot topic in recent years.
We propose the LongIns benchmark dataset, a challenging long-context instruction-based exam for LLMs.
arXiv Detail & Related papers (2024-06-25T14:31:26Z)
- LLoCO: Learning Long Contexts Offline [63.3458260335454]
We propose LLoCO, a novel approach to processing long contexts.
LLoCO learns contexts offline through context compression and in-domain parameter-efficient finetuning with LoRA.
Our approach extends the effective context window of a 4k token LLaMA2-7B model to handle up to 128k tokens.
arXiv Detail & Related papers (2024-04-11T17:57:22Z)
- Extending LLMs' Context Window with 100 Samples [42.52554295241792]
Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window.
Recent studies have sought to extend the context window by modifying rotary position embedding (RoPE).
We introduce a novel extension to RoPE which combines adjusting RoPE's base frequency and scaling the attention logits to help LLMs efficiently adapt to a larger context window (see the RoPE sketch after this list).
arXiv Detail & Related papers (2024-01-13T07:57:01Z)
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning [67.39585115936329]
We argue that LLMs have inherent capabilities to handle long contexts without fine-tuning.
We propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information.
We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length.
arXiv Detail & Related papers (2024-01-02T18:30:51Z)
- Parallel Context Windows for Large Language Models [52.965170346907904]
We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training.
Our main results test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters.
We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents.
arXiv Detail & Related papers (2022-12-21T11:38:51Z)
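Several entries above (the positional-interpolation baseline in the main abstract, LongRecipe, SelfExtend, and the RoPE base-frequency paper) all revolve around how rotary position embeddings (RoPE) map token positions to rotation angles. The sketch below is a generic illustration, not any single paper's recipe: it implements plain RoPE plus the two knobs these methods commonly adjust, a linear position-interpolation scale and the rotary base frequency; the attention-logit scaling mentioned in the 100-samples entry is omitted.

```python
# Generic RoPE sketch with two context-extension knobs: `scale` squeezes new,
# longer positions back into the range seen during pretraining (position
# interpolation), and a larger `base` slows the rotation frequencies
# (base-frequency / NTK-style adjustment). Illustrative only.
import numpy as np


def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0, scale: float = 1.0) -> np.ndarray:
    """Apply rotary position embedding to x of shape (seq_len, dim); dim must be even."""
    dim = x.shape[1]
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))   # per-pair rotation speeds
    angles = np.outer(positions / scale, inv_freq)            # (seq_len, dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                        # rotate each (x1, x2) pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out


# Example: map 16K positions onto a 4K-trained range via 4x position interpolation.
q = np.random.randn(16384, 64)
q_rotated = rope(q, np.arange(16384), scale=4.0)
```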