In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss
- URL: http://arxiv.org/abs/2402.10790v2
- Date: Wed, 21 Feb 2024 03:07:42 GMT
- Title: In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss
- Authors: Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev
- Abstract summary: We introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts.
Fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to $11\times 10^6$ elements.
This achievement marks a substantial leap, as it is by far the longest input processed by any neural network model to date.
- Score: 4.8384738694883955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the challenge of processing long documents using
generative transformer models. To evaluate different approaches, we introduce
BABILong, a new benchmark designed to assess model capabilities in extracting
and processing distributed facts within extensive texts. Our evaluation, which
includes benchmarks for GPT-4 and RAG, reveals that common methods are
effective only for sequences up to $10^4$ elements. In contrast, fine-tuning
GPT-2 with recurrent memory augmentations enables it to handle tasks involving
up to $11\times 10^6$ elements. This achievement marks a substantial leap, as
it is by far the longest input processed by any neural network model to date,
demonstrating a significant improvement in the processing capabilities for long
sequences.
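The abstract describes the recurrent-memory approach only at a high level. As a rough illustration of the general idea (in the spirit of the Recurrent Memory Transformer), the sketch below splits a long token stream into fixed-size segments and carries a small memory state from one segment to the next; `SegmentModel`, `MEM_SIZE`, and `SEGMENT_LEN` are illustrative placeholders, not the authors' implementation or settings.

```python
from typing import List, Tuple

# Illustrative constants; the actual segment and memory sizes are assumptions,
# not taken from the paper.
MEM_SIZE = 16       # number of memory slots carried between segments (assumed)
SEGMENT_LEN = 512   # tokens processed per recurrent step (assumed)

class SegmentModel:
    """Hypothetical stand-in for a transformer (e.g., GPT-2) augmented with
    memory tokens; a real model would attend over [memory ; segment]."""
    def step(self, memory: List[float], segment: List[int]) -> Tuple[List[float], str]:
        updated_memory = memory  # identity update keeps the sketch runnable
        answer = ""              # the task answer is read out after the last segment
        return updated_memory, answer

def process_long_input(tokens: List[int], model: SegmentModel) -> str:
    """Run the model over an arbitrarily long input one segment at a time,
    carrying a fixed-size memory state between segments."""
    memory = [0.0] * MEM_SIZE
    answer = ""
    for start in range(0, len(tokens), SEGMENT_LEN):
        segment = tokens[start:start + SEGMENT_LEN]
        memory, answer = model.step(memory, segment)  # recurrence over segments
    return answer

if __name__ == "__main__":
    # Per-step compute and memory stay bounded by SEGMENT_LEN, which is what
    # makes inputs on the order of millions of tokens feasible in principle.
    tokens = list(range(1_000_000))
    print(repr(process_long_input(tokens, SegmentModel())))
```

Because each step sees only the current segment plus the memory state, per-step cost is independent of total input length; everything the model needs from earlier text must survive in the memory, which is exactly the ability BABILong's distributed-fact tasks probe.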
Related papers
- Multi-stage Large Language Model Pipelines Can Outperform GPT-4o in Relevance Assessment [6.947361774195549]
We propose a modular classification pipeline that divides the relevance assessment task into multiple stages.
One of our approaches showed an 18.4% increase in Krippendorff's $\alpha$ over OpenAI's GPT-4o mini.
arXiv Detail & Related papers (2025-01-24T07:33:39Z) - DocMamba: Efficient Document Pre-training with State Space Model [56.84200017560988]
We present DocMamba, a novel framework based on the state space model.
It is designed to reduce computational complexity to linear while preserving global modeling capabilities.
Experiments on the HRDoc dataset confirm DocMamba's potential for length extrapolation.
arXiv Detail & Related papers (2024-09-18T11:34:28Z) - Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training [78.93900796545523]
Mini-Sequence Transformer (MsT) is a methodology for highly efficient and accurate LLM training with extremely long sequences.
MsT partitions input sequences and iteratively processes mini-sequences to reduce intermediate memory usage.
Integrated with the Hugging Face library, MsT extends the maximum context length of Qwen, Mistral, and Gemma-2 by 12-24x (a minimal sketch of the chunked-computation idea appears after this list).
arXiv Detail & Related papers (2024-07-22T01:52:30Z) - HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference [68.59839755875252]
HiRE comprises two novel components: (i) a compression scheme to cheaply predict top-$k$ rows/columns with high recall, followed by full computation restricted to the predicted subset, and (ii) DA-TOP-$k$: an efficient multi-device approximate top-$k$ operator.
We demonstrate that on a one-billion-parameter model, HiRE applied to both the softmax and feedforward layers achieves nearly matching pretraining and downstream accuracy, and speeds up inference latency by $1.47\times$ on a single TPUv5e device.
arXiv Detail & Related papers (2024-02-14T18:04:36Z) - M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models [58.54538318912159]
M4LE is a benchmark for evaluating the long-sequence capability of large language models (LLMs).
M4LE is based on a diverse NLP task pool comprising 36 NLP task types and 12 domains.
We conducted a systematic evaluation on 11 well-established LLMs, especially those optimized for long-sequence inputs.
arXiv Detail & Related papers (2023-10-30T03:11:30Z) - Abstractive Summarization as Augmentation for Document-Level Event Detection [0.0]
We bridge the performance gap between shallow and deep models on document-level event detection by using abstractive text summarization as an augmentation method.
We use four decoding methods for text generation, namely beam search, top-k sampling, top-p sampling, and contrastive search.
Our results show that using the document title offers 2.04% and 3.19% absolute improvements in macro F1-score for linear SVM and RoBERTa, respectively.
arXiv Detail & Related papers (2023-05-29T11:28:26Z) - How Does Generative Retrieval Scale to Millions of Passages? [68.98628807288972]
We conduct the first empirical study of generative retrieval techniques across various corpus scales.
We scale generative retrieval to a corpus of 8.8M passages and evaluate model sizes up to 11B parameters.
While generative retrieval is competitive with state-of-the-art dual encoders on small corpora, scaling to millions of passages remains an important and unsolved challenge.
arXiv Detail & Related papers (2023-05-19T17:33:38Z) - MuLD: The Multitask Long Document Benchmark [4.835289158553091]
We present a new long-document benchmark consisting only of documents over 10,000 tokens.
We show that models with increased context length are better able to solve the tasks presented.
arXiv Detail & Related papers (2022-02-15T12:42:55Z) - SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval [11.38022203865326]
The SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches.
We modify the pooling mechanism, benchmark a model solely based on document expansion, and introduce models trained with distillation.
Overall, SPLADE is considerably improved with more than $9$% gains on NDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
arXiv Detail & Related papers (2021-09-21T10:43:42Z) - Pre-training Tasks for Embedding-based Large-scale Retrieval [68.01167604281578]
We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents from a large document corpus.
We show that the key ingredient of learning a strong embedding-based Transformer model is the set of pre-training tasks.
arXiv Detail & Related papers (2020-02-10T16:44:00Z)
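For the Mini-Sequence Transformer entry above, here is a minimal sketch (not the authors' code) of the chunked-computation idea: the output projection and loss for a long sequence are computed chunk by chunk along the sequence dimension, so the full [seq_len, vocab_size] logits tensor is never materialized at once. The function name, shapes, and chunk length are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def chunked_lm_loss(hidden, lm_head_weight, targets, chunk_len=1024):
    """Cross-entropy over a long sequence, computed mini-sequence by mini-sequence.

    hidden:         [seq_len, d_model] final hidden states
    lm_head_weight: [vocab_size, d_model] output projection
    targets:        [seq_len] next-token ids
    """
    total_loss, seq_len = 0.0, hidden.size(0)
    for start in range(0, seq_len, chunk_len):
        h = hidden[start:start + chunk_len]   # one mini-sequence of hidden states
        logits = h @ lm_head_weight.T         # only [chunk_len, vocab_size] lives in memory
        total_loss = total_loss + F.cross_entropy(
            logits, targets[start:start + chunk_len], reduction="sum"
        )
    return total_loss / seq_len               # same value as the unchunked mean loss

# Toy usage: peak activation memory scales with chunk_len, not seq_len.
hidden = torch.randn(8192, 64)
lm_head_weight = torch.randn(50_000, 64)
targets = torch.randint(0, 50_000, (8192,))
print(chunked_lm_loss(hidden, lm_head_weight, targets))
```

The same partitioning can be applied to other activation-heavy blocks such as the MLP; the entry above reports that this kind of mini-sequence processing extends the maximum trainable context length of Qwen, Mistral, and Gemma-2 by 12-24x.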