Mixture-of-PageRanks: Replacing Long-Context with Real-Time, Sparse GraphRAG
- URL: http://arxiv.org/abs/2412.06078v1
- Date: Sun, 08 Dec 2024 21:55:12 GMT
- Title: Mixture-of-PageRanks: Replacing Long-Context with Real-Time, Sparse GraphRAG
- Authors: Nicholas Alonso, Beren Millidge
- Abstract summary: We develop a graph-based retrieval algorithm built on PageRank, which we call mixture-of-PageRanks (MixPR).
MixPR uses a mixture of PageRank-based graph-retrieval algorithms, implemented with sparse matrices for efficient, cheap retrieval.
Our retriever achieves state-of-the-art results across a wide range of long-context benchmark tasks.
- Score: 7.8553071988266385
- License:
- Abstract: Recent advances have extended the context window of frontier LLMs dramatically, from a few thousand tokens up to millions, enabling entire books and codebases to fit into context. However, the compute costs of inference with long-context LLMs are massive and often prohibitive in practice. RAG offers an efficient and effective alternative: retrieve and process only the subset of the context most important for the current task. Although promising, recent work applying RAG to long-context tasks has two core limitations: 1) there has been little focus on making the RAG pipeline compute efficient, and 2) such works only test on simple QA tasks, and their performance on more challenging tasks is unclear. To address this, we develop a graph-based retrieval algorithm built on PageRank, which we call mixture-of-PageRanks (MixPR). MixPR uses a mixture of PageRank-based graph-retrieval algorithms, implemented with sparse matrices for efficient, cheap retrieval that can deal with a variety of complex tasks. Our MixPR retriever achieves state-of-the-art results across a wide range of long-context benchmark tasks, outperforming existing RAG methods, specialized retrieval architectures, and long-context LLMs despite being far more compute efficient. Due to using sparse embeddings, our retriever is extremely compute efficient, capable of embedding and retrieving millions of tokens within a few seconds, and runs entirely on CPU.
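The retrieval idea described in the abstract, PageRank run over a sparse graph of text chunks, can be sketched in a few lines. The sketch below is a minimal illustration using SciPy sparse matrices; the graph construction, damping factor, iteration count, and top-k selection are assumptions for illustration, not the authors' implementation, and the "mixture" in MixPR (combining multiple PageRank variants) is not shown.

```python
import numpy as np
import scipy.sparse as sp

def personalized_pagerank(adj: sp.csr_matrix, restart: np.ndarray,
                          damping: float = 0.85, iters: int = 50) -> np.ndarray:
    """Power iteration for personalized PageRank on a sparse chunk-similarity graph."""
    col_sums = np.asarray(adj.sum(axis=0)).ravel()
    col_sums[col_sums == 0] = 1.0                      # guard against dangling nodes
    transition = adj.multiply(1.0 / col_sums).tocsr()  # column-normalized transition matrix
    p = restart / restart.sum()                        # restart distribution, e.g. query-chunk similarity
    scores = p.copy()
    for _ in range(iters):
        scores = damping * transition.dot(scores) + (1.0 - damping) * p
    return scores

def retrieve_top_k(adj: sp.csr_matrix, query_sims: np.ndarray, k: int = 10) -> np.ndarray:
    """Return the indices of the k chunks with the highest PageRank scores."""
    return np.argsort(-personalized_pagerank(adj, query_sims))[:k]
```

Because both the chunk graph and the embeddings stay sparse, this style of retrieval runs comfortably on CPU, which is consistent with the abstract's claim of embedding and retrieving millions of tokens within a few seconds.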
Related papers
- Emulating Retrieval Augmented Generation via Prompt Engineering for Enhanced Long Context Comprehension in LLMs [23.960451986662996]
This paper proposes a method that emulates Retrieval Augmented Generation (RAG) through specialized prompt engineering and chain-of-thought reasoning.
We evaluate our approach on selected tasks from BABILong, which interleaves standard bAbI QA problems with large amounts of distractor text.
arXiv Detail & Related papers (2025-02-18T02:49:40Z) - Does RAG Really Perform Bad For Long-Context Processing? [15.889864680212147]
RetroLM is a novel framework for long-context processing.
Unlike traditional methods, RetroLM employs KV-level retrieval augmentation.
Building on this framework, we develop a specialized retriever for precise retrieval of critical pages.
arXiv Detail & Related papers (2025-02-17T05:02:25Z) - LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing [70.35888047551643]
We present LaRA, a novel benchmark specifically designed to rigorously compare RAG and LC LLMs.
LaRA encompasses 2,326 test cases across four practical QA task categories and three types of naturally occurring long texts.
We find that the optimal choice between RAG and LC depends on a complex interplay of factors, including the model's parameter size, long-text capabilities, context length, task type, and the characteristics of the retrieved chunks.
arXiv Detail & Related papers (2025-02-14T08:04:22Z) - An Effective Framework to Help Large Language Models Handle Numeric-involved Long-context Tasks [0.0]
Large Language Models (LLMs) have demonstrated remarkable capabilities in handling long texts.
However, their performance degrades significantly on numerical calculations over long contexts.
We propose a workflow that decomposes a numeric-involved long-context task into 4 low-level subtasks.
Results on 2 numeric-involved long-context benchmarks demonstrate that our workflow not only improves accuracy but also significantly reduces the cost of API calls.
arXiv Detail & Related papers (2024-11-15T12:39:02Z) - GARLIC: LLM-Guided Dynamic Progress Control with Hierarchical Weighted Graph for Long Document QA [16.945257645760428]
In the past, Retrieval-Augmented Generation (RAG) methods split text into chunks to enable language models to handle long documents.
Recent tree-based RAG methods are able to retrieve detailed information while preserving global context.
We propose a new retrieval method called LLM-Guided Dynamic Progress Control with Hierarchical Weighted Graph (GARLIC).
arXiv Detail & Related papers (2024-10-07T07:02:09Z) - You Only Use Reactive Attention Slice For Long Context Retrieval [33.712515776334016]
Supporting longer contexts for Large Language Models (LLMs) is a promising direction for advancing them.
We propose an attention-based retrieval technique, You Only Use Reactive Attention slice (YOURA).
Our technique achieves up to 30% vLLM inference throughput improvement for serving long-context queries.
arXiv Detail & Related papers (2024-09-03T15:30:57Z) - RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation [54.707460684650584]
Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention.
Current research addresses the limits of their internal knowledge by equipping LLMs with external knowledge, a technique known as Retrieval-Augmented Generation (RAG).
RAGLAB is a modular and research-oriented open-source library that reproduces 6 existing algorithms and provides a comprehensive ecosystem for investigating RAG algorithms.
arXiv Detail & Related papers (2024-08-21T07:20:48Z) - KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches [52.02764371205856]
Long context capability is a crucial competency for large language models (LLMs).
This work provides a taxonomy of current methods and evaluates 10+ state-of-the-art approaches across seven categories of long-context tasks.
arXiv Detail & Related papers (2024-07-01T17:59:47Z) - Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [61.40047491337793]
We present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the context limitations of large language models.
HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks.
A token-reduction step precedes each merge, keeping memory usage efficient (a generic sketch of this pattern follows).
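The following is only an illustration of the divide-and-conquer pattern named above, not the HOMER implementation: adjacent chunks are merged pairwise, with a token-reduction step before each merge so the working size stays bounded. `reduce_tokens` and `merge_pair` are hypothetical callbacks.

```python
from typing import Callable, List, TypeVar

Chunk = TypeVar("Chunk")

def hierarchical_merge(chunks: List[Chunk],
                       reduce_tokens: Callable[[Chunk], Chunk],
                       merge_pair: Callable[[Chunk, Chunk], Chunk]) -> Chunk:
    """Repeatedly reduce and merge adjacent chunks until one chunk remains."""
    while len(chunks) > 1:
        reduced = [reduce_tokens(c) for c in chunks]       # prune tokens before merging
        merged = [merge_pair(reduced[i], reduced[i + 1])   # merge adjacent pairs
                  for i in range(0, len(reduced) - 1, 2)]
        if len(reduced) % 2 == 1:                          # carry an odd leftover chunk forward
            merged.append(reduced[-1])
        chunks = merged
    return chunks[0]
```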
arXiv Detail & Related papers (2024-04-16T06:34:08Z) - Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
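As a rough illustration of mapping a dense embedding into a sparse lexical vector, the sketch below projects the dense vector onto a vocabulary and keeps the top-k weights; the projection matrix, weighting function, and k are assumptions, not the paper's model.

```python
import numpy as np

def to_sparse_lexical(dense_vec: np.ndarray, vocab_proj: np.ndarray, k: int = 64) -> dict:
    """Project a dense embedding into vocabulary space and keep the k largest term weights."""
    logits = vocab_proj @ dense_vec            # (vocab_size,) lexical logits
    weights = np.log1p(np.maximum(logits, 0))  # non-negative, log-saturated term weights
    top = np.argsort(-weights)[:k]             # indices of the strongest terms
    return {int(i): float(weights[i]) for i in top if weights[i] > 0}
```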
arXiv Detail & Related papers (2024-02-27T14:21:56Z) - Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories [72.15369769265398]
Machine learning has emerged as a promising paradigm for branching.
We propose retro branching, a simple yet effective approach to RL for branching.
We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
arXiv Detail & Related papers (2022-05-28T06:08:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.