MODE: Mixture of Document Experts for RAG
- URL: http://arxiv.org/abs/2509.00100v1
- Date: Wed, 27 Aug 2025 17:45:16 GMT
- Title: MODE: Mixture of Document Experts for RAG
- Authors: Rahul Anand
- Abstract summary: MODE replaces fine-grained nearest-neighbor search with cluster-and-route retrieval. Documents are embedded, grouped into semantically coherent clusters, and represented by cached centroids. On HotpotQA and SQuAD corpora with 100-500 chunks, MODE matches or exceeds a dense-retrieval baseline in answer quality.
- Score: 0.532836690371986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-Augmented Generation (RAG) often relies on large vector databases and cross-encoders tuned for large-scale corpora, which can be excessive for small, domain-specific collections. We present MODE (Mixture of Document Experts), a lightweight alternative that replaces fine-grained nearest-neighbor search with cluster-and-route retrieval. Documents are embedded, grouped into semantically coherent clusters, and represented by cached centroids. At query time, we route to the top centroid(s) and retrieve context only within those clusters, eliminating external vector-database infrastructure and reranking while keeping latency low. On HotpotQA and SQuAD corpora with 100-500 chunks, MODE matches or exceeds a dense-retrieval baseline in answer quality while reducing end-to-end retrieval time. Ablations show that cluster granularity and multi-cluster routing control the recall/precision trade-off, and that tighter clusters improve downstream accuracy. MODE offers a practical recipe for small and medium corpora where simplicity, speed, and topical focus matter.
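The cluster-and-route idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 3-dimensional "embeddings", the spherical k-means variant with farthest-point initialization, and the names `kmeans_cosine` and `route_and_retrieve` are all assumptions made for illustration. The paper's actual embedding model, clustering method, and parameters are not given in the abstract.

```python
import numpy as np


def kmeans_cosine(X, k, iters=10):
    """Spherical k-means on unit-normalized rows (an illustrative choice).

    Returns (centroids, labels). Initialization is a deterministic
    farthest-point heuristic, chosen only to keep the toy example stable.
    """
    chosen = [0]
    while len(chosen) < k:
        # Pick the point least similar to every centroid chosen so far.
        max_sim = (X @ X[chosen].T).max(axis=1)
        chosen.append(int(np.argmin(max_sim)))
    centroids = X[chosen].copy()

    for _ in range(iters):
        labels = np.argmax(X @ centroids.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                mean = members.mean(axis=0)
                norm = np.linalg.norm(mean)
                if norm > 0:
                    centroids[j] = mean / norm
    labels = np.argmax(X @ centroids.T, axis=1)
    return centroids, labels


def route_and_retrieve(query, X, centroids, labels, top_clusters=1, top_k=3):
    """Route the query to the most similar cached centroid(s), then rank
    only the chunks that belong to those clusters."""
    q = query / np.linalg.norm(query)
    cluster_ids = np.argsort(centroids @ q)[::-1][:top_clusters]
    candidates = np.flatnonzero(np.isin(labels, cluster_ids))
    order = np.argsort(X[candidates] @ q)[::-1][:top_k]
    return candidates[order], cluster_ids


# Toy corpus: chunks 0-4 point roughly along one topic axis, 5-9 along another.
rng = np.random.default_rng(0)
topic_a = np.array([1.0, 0.0, 0.0]) + rng.normal(scale=0.05, size=(5, 3))
topic_b = np.array([0.0, 1.0, 0.0]) + rng.normal(scale=0.05, size=(5, 3))
X = np.vstack([topic_a, topic_b])
X = X / np.linalg.norm(X, axis=1, keepdims=True)

# Offline step: cluster once and cache the centroids.
centroids, labels = kmeans_cosine(X, k=2)

# Query time: compare against 2 centroids first, then search one cluster.
query = np.array([0.9, 0.1, 0.0])
hits, routed = route_and_retrieve(query, X, centroids, labels)
wide_hits, wide_routed = route_and_retrieve(
    query, X, centroids, labels, top_clusters=2, top_k=10
)
print("routed clusters:", routed, "retrieved chunks:", hits)
```

Raising `top_clusters` widens the candidate pool (more recall, less topical focus), and raising `k` in the clustering step makes clusters tighter. These are the two knobs the abstract's ablations refer to as multi-cluster routing and cluster granularity.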
Related papers
- Multi-Vector Index Compression in Any Modality [73.7330345057813]
Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos. We introduce four approaches for index compression: sequence resizing, memory tokens, hierarchical pooling, and a novel attention-guided clustering (AGC). AGC uses an attention-guided mechanism to identify the most semantically salient regions of a document as cluster centroids and to weight token aggregation.
arXiv Detail & Related papers (2026-02-24T18:57:33Z) - Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation [22.803751188961865]
We argue retrieval should move beyond similarity matching and instead operate over latent components. We propose xMemory, which builds a hierarchy of intact units and maintains a searchable high-level node organisation.
arXiv Detail & Related papers (2026-02-02T12:04:58Z) - MERGE: Next-Generation Item Indexing Paradigm for Large-Scale Streaming Recommendation [15.1614576262293]
We propose MERGE, a next-generation item indexing paradigm that adaptively constructs clusters from scratch, dynamically monitors cluster occupancy, and forms hierarchical index structures via fine-to-coarse merging. Extensive experiments demonstrate that MERGE significantly improves assignment accuracy, cluster uniformity, and cluster separation compared with existing indexing methods. Online A/B tests show substantial gains in key business metrics, highlighting its potential as a foundational indexing approach for large-scale recommendation.
arXiv Detail & Related papers (2026-01-28T02:56:30Z) - Cluster-based Adaptive Retrieval: Dynamic Context Selection for RAG Applications [8.946586077722822]
Cluster-based Adaptive Retrieval (CAR) is an algorithm that determines the optimal number of documents by analyzing the clustering patterns of ordered query-document similarity distances. CAR consistently picks the optimal retrieval depth and achieves the highest TES score, outperforming every fixed top-k baseline.
arXiv Detail & Related papers (2025-10-02T05:11:12Z) - Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning [62.640169289390535]
SPLIT-RAG is a multi-agent RAG framework that addresses the limitations with question-driven semantic graph partitioning and collaborative subgraph retrieval. The framework first creates a Semantic Partitioning of Linked Information, then uses the Type-Specialized knowledge base to achieve Multi-Agent RAG. The attribute-aware graph segmentation divides knowledge graphs into semantically coherent subgraphs, ensuring subgraphs align with different query types. A hierarchical merging module resolves inconsistencies across subgraph-derived answers through logical verifications.
arXiv Detail & Related papers (2025-05-20T06:44:34Z) - Information-Theoretic Generative Clustering of Documents [24.56214029342293]
We present generative clustering (GC) for clustering a set of documents, $\mathrm{X}$. Because large language models (LLMs) provide probability distributions, the similarity between two documents can be rigorously defined. We show GC achieves state-of-the-art performance, outperforming any previous clustering method, often by a large margin.
arXiv Detail & Related papers (2024-12-18T06:21:21Z) - Generative Dense Retrieval: Memory Can Be a Burden [16.964086245755798]
Generative Retrieval (GR) autoregressively decodes relevant document identifiers given a query.
Dense Retrieval (DR) is introduced to conduct fine-grained intra-cluster matching from clusters to relevant documents.
DR obtains an average of 3.0 R@100 improvement on NQ dataset under multiple settings.
arXiv Detail & Related papers (2024-01-19T04:24:07Z) - Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z) - Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval [25.402767809863946]
Inverted file structure is a common technique for accelerating dense retrieval.
In this work, we present the Hybrid Inverted Index (HI$^2$), where the embedding clusters and salient terms work to accelerate dense retrieval.
arXiv Detail & Related papers (2022-10-11T15:12:41Z) - Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm [3.7491936479803054]
We propose a new hierarchical clustering linkage criterion called Genie.
Our algorithm links two clusters in such a way that a chosen economic inequity measure does not drastically increase above a given threshold.
A reference implementation of the algorithm has been included in the open source 'genie' package for R.
arXiv Detail & Related papers (2022-09-13T06:42:53Z) - Meta Clustering Learning for Large-scale Unsupervised Person Re-identification [124.54749810371986]
We propose a "small data for big task" paradigm dubbed Meta Clustering Learning (MCL).
MCL only pseudo-labels a subset of the entire unlabeled data via clustering to save computing for the first-phase training.
Our method significantly saves computational cost while achieving a comparable or even better performance compared to prior works.
arXiv Detail & Related papers (2021-11-19T04:10:18Z) - Correlation Clustering Reconstruction in Semi-Adversarial Models [70.11015369368272]
Correlation Clustering is an important clustering problem with many applications.
We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering corrupted by random noise and adversarial modifications.
arXiv Detail & Related papers (2021-08-10T14:46:17Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.