RoutIR: Fast Serving of Retrieval Pipelines for Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2601.10644v1
- Date: Thu, 15 Jan 2026 18:04:43 GMT
- Title: RoutIR: Fast Serving of Retrieval Pipelines for Retrieval-Augmented Generation
- Authors: Eugene Yang, Andrew Yates, Dawn Lawrie, James Mayfield, Trevor Adriaanse,
- Abstract summary: Retrieval models are key components of Retrieval-Augmented Generation (RAG) systems. RAG systems are often dynamic and may involve multiple rounds of retrieval. RoutIR is a Python package that wraps arbitrary retrieval methods.
- Score: 24.079284500008754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval models are key components of Retrieval-Augmented Generation (RAG) systems, which generate search queries, process the documents returned, and generate a response. RAG systems are often dynamic and may involve multiple rounds of retrieval. While many state-of-the-art retrieval methods are available through academic IR platforms, these platforms are typically designed for the Cranfield paradigm in which all queries are known up front and can be batch processed offline. This simplification accelerates research but leaves state-of-the-art retrieval models unable to support downstream applications that require online services, such as arbitrary dynamic RAG pipelines that involve looping, feedback, or even self-organizing agents. In this work, we introduce RoutIR, a Python package that provides a simple and efficient HTTP API that wraps arbitrary retrieval methods, including first stage retrieval, reranking, query expansion, and result fusion. By providing a minimal JSON configuration file specifying the retrieval models to serve, RoutIR can be used to construct and query retrieval pipelines on-the-fly using any permutation of available models (e.g., fusing the results of several first-stage retrieval methods followed by reranking). The API automatically performs asynchronous query batching and caches results by default. While many state-of-the-art retrieval methods are already supported by the package, RoutIR is also easily expandable by implementing the Engine abstract class. The package is open-sourced and publicly available on GitHub: http://github.com/hltcoe/routir.
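The asynchronous query batching and result caching mentioned in the abstract can be illustrated with a minimal sketch. This is not RoutIR's code or API: `ToyEngine`, `KeywordEngine`, `BatchingFrontend`, the `search_batch` method, and the `max_wait` parameter are all hypothetical names chosen for illustration, and the keyword-overlap scorer merely stands in for a real retrieval model. The sketch assumes that concurrent requests arriving within a short window are grouped into one engine call and that repeated queries are answered from an in-memory cache.

```python
import asyncio
from abc import ABC, abstractmethod


class ToyEngine(ABC):
    """Hypothetical engine interface; NOT RoutIR's actual Engine class."""

    @abstractmethod
    async def search_batch(self, queries: list[str]) -> list[list[str]]:
        """Return one ranked list of document IDs per query."""


class KeywordEngine(ToyEngine):
    """Toy first-stage retriever that ranks documents by term overlap."""

    def __init__(self, docs: dict[str, str]):
        self.docs = docs
        self.batch_sizes: list[int] = []  # recorded so batching is observable

    async def search_batch(self, queries: list[str]) -> list[list[str]]:
        self.batch_sizes.append(len(queries))
        results = []
        for query in queries:
            terms = set(query.lower().split())
            scored = [
                (len(terms & set(text.lower().split())), doc_id)
                for doc_id, text in self.docs.items()
            ]
            # Highest overlap first; ties broken by document ID.
            ranked = sorted((s for s in scored if s[0] > 0),
                            key=lambda s: (-s[0], s[1]))
            results.append([doc_id for _, doc_id in ranked])
        return results


class BatchingFrontend:
    """Groups concurrent queries into one engine call and caches results."""

    def __init__(self, engine: ToyEngine, max_wait: float = 0.01):
        self.engine = engine
        self.max_wait = max_wait  # assumed batching window, in seconds
        self.cache: dict[str, list[str]] = {}
        self.pending: list[tuple[str, asyncio.Future]] = []
        self.flush_task = None

    async def search(self, query: str) -> list[str]:
        if query in self.cache:  # cache hit: the engine is never called
            return self.cache[query]
        future = asyncio.get_running_loop().create_future()
        self.pending.append((query, future))
        if self.flush_task is None:  # the first waiter starts the timer
            self.flush_task = asyncio.create_task(self._flush_later())
        return await future

    async def _flush_later(self) -> None:
        await asyncio.sleep(self.max_wait)
        batch, self.pending = self.pending, []
        self.flush_task = None
        # Deduplicate while preserving arrival order.
        queries = list(dict.fromkeys(q for q, _ in batch))
        try:
            results = await self.engine.search_batch(queries)
        except Exception as exc:
            for _, future in batch:
                future.set_exception(exc)
            return
        self.cache.update(zip(queries, results))
        for query, future in batch:
            future.set_result(self.cache[query])


async def demo():
    engine = KeywordEngine({
        "d1": "neural sparse retrieval",
        "d2": "dense retrieval for RAG",
        "d3": "query rewriting",
    })
    frontend = BatchingFrontend(engine)
    first = await asyncio.gather(
        frontend.search("sparse retrieval"),
        frontend.search("query rewriting"),
        frontend.search("sparse retrieval"),
    )
    again = await frontend.search("sparse retrieval")  # served from cache
    return first, again, engine.batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the three concurrent requests reach the engine as a single batch of two unique queries, and the later repeat of "sparse retrieval" is answered from the cache without another engine call. A production service would also need cache eviction and a cap on batch size, which are omitted here.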
Related papers
- Chain of Retrieval: Multi-Aspect Iterative Search Expansion and Post-Order Search Aggregation for Full Paper Retrieval [68.71038700559195]
Chain of Retrieval (COR) is a novel iterative framework for full-paper retrieval. We present SCIBENCH, a benchmark providing both complete and segmented contexts of full papers for queries and candidates.
arXiv Detail & Related papers (2025-07-14T08:41:53Z) - TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering [27.37434534716611]
TreeHop is an embedding-level framework for multi-hop question answering. TreeHop dynamically updates query embeddings by fusing semantic information from prior queries. TreeHop is a faster and more cost-effective solution for deployment in a range of knowledge-intensive applications.
arXiv Detail & Related papers (2025-04-28T01:56:31Z) - GeAR: Generation Augmented Retrieval [82.20696567697016]
This paper introduces a novel method, Generation Augmented Retrieval (GeAR). It not only improves global document-query similarity through contrastive learning but also integrates well-designed fusion and decoding modules. When used as a retriever, GeAR does not incur any additional computational cost over bi-encoders.
arXiv Detail & Related papers (2025-01-06T05:29:00Z) - Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval [12.83513794686623]
In this paper, we propose and study a more challenging type of retrieval task, called hidden rationale retrieval. To address such problems, an instruction-tuned large language model (LLM) with a cross-encoder architecture could be a reasonable choice. We name this retrieval framework RaHoRe and verify its superior zero-shot and fine-tuned performance on Emotional Support Conversation (ESC).
arXiv Detail & Related papers (2024-12-21T13:19:15Z) - Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z) - REAPER: Reasoning based Retrieval Planning for Complex RAG Systems [3.160580413086215]
Complex queries can even require multi-step retrieval.
Most RAG Agents handle such Chain-of-Thought tasks by interleaving reasoning and retrieval steps.
We show significant gains in latency over Agent-based systems and are able to scale easily to new and unseen use cases.
arXiv Detail & Related papers (2024-07-26T07:05:54Z) - Database-Augmented Query Representation for Information Retrieval [71.41745087624528]
We present a novel retrieval framework called Database-Augmented Query representation (DAQu). DAQu augments the original query with various (query-related) metadata across multiple tables. We validate our DAQu in diverse retrieval scenarios, demonstrating that it significantly enhances overall retrieval performance.
arXiv Detail & Related papers (2024-06-23T05:02:21Z) - Question-Based Retrieval using Atomic Units for Enterprise RAG [3.273958158967657]
Enterprise retrieval augmented generation (RAG) offers a flexible framework for combining powerful large language models (LLMs) with internal, possibly temporally changing, documents.
This work applies a zero-shot adaptation of standard dense retrieval steps for more accurate chunk recall.
arXiv Detail & Related papers (2024-05-20T20:27:00Z) - SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval [92.27387459751309]
We provide SPRINT, a unified Python toolkit for evaluating neural sparse retrieval.
We establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR.
We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document.
arXiv Detail & Related papers (2023-07-19T22:48:02Z) - Query Rewriting for Retrieval-Augmented Large Language Models [139.242907155883]
Large Language Models (LLMs) act as powerful, black-box readers in the retrieve-then-read pipeline.
This work introduces a new framework, Rewrite-Retrieve-Read, in place of the previous retrieve-then-read pipeline for retrieval-augmented LLMs.
arXiv Detail & Related papers (2023-05-23T17:27:50Z) - Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)