Related papers: Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

URL: http://arxiv.org/abs/2406.16828v1
Date: Mon, 24 Jun 2024 17:37:52 GMT
Title: Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track
Authors: Ronak Pradeep, Nandan Thakur, Sahel Sharifymoghaddam, Eric Zhang, Ryan Nguyen, Daniel Campos, Nick Craswell, Jimmy Lin,
Abstract summary: It is crucial to have an arena to build, test, visualize, and systematically evaluate RAG-based search systems. We propose the TREC 2024 RAG Track to foster innovation in evaluating RAG systems.
Score: 51.25144287084172
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Did you try out the new Bing Search? Or maybe you fiddled around with Google AI~Overviews? These might sound familiar because the modern-day search stack has recently evolved to include retrieval-augmented generation (RAG) systems. They allow searching and incorporating real-time data into large language models (LLMs) to provide a well-informed, attributed, concise summary in contrast to the traditional search paradigm that relies on displaying a ranked list of documents. Therefore, given these recent advancements, it is crucial to have an arena to build, test, visualize, and systematically evaluate RAG-based search systems. With this in mind, we propose the TREC 2024 RAG Track to foster innovation in evaluating RAG systems. In our work, we lay out the steps we've made towards making this track a reality -- we describe the details of our reusable framework, Ragnar\"ok, explain the curation of the new MS MARCO V2.1 collection choice, release the development topics for the track, and standardize the I/O definitions which assist the end user. Next, using Ragnar\"ok, we identify and provide key industrial baselines such as OpenAI's GPT-4o or Cohere's Command R+. Further, we introduce a web-based user interface for an interactive arena allowing benchmarking pairwise RAG systems by crowdsourcing. We open-source our Ragnar\"ok framework and baselines to achieve a unified standard for future RAG systems.

Related papers

A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces [34.59674580962045]
We introduce A-RAG, an Agentic RAG framework that exposes hierarchical retrieval interfaces directly to the model.<n>A-RAG provides three retrieval tools: keyword search, semantic search, and chunk read, enabling the agent to adaptively search and retrieve information across multiple granularities.<n> Experiments on multiple open-domain QA benchmarks show that A-RAG consistently outperforms existing approaches with comparable or lower retrieved tokens.
arXiv Detail & Related papers (2026-02-03T12:07:21Z)
Let the Barbarians In: How AI Can Accelerate Systems Performance Research [80.43506848683633]
We term this iterative cycle of generation, evaluation, and refinement AI-Driven Research for Systems.<n>We demonstrate that ADRS-generated solutions can match or even outperform human state-of-the-art designs.
arXiv Detail & Related papers (2025-12-16T18:51:23Z)
Search Is Not Retrieval: Decoupling Semantic Matching from Contextual Assembly in RAG [0.0]
We introduce the Search-Is-Not-Retrieve (SINR) framework, a dual-layer architecture that distinguishes between fine-grained search representations and coarse-grained retrieval contexts.<n>SINR enhances the composability, scalability, and context fidelity of retrieval systems by directly connecting small, semantically accurate search chunks to larger, contextually complete chunks.
arXiv Detail & Related papers (2025-11-07T02:51:20Z)
Evaluating Retrieval-Augmented Generation Systems on Unanswerable, Uncheatable, Realistic, Multi-hop Queries [53.99620546358492]
Real-world use cases often present RAG systems with complex queries for which relevant information is missing from the corpus or is incomplete.<n>Existing RAG benchmarks rarely reflect realistic task complexity for multi-hop or out-of-scope questions.<n>We present the first pipeline for automatic, difficulty-controlled creation of un$underlinec$heatable, $underliner$ealistic, $underlineu$nanswerable, and $underlinem$ulti-hop.
arXiv Detail & Related papers (2025-10-13T21:38:04Z)
QuIM-RAG: Advancing Retrieval-Augmented Generation with Inverted Question Matching for Enhanced QA Performance [1.433758865948252]
This work presents a novel architecture for building Retrieval-Augmented Generation (RAG) systems. RAG architecture is constructed to generate responses from the target document. We introduce QuIM-RAG, a novel approach for the retrieval mechanism in our system.
arXiv Detail & Related papers (2025-01-06T01:07:59Z)
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents [66.42579289213941]
Retrieval-augmented generation (RAG) is an effective technique that enables large language models to utilize external knowledge sources for generation. We introduce VisRAG, which tackles this issue by establishing a vision-language model (VLM)-based RAG pipeline. In this pipeline, instead of first parsing the document to obtain text, the document is directly embedded using a VLM as an image and then retrieved to enhance the generation of a VLM.
arXiv Detail & Related papers (2024-10-14T15:04:18Z)
Embodied-RAG: General Non-parametric Embodied Memory for Retrieval and Generation [65.23793829741014]
Embodied-RAG is a framework that enhances the model of an embodied agent with a non-parametric memory system. At its core, Embodied-RAG's memory is structured as a semantic forest, storing language descriptions at varying levels of detail. We demonstrate that Embodied-RAG effectively bridges RAG to the robotics domain, successfully handling over 200 explanation and navigation queries.
arXiv Detail & Related papers (2024-09-26T21:44:11Z)
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models [52.61625841028781]
COIR (Code Information Retrieval Benchmark) is a benchmark specifically designed to assess code retrieval capabilities. COIR comprises ten meticulously curated code datasets, spanning eight distinctive retrieval tasks across seven diverse domains. We evaluate nine widely used retrieval models using COIR, uncovering significant difficulties in performing code retrieval tasks even with state-of-the-art systems.
arXiv Detail & Related papers (2024-07-03T07:58:20Z)
CRAG -- Comprehensive RAG Benchmark [58.15980697921195]
Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG) CRAG is a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search.
arXiv Detail & Related papers (2024-06-07T08:43:07Z)
The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG) [23.849336345191556]
The challenge builds upon the MobileCS2 dataset, a real-life customer service datasets with nearly 3000 high-quality dialogs. We define two tasks, track 1 for knowledge retrieval and track 2 for response generation, which are core research questions in dialog systems with RAG. We build baseline systems for the two tracks and design metrics to measure whether the systems can perform accurate retrieval and generate informative and coherent response.
arXiv Detail & Related papers (2024-05-21T07:35:21Z)
Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models [21.115495457454365]
uRAG is a framework with a unified retrieval engine that serves multiple downstream retrieval-augmented generation (RAG) systems. We build a large-scale experimentation ecosystem consisting of 18 RAG systems that engage in training and 18 unknown RAG systems that use the uRAG as the new users of the search engine.
arXiv Detail & Related papers (2024-04-30T19:51:37Z)
FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation [31.371489527686578]
Federated search systems aggregate results from multiple search engines, selecting appropriate sources to enhance result quality and align with user intent. FEB4RAG is a novel dataset specifically designed for federated search within RAG frameworks.
arXiv Detail & Related papers (2024-02-19T07:06:52Z)
GAR-meets-RAG Paradigm for Zero-Shot Information Retrieval [16.369071865207808]
We propose a novel GAR-meets-RAG recurrence formulation that overcomes the challenges of existing paradigms. A key design principle is that the rewrite-retrieval stages improve the recall of the system and a final re-ranking stage improves the precision. Our method establishes a new state-of-the-art in the BEIR benchmark, outperforming previous best results in Recall@100 and nDCG@10 metrics on 6 out of 8 datasets.
arXiv Detail & Related papers (2023-10-31T03:52:08Z)
NeuralSearchX: Serving a Multi-billion-parameter Reranker for Multilingual Metasearch at a Low Cost [4.186775801993103]
We describe NeuralSearchX, a metasearch engine based on a multi-purpose large reranking model to merge results and highlight sentences. We show that our design choices led to a much cost-effective system with competitive QPS while having close to state-of-the-art results on a wide range of public benchmarks.
arXiv Detail & Related papers (2022-10-26T16:36:53Z)
Gait Recognition in the Wild: A Large-scale Benchmark and NAS-based Baseline [95.88825497452716]
Gait benchmarks empower the research community to train and evaluate high-performance gait recognition systems. GREW is the first large-scale dataset for gait recognition in the wild. SPOSGait is the first NAS-based gait recognition model.
arXiv Detail & Related papers (2022-05-05T14:57:39Z)
Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers. Previous work has explored ways to partition the search space into hierarchical structures. In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
Open-Retrieval Conversational Question Answering [62.11228261293487]
We introduce an open-retrieval conversational question answering (ORConvQA) setting, where we learn to retrieve evidence from a large collection before extracting answers. We build an end-to-end system for ORConvQA, featuring a retriever, a reranker, and a reader that are all based on Transformers.
arXiv Detail & Related papers (2020-05-22T19:39:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.