Related papers: HiFi-RAG: Hierarchical Content Filtering and Two-Pass Generation for Open-Domain RAG

URL: http://arxiv.org/abs/2512.22442v1
Date: Sat, 27 Dec 2025 02:37:40 GMT
Title: HiFi-RAG: Hierarchical Content Filtering and Two-Pass Generation for Open-Domain RAG
Authors: Cattalyya Nuengsigkapian
Abstract summary: HiFi-RAG is the winning closed-source system in the Text-to-Text static evaluation of the MMU-RAGent NeurIPS 2025 Competition. We leverage the speed and cost-efficiency of Gemini 2.5 Flash for query formulation, hierarchical content filtering, and citation attribution, while reserving the reasoning capabilities of Gemini 2.5 Pro for final answer generation.
Score: 0.29008108937701327
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval-Augmented Generation (RAG) in open-domain settings faces significant challenges regarding irrelevant information in retrieved documents and the alignment of generated answers with user intent. We present HiFi-RAG (Hierarchical Filtering RAG), the winning closed-source system in the Text-to-Text static evaluation of the MMU-RAGent NeurIPS 2025 Competition. Our approach moves beyond standard embedding-based retrieval via a multi-stage pipeline. We leverage the speed and cost-efficiency of Gemini 2.5 Flash (4-6x cheaper than Pro) for query formulation, hierarchical content filtering, and citation attribution, while reserving the reasoning capabilities of Gemini 2.5 Pro for final answer generation. On the MMU-RAGent validation set, our system outperformed the baseline, improving ROUGE-L to 0.274 (+19.6%) and DeBERTaScore to 0.677 (+6.2%). On Test2025, our custom dataset evaluating questions that require post-cutoff knowledge (post January 2025), HiFi-RAG outperforms the parametric baseline by 57.4% in ROUGE-L and 14.9% in DeBERTaScore.
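
The division of labor described in the abstract (a cheaper model for query formulation, filtering, and citation attribution; a stronger model for the final answer) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Model` and `Passage` types, the `answer` function, the prompt wording, the single-level relevance filter, and the ordering of citation attribution after generation are all assumptions made for illustration, and the models are passed in as plain callables rather than real API clients.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: a "model" is any function from prompt text to response text.
Model = Callable[[str], str]


@dataclass
class Passage:
    source: str  # where the passage came from (e.g. a URL); illustrative only
    text: str    # passage content


def answer(
    question: str,
    search: Callable[[str], List[Passage]],
    fast_model: Model,
    strong_model: Model,
) -> str:
    """Route cheap stages to fast_model and final generation to strong_model."""
    # Stage 1 (fast model): turn the user question into a search query.
    query = fast_model(f"Rewrite as a search query: {question}")
    passages = search(query)

    # Stage 2 (fast model): keep only passages judged relevant.
    # A single yes/no check stands in for the paper's hierarchical filtering.
    kept = []
    for p in passages:
        verdict = fast_model(
            "Is this passage relevant to the question? Answer yes or no.\n"
            f"Question: {question}\nPassage: {p.text}"
        )
        if verdict.strip().lower().startswith("yes"):
            kept.append(p)

    # Stage 3 (strong model): generate the answer from the filtered context.
    context = "\n".join(f"[{i + 1}] {p.text}" for i, p in enumerate(kept))
    draft = strong_model(f"Question: {question}\nContext:\n{context}\nAnswer:")

    # Stage 4 (fast model): attach citation markers to the drafted answer.
    # Running this after generation is an assumption of this sketch.
    return fast_model(
        "Add citation markers to this answer using the numbered context.\n"
        f"Context:\n{context}\nAnswer: {draft}"
    )
```

Because both models are injected as callables, the same routing can be exercised with stub functions in tests and with real clients in deployment; only the expensive generation step touches the stronger model.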

Related papers

InfoGain-RAG: Boosting Retrieval-Augmented Generation via Document Information Gain-based Reranking and Filtering [17.346965728209394]
Retrieval-Augmented Generation (RAG) has emerged as a promising approach to address key limitations of Large Language Models (LLMs). We propose Document Information Gain (DIG), a novel metric designed to quantify the contribution of retrieved documents to correct answer generation. We introduce InfoGain-RAG, a framework that leverages DIG scores to train a specialized reranker, which prioritizes each retrieved document from exact distinguishing and accurate sorting perspectives.
arXiv Detail & Related papers (2025-09-16T07:28:07Z)
Retrieval-Augmented Generation for Reliable Interpretation of Radio Regulations [49.671779378073886]
We study question answering in the domain of radio regulations. We propose a telecom-specific Retrieval-Augmented Generation (RAG) pipeline. Our approach consistently improves generation accuracy across all tested models.
arXiv Detail & Related papers (2025-09-11T17:43:42Z)
Evaluating Hybrid Retrieval Augmented Generation using Dynamic Test Sets: LiveRAG Challenge [8.680958290253914]
We present our submission to the LiveRAG Challenge 2025, which evaluates retrieval-augmented generation (RAG) systems on dynamic test sets. Our final hybrid approach combines sparse (BM25) and dense (E5) retrieval methods. We demonstrate that neural re-ranking with RankLLaMA improves MAP from 0.523 to 0.797 but introduces prohibitive computational costs.
arXiv Detail & Related papers (2025-06-27T21:20:43Z)
R.I.P.: Better Models by Survival of the Fittest Prompts [51.2293437372642]
We introduce a method for evaluating data integrity based on the assumption that low-quality input prompts result in high-variance and low-quality responses. This is achieved by measuring the rejected response quality and the reward gap between the chosen and rejected preference pair.
arXiv Detail & Related papers (2025-01-30T18:50:25Z)
Retrieval-Augmented Generation for Domain-Specific Question Answering: A Case Study on Pittsburgh and CMU [3.1787418271023404]
We designed a Retrieval-Augmented Generation (RAG) system to provide large language models with relevant documents for answering domain-specific questions. We extracted over 1,800 subpages using a greedy scraping strategy and employed a hybrid annotation process, combining manual and Mistral-generated question-answer pairs. Our RAG framework integrates BM25 and FAISS retrievers, enhanced with a reranker for improved document retrieval accuracy.
arXiv Detail & Related papers (2024-11-20T20:10:43Z)
Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage [74.70255719194819]
We introduce a novel framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question. We use this framework to evaluate three commercial generative answer engines: You.com, Perplexity AI, and Bing Chat. We find that while all answer engines cover core sub-questions more often than background or follow-up ones, they still miss around 50% of core sub-questions.
arXiv Detail & Related papers (2024-10-20T22:59:34Z)
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs [60.38044044203333]
Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG). We propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG. For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-sourced model with the state-of-the-art performance on RAG benchmarks.
arXiv Detail & Related papers (2024-07-02T17:59:17Z)
The Chronicles of RAG: The Retriever, the Chunk and the Generator [0.0]
This paper presents good practices to implement, optimize, and evaluate RAG for the Brazilian Portuguese language. We explore a diverse set of methods to answer questions about the first Harry Potter book.
arXiv Detail & Related papers (2024-01-15T18:25:18Z)
Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting [65.00288634420812]
Pairwise Ranking Prompting (PRP) is a technique to significantly reduce the burden on Large Language Models (LLMs). Our results are the first in the literature to achieve state-of-the-art ranking performance on standard benchmarks using moderate-sized open-sourced LLMs.
arXiv Detail & Related papers (2023-06-30T11:32:25Z)
Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline. We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures. Our best end-to-end model, based on RNN-Transducer together with improved beam search, is only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.