HiFi-RAG: Hierarchical Content Filtering and Two-Pass Generation for Open-Domain RAG
- URL: http://arxiv.org/abs/2512.22442v1
- Date: Sat, 27 Dec 2025 02:37:40 GMT
- Title: HiFi-RAG: Hierarchical Content Filtering and Two-Pass Generation for Open-Domain RAG
- Authors: Cattalyya Nuengsigkapian
- Abstract summary: HiFi-RAG is the winning closed-source system in the Text-to-Text static evaluation of the MMU-RAGent NeurIPS 2025 Competition. We leverage the speed and cost-efficiency of Gemini 2.5 Flash for query formulation, hierarchical content filtering, and citation attribution, while reserving the reasoning capabilities of Gemini 2.5 Pro for final answer generation.
- Score: 0.29008108937701327
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-Augmented Generation (RAG) in open-domain settings faces significant challenges regarding irrelevant information in retrieved documents and the alignment of generated answers with user intent. We present HiFi-RAG (Hierarchical Filtering RAG), the winning closed-source system in the Text-to-Text static evaluation of the MMU-RAGent NeurIPS 2025 Competition. Our approach moves beyond standard embedding-based retrieval via a multi-stage pipeline. We leverage the speed and cost-efficiency of Gemini 2.5 Flash (4-6x cheaper than Pro) for query formulation, hierarchical content filtering, and citation attribution, while reserving the reasoning capabilities of Gemini 2.5 Pro for final answer generation. On the MMU-RAGent validation set, our system outperformed the baseline, improving ROUGE-L to 0.274 (+19.6%) and DeBERTaScore to 0.677 (+6.2%). On Test2025, our custom dataset evaluating questions that require post-cutoff knowledge (post January 2025), HiFi-RAG outperforms the parametric baseline by 57.4% in ROUGE-L and 14.9% in DeBERTaScore.
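The model split described in the abstract can be illustrated with a minimal sketch. It assumes each model is simply a function from prompt text to response text; the names `answer_question`, `Passage`, `cheap_model`, and `strong_model`, as well as all prompt wording, are hypothetical and not taken from the paper. The abstract does not specify how the filtering hierarchy or the two generation passes work, so this sketch uses a single flat relevance filter and one generation call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: a "model" is any callable mapping a prompt string to a response string.
Model = Callable[[str], str]


@dataclass
class Passage:
    source: str  # e.g. a URL or document id (hypothetical field)
    text: str


def answer_question(
    question: str,
    retrieve: Callable[[str], List[Passage]],
    cheap_model: Model,   # stands in for the faster, cheaper tier (Flash in the abstract)
    strong_model: Model,  # stands in for the stronger tier (Pro in the abstract)
) -> str:
    # Stage 1: query formulation on the cheap model.
    query = cheap_model(f"Rewrite as a search query: {question}").strip()

    # Stage 2: retrieval (search backend is supplied by the caller).
    passages = retrieve(query)

    # Stage 3: content filtering on the cheap model.
    # Flat yes/no filter; the paper's hierarchy is not described in the abstract.
    kept = []
    for p in passages:
        verdict = cheap_model(
            "Is this passage relevant to the question? Reply yes or no.\n"
            f"Question: {question}\nPassage: {p.text}"
        )
        if verdict.strip().lower().startswith("yes"):
            kept.append(p)

    # Stage 4: final answer generation on the strong model, using kept passages only.
    context = "\n".join(f"[{i}] {p.text}" for i, p in enumerate(kept, start=1))
    draft = strong_model(
        "Answer using only the numbered context.\n"
        f"Context:\n{context}\nQuestion: {question}"
    )

    # Stage 5: citation attribution on the cheap model.
    return cheap_model(
        "Insert [n] citation markers into the answer.\n"
        f"Answer: {draft}\nContext:\n{context}"
    )
```

In this arrangement the strong model is called once per question, while the cheap model handles every per-passage call, which is where the cost difference between the two tiers would matter most.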
Related papers
- InfoGain-RAG: Boosting Retrieval-Augmented Generation via Document Information Gain-based Reranking and Filtering [17.346965728209394]
Retrieval-Augmented Generation (RAG) has emerged as a promising approach to address key limitations of Large Language Models (LLMs).
We propose Document Information Gain (DIG), a novel metric designed to quantify the contribution of retrieved documents to correct answer generation.
We introduce InfoGain-RAG, a framework that leverages DIG scores to train a specialized reranker, which prioritizes each retrieved document from exact distinguishing and accurate sorting perspectives.
arXiv Detail & Related papers (2025-09-16T07:28:07Z)
- Retrieval-Augmented Generation for Reliable Interpretation of Radio Regulations [49.671779378073886]
We study question answering in the domain of radio regulations.
We propose a telecom-specific Retrieval-Augmented Generation (RAG) pipeline.
Our approach consistently improves generation accuracy across all tested models.
arXiv Detail & Related papers (2025-09-11T17:43:42Z)
- Evaluating Hybrid Retrieval Augmented Generation using Dynamic Test Sets: LiveRAG Challenge [8.680958290253914]
We present our submission to the LiveRAG Challenge 2025, which evaluates retrieval-augmented generation (RAG) systems on dynamic test sets.
Our final hybrid approach combines sparse (BM25) and dense (E5) retrieval methods.
We demonstrate that neural re-ranking with RankLLaMA improves MAP from 0.523 to 0.797 but introduces prohibitive computational costs.
arXiv Detail & Related papers (2025-06-27T21:20:43Z)
- R.I.P.: Better Models by Survival of the Fittest Prompts [51.2293437372642]
We introduce a method for evaluating data integrity based on the assumption that low-quality input prompts result in high variance and low quality responses.
This is achieved by measuring the rejected response quality and the reward gap between the chosen and rejected preference pair.
arXiv Detail & Related papers (2025-01-30T18:50:25Z)
- Retrieval-Augmented Generation for Domain-Specific Question Answering: A Case Study on Pittsburgh and CMU [3.1787418271023404]
We designed a Retrieval-Augmented Generation (RAG) system to provide large language models with relevant documents for answering domain-specific questions.
We extracted over 1,800 subpages using a greedy scraping strategy and employed a hybrid annotation process, combining manual and Mistral-generated question-answer pairs.
Our RAG framework integrates BM25 and FAISS retrievers, enhanced with a reranker for improved document retrieval accuracy.
arXiv Detail & Related papers (2024-11-20T20:10:43Z)
- Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage [74.70255719194819]
We introduce a novel framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question.
We use this framework to evaluate three commercial generative answer engines: You.com, Perplexity AI, and Bing Chat.
We find that while all answer engines cover core sub-questions more often than background or follow-up ones, they still miss around 50% of core sub-questions.
arXiv Detail & Related papers (2024-10-20T22:59:34Z)
- RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs [60.38044044203333]
Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG).
We propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG.
For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-sourced model with the state-of-the-art performance on RAG benchmarks.
arXiv Detail & Related papers (2024-07-02T17:59:17Z)
- The Chronicles of RAG: The Retriever, the Chunk and the Generator [0.0]
This paper presents good practices to implement, optimize, and evaluate RAG for the Brazilian Portuguese language.
We explore a diverse set of methods to answer questions about the first Harry Potter book.
arXiv Detail & Related papers (2024-01-15T18:25:18Z)
- Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting [65.00288634420812]
Pairwise Ranking Prompting (PRP) is a technique to significantly reduce the burden on Large Language Models (LLMs).
Our results are the first in the literature to achieve state-of-the-art ranking performance on standard benchmarks using moderate-sized open-sourced LLMs.
arXiv Detail & Related papers (2023-06-30T11:32:25Z)
- Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on RNN-Transducer with improved beam search, is only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)