Related papers: RAGferee: Building Contextual Reward Models for Retrieval-Augmented Generation

URL: http://arxiv.org/abs/2509.26011v1
Date: Tue, 30 Sep 2025 09:41:40 GMT
Title: RAGferee: Building Contextual Reward Models for Retrieval-Augmented Generation
Authors: Andrei C. Coman, Ionut-Teodor Sorodoc, Leonardo F. R. Ribeiro, Bill Byrne, James Henderson, Adrià de Gispert
Abstract summary: RAGferee is a methodology that repurposes question-answering (QA) datasets into preference pairs that prioritise groundedness over stylistic features. Using RAGferee, we curate a small preference dataset of 4K samples and fine-tune RMs ranging from 7B to 24B parameters. Our RAG-centric RMs achieve state-of-the-art performance on ContextualJudgeBench, surpassing existing 70B+ RMs trained on much larger (up to 2.4M samples) general corpora, with an absolute improvement of +15.5%.
Score: 26.854073751273585
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Existing Reward Models (RMs), typically trained on general preference data, struggle in Retrieval Augmented Generation (RAG) settings, which require judging responses for faithfulness to retrieved context, relevance to the user query, appropriate refusals when context is insufficient, completeness and conciseness of information. To address the lack of publicly available RAG-centric preference datasets and specialised RMs, we introduce RAGferee, a methodology that repurposes question-answering (QA) datasets into preference pairs that prioritise groundedness over stylistic features, enabling the training of contextual RMs better suited to judging RAG responses. Using RAGferee, we curate a small preference dataset of 4K samples and fine-tune RMs ranging from 7B to 24B parameters. Our RAG-centric RMs achieve state-of-the-art performance on ContextualJudgeBench, surpassing existing 70B+ RMs trained on much larger (up to 2.4M samples) general corpora, with an absolute improvement of +15.5%.
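The abstract does not spell out how QA samples become preference pairs. The sketch below is one hypothetical reading of "prioritise groundedness over stylistic features": each candidate answer carries assumed per-criterion scores (`grounded`, `relevant`, `style`, none of which come from the paper), pairs are ordered by a lexicographic key so groundedness always dominates, and only pairs with a clear groundedness margin are kept. For scalar (discriminative) RMs, a common objective on such pairs is the Bradley-Terry loss; it is included as an assumption, not as the authors' stated recipe.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Candidate:
    # Hypothetical annotation schema, not RAGferee's actual data format.
    text: str
    grounded: float  # faithfulness to the retrieved context, in [0, 1]
    relevant: float  # relevance to the user query, in [0, 1]
    style: float     # stylistic quality (fluency, formatting), in [0, 1]


def priority_key(c: Candidate) -> tuple:
    # Lexicographic order: groundedness first, relevance second, style only breaks ties.
    return (c.grounded, c.relevant, c.style)


def make_preference_pairs(question: str, candidates: list, min_gap: float = 0.5) -> list:
    """Return (question, chosen, rejected) triples decided by groundedness, not style."""
    pairs = []
    for a, b in combinations(candidates, 2):
        chosen, rejected = (a, b) if priority_key(a) >= priority_key(b) else (b, a)
        # Drop pairs whose label could be explained by style alone.
        if chosen.grounded - rejected.grounded >= min_gap:
            pairs.append((question, chosen.text, rejected.text))
    return pairs


def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)), computed as a numerically stable softplus."""
    x = r_rejected - r_chosen
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


cands = [
    Candidate("Paris, according to the passage.", grounded=1.0, relevant=1.0, style=0.2),
    Candidate("It is certainly Lyon, a wonderful city!", grounded=0.0, relevant=1.0, style=0.9),
    Candidate("Paris.", grounded=1.0, relevant=1.0, style=0.5),
]
pairs = make_preference_pairs("What is the capital of France?", cands)
# The polished but unfaithful answer ends up on the rejected side of every kept pair.
```

Raising `min_gap` keeps only clear-cut pairs; lowering it admits subtler pairs at the cost of noisier labels.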

Related papers

One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning [54.580646706013965]
Reward models (RMs) play a critical role in aligning large language models with human preferences. We introduce ToolRM, a family of lightweight generative RMs tailored for general tool-use scenarios. To build these models, we propose a novel pipeline that constructs pairwise preference data using rule-based scoring and multidimensional sampling.
Date: 2025-10-30T06:08:27Z
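The summary names rule-based scoring as the source of ToolRM's pairwise data without giving the rules. As an illustration only, the sketch below assumes a made-up rule: a call to the wrong tool scores 0, otherwise the score grows with the share of reference arguments reproduced exactly; the highest- and lowest-scoring samples for a prompt then form a (chosen, rejected) pair. The function names, the `name`/`arguments` dict layout and the scoring weights are all hypothetical.

```python
from typing import Optional, Tuple


def score_tool_call(pred: dict, ref: dict) -> float:
    """Hypothetical rule: 0.0 for the wrong tool; otherwise 0.5 plus 0.5 times
    the fraction of reference arguments reproduced exactly."""
    if pred.get("name") != ref["name"]:
        return 0.0
    ref_args = ref.get("arguments", {})
    if not ref_args:
        return 1.0
    pred_args = pred.get("arguments", {})
    matched = sum(1 for key, value in ref_args.items() if pred_args.get(key) == value)
    return 0.5 + 0.5 * matched / len(ref_args)


def best_worst_pair(samples: list, ref: dict) -> Optional[Tuple[dict, dict]]:
    """Pair the highest- and lowest-scoring sampled calls as (chosen, rejected)."""
    ranked = sorted(samples, key=lambda s: score_tool_call(s, ref))
    worst, best = ranked[0], ranked[-1]
    if score_tool_call(best, ref) == score_tool_call(worst, ref):
        return None  # equal scores carry no preference signal
    return best, worst


ref = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "C"}}
samples = [
    {"name": "get_weather", "arguments": {"city": "Paris", "unit": "C"}},  # scores 1.0
    {"name": "get_weather", "arguments": {"city": "Paris"}},               # scores 0.75
    {"name": "search_web", "arguments": {"query": "Paris weather"}},       # scores 0.0
]
pair = best_worst_pair(samples, ref)
```

Returning `None` for ties avoids creating pairs whose label would be arbitrary.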
ModernBERT + ColBERT: Enhancing biomedical RAG through an advanced re-ranking retriever [0.5371337604556311]
We develop a lightweight ModernBERT bidirectional encoder for efficient initial candidate retrieval, paired with a ColBERTv2 late-interaction model for fine-grained re-ranking. Our analysis of the retriever module confirmed the positive impact of the ColBERT re-ranker, which improved Recall@3 by up to 4.2 percentage points. Our ablation studies reveal that this performance is critically dependent on a joint fine-tuning process that aligns the retriever and re-ranker.
Date: 2025-10-06T12:34:55Z
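Late interaction, as used by ColBERT-style re-rankers, scores a query-document pair by matching each query token embedding to its most similar document token embedding (MaxSim) and summing those maxima. The sketch below implements that scoring on toy 2-D vectors standing in for real encoder outputs, re-ranks a small candidate set, and computes Recall@3 under one common definition (share of relevant documents found in the top 3). It is not the paper's ModernBERT + ColBERTv2 pipeline; the embeddings and document ids are invented.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def maxsim(query_vecs, doc_vecs):
    """For each query token, take its best cosine match among the document tokens, then sum."""
    q = [l2_normalize(v) for v in query_vecs]
    d = [l2_normalize(v) for v in doc_vecs]
    return sum(max(sum(a * b for a, b in zip(qv, dv)) for dv in d) for qv in q)


def rerank(query_vecs, candidates):
    """candidates: dict mapping doc_id -> list of token vectors from a first-stage retriever."""
    return sorted(candidates, key=lambda doc_id: maxsim(query_vecs, candidates[doc_id]), reverse=True)


def recall_at_k(ranked_ids, relevant_ids, k=3):
    """Fraction of relevant documents that appear among the top-k ranked ids."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)


query = [[1.0, 0.0], [0.0, 1.0]]  # two toy query-token embeddings
docs = {
    "d1": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens: score 2.0
    "d2": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first: score 1.0
    "d3": [[0.7, 0.7]],              # partial match on both: score ~1.414
    "d4": [[-1.0, 0.0]],             # opposes the first: score -1.0
}
ranked = rerank(query, docs)
```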
OpenRAG: Optimizing RAG End-to-End via In-Context Retrieval Learning [13.181087031343619]
We introduce OpenRAG, a RAG framework that is optimized end-to-end by tuning the retriever to capture in-context relevance. Experiments across a wide range of tasks demonstrate that OpenRAG, by tuning a retriever end-to-end, leads to a consistent improvement of 4.0% over the original retriever.
Date: 2025-03-11T13:04:05Z
Chain-of-Retrieval Augmented Generation [91.02950964802454]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
Date: 2025-01-24T09:12:52Z
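A minimal control-flow sketch of step-by-step retrieval with query reformulation, assuming the learned components are passed in as plain callables: `next_query` proposes the next sub-query from the chain built so far (or `None` to stop), `retrieve` fetches evidence, `answer_step` writes an intermediate answer and `final_answer` composes the output. In CoRAG these roles are played by a trained model; the rule-based stubs and two-hop toy corpus below are hypothetical placeholders that only show the loop.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str]  # (sub_query, intermediate_answer)


def chain_of_retrieval(
    question: str,
    next_query: Callable[[str, List[Step]], Optional[str]],
    retrieve: Callable[[str], List[str]],
    answer_step: Callable[[str, List[str]], str],
    final_answer: Callable[[str, List[Step]], str],
    max_steps: int = 4,
) -> Tuple[str, List[Step]]:
    """Each sub-query is conditioned on the chain so far; stop when next_query returns None."""
    chain: List[Step] = []
    for _ in range(max_steps):
        sub_query = next_query(question, chain)
        if sub_query is None:  # the policy judges the evidence sufficient
            break
        docs = retrieve(sub_query)
        chain.append((sub_query, answer_step(sub_query, docs)))
    return final_answer(question, chain), chain


# Hypothetical two-hop toy corpus: each "document" is already a short fact.
CORPUS = {
    "director of Film X": ["Ann Lee"],
    "birthplace of Ann Lee": ["Oslo"],
}


def toy_next_query(question, chain):
    # Stand-in for a learned policy: the second sub-query depends on the first answer.
    if not chain:
        return "director of Film X"
    if len(chain) == 1:
        return f"birthplace of {chain[0][1]}"
    return None


def toy_retrieve(sub_query):
    return CORPUS.get(sub_query, [])


def toy_answer_step(sub_query, docs):
    return docs[0] if docs else "unknown"


def toy_final_answer(question, chain):
    return chain[-1][1] if chain else "unknown"


answer, chain = chain_of_retrieval(
    "Where was the director of Film X born?",
    toy_next_query, toy_retrieve, toy_answer_step, toy_final_answer,
)
```

The `max_steps` cap bounds retrieval cost even if the policy never decides to stop.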
RPO: Retrieval Preference Optimization for Robust Retrieval-Augmented Generation [33.85528514353727]
We introduce Retrieval Preference Optimization (RPO) to adaptively leverage multi-source knowledge based on retrieval relevance. RPO is the only RAG-dedicated alignment approach that quantifies the awareness of retrieval relevance in training. Experiments on four datasets demonstrate that RPO outperforms RAG by 4-10% in accuracy without any extra component.
Date: 2025-01-23T14:58:56Z
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment [18.491114307921848]
We propose RAG-RewardBench, the first benchmark for evaluating RMs in RAG settings. First, we design four crucial and challenging RAG-specific scenarios to assess RMs. Then, we incorporate 18 RAG subsets, six retrievers, and 24 RALMs to increase the diversity of data sources. Finally, we adopt an LLM-as-a-judge approach to improve preference annotation efficiency and effectiveness.
Date: 2024-12-18T11:28:05Z
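Reward-model benchmarks of this kind commonly report pairwise accuracy: the share of (chosen, rejected) pairs for which the RM gives the chosen response a strictly higher score. The sketch assumes that metric (the summary does not state RAG-RewardBench's exact scoring) and plugs in a deliberately naive, hypothetical length-based scorer to show how a style-biased RM can fail on RAG-specific pairs.

```python
from typing import Callable, Iterable, Tuple

Triple = Tuple[str, str, str]  # (prompt with retrieved context, chosen, rejected)


def pairwise_accuracy(reward: Callable[[str, str], float], data: Iterable[Triple]) -> float:
    """Share of pairs where reward(prompt, chosen) > reward(prompt, rejected); ties count as errors."""
    outcomes = [reward(p, c) > reward(p, r) for p, c, r in data]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def length_reward(prompt: str, response: str) -> float:
    # Deliberately naive, hypothetical RM that simply prefers longer answers.
    return float(len(response))


data = [
    ("Context: Paris is the capital of France. Q: What is the capital of France?",
     "Paris.",
     "It is Lyon, a lovely city with many museums."),
    ("Context: The report lists no budget. Q: What was the budget?",
     "The context does not state the budget.",
     "About 42 million."),
]
acc = pairwise_accuracy(length_reward, data)
# The faithful short answer loses; the appropriate refusal wins only because it is longer.
```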
Unanswerability Evaluation for Retrieval Augmented Generation [74.3022365715597]
UAEval4RAG is a framework designed to evaluate whether RAG systems can handle unanswerable queries effectively. We define a taxonomy with six unanswerable categories, and UAEval4RAG automatically synthesizes diverse and challenging queries.
Date: 2024-12-16T19:11:55Z
Retrieval-Augmented Generation for Domain-Specific Question Answering: A Case Study on Pittsburgh and CMU [3.1787418271023404]
We designed a Retrieval-Augmented Generation (RAG) system to provide large language models with relevant documents for answering domain-specific questions. We extracted over 1,800 subpages using a greedy scraping strategy and employed a hybrid annotation process, combining manual and Mistral-generated question-answer pairs. Our RAG framework integrates BM25 and FAISS retrievers, enhanced with a reranker for improved document retrieval accuracy.
Date: 2024-11-20T20:10:43Z
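The summary says the system integrates BM25 and FAISS retrievers but not how their results are combined. The sketch below implements Okapi BM25 (with the Lucene-style non-negative IDF) from scratch and merges it with a dense ranking via reciprocal rank fusion (RRF, with the conventional `k=60`). RRF is a swapped-in, commonly used fusion method, not necessarily the authors' choice, and the `dense` list is a hypothetical stand-in for the ids a FAISS nearest-neighbour search would return.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 over tokenized docs (lists of strings); returns one score per doc."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def rank_ids(scores):
    # sorted() is stable even with reverse=True, so equal scores keep their original order.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Sum 1 / (k + rank) for each doc across rankings; higher fused score ranks first."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


docs = [
    "cmu was founded in 1900 by andrew carnegie".split(),
    "pittsburgh is known for its bridges".split(),
    "carnegie mellon university is in pittsburgh".split(),
]
query = "when was cmu founded".split()
sparse = rank_ids(bm25_scores(query, docs))
dense = [2, 0, 1]  # hypothetical FAISS output: semantic match on "carnegie mellon university"
fused = reciprocal_rank_fusion([sparse, dense])
```

RRF uses only rank positions, so BM25 scores and dense similarities never need to be put on a common scale.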
RRM: Robust Reward Model Training Mitigates Reward Hacking [51.12341734942797]
Reward models (RMs) play a pivotal role in aligning large language models with human preferences. We introduce a causal framework that learns preferences independent of these artifacts. Experiments show that our approach successfully filters out undesirable artifacts, yielding a more robust reward model.
Date: 2024-09-20T01:46:07Z
SFR-RAG: Towards Contextually Faithful LLMs [57.666165819196486]
Retrieval Augmented Generation (RAG) is a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance. We introduce SFR-RAG, a small LLM that is instruction-tuned with an emphasis on context-grounded generation and hallucination minimization. We also present ContextualBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks.
Date: 2024-09-16T01:08:18Z
CRAG -- Comprehensive RAG Benchmark [58.15980697921195]
Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate the lack of knowledge in Large Language Models (LLMs). Existing RAG datasets do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search.
Date: 2024-06-07T08:43:07Z

This list is automatically generated from the titles and abstracts of the papers on this site.