RECOR: Reasoning-focused Multi-turn Conversational Retrieval Benchmark
- URL: http://arxiv.org/abs/2601.05461v1
- Date: Fri, 09 Jan 2026 01:25:46 GMT
- Title: RECOR: Reasoning-focused Multi-turn Conversational Retrieval Benchmark
- Authors: Mohammed Ali, Abdelrahman Abdallah, Amit Agarwal, Hitesh Laxmichand Patel, Adam Jatowt
- Abstract summary: We present a benchmark for reasoning-based conversational information retrieval comprising 707 conversations (2,971 turns) across eleven domains. To ensure quality, our Decomposition-and-Verification framework transforms complex queries into fact-grounded multi-turn dialogues.
- Score: 20.750773856512662
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Existing benchmarks treat multi-turn conversation and reasoning-intensive retrieval separately, yet real-world information seeking requires both. To bridge this gap, we present a benchmark for reasoning-based conversational information retrieval comprising 707 conversations (2,971 turns) across eleven domains. To ensure quality, our Decomposition-and-Verification framework transforms complex queries into fact-grounded multi-turn dialogues through multi-level validation, where atomic facts are verified against sources and explicit retrieval reasoning is generated for each turn. Comprehensive evaluation reveals that combining conversation history with reasoning doubles retrieval performance (Baseline .236 $\rightarrow$ History+Reasoning .479 nDCG@10), while reasoning-specialized models substantially outperform dense encoders. Despite these gains, further analysis highlights that implicit reasoning remains challenging, particularly when logical connections are not explicitly stated in the text.
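The retrieval scores above are reported as nDCG@10. As a reference point, the standard definition of this metric can be sketched as follows (this is the textbook computation, not code from the paper):

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_rels, k=10):
    """nDCG@k: DCG of the system ranking divided by the ideal (sorted) DCG."""
    ideal_dcg = dcg_at_k(sorted(ranked_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy run: binary relevance with relevant documents at ranks 1 and 4.
print(round(ndcg_at_k([1, 0, 0, 1, 0], k=10), 3))
```

With this definition, the reported jump from .236 to .479 means the history-plus-reasoning ranking places relevant documents roughly twice as well, relative to the ideal ordering, as the baseline does.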
Related papers
- Multimodal Fact-Level Attribution for Verifiable Reasoning [80.60864342985748]
Multimodal large language models (MLLMs) are increasingly used for real-world tasks involving multi-step reasoning and long-form generation. Existing multimodal grounding benchmarks and evaluation methods fail to assess attribution in complex multimodal reasoning. We introduce MuRGAt, a benchmark for evaluating fact-level multimodal attribution in settings that require reasoning beyond direct observation.
arXiv Detail & Related papers (2026-02-12T03:10:02Z) - Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning [137.33138614095435]
Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models. Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval. We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions.
arXiv Detail & Related papers (2025-11-12T08:29:39Z) - MR$^2$-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval [86.35779264575154]
Multimodal retrieval is becoming a crucial component of modern AI applications, yet its evaluation lags behind the demands of more realistic and challenging scenarios. We introduce MR$^2$-Bench, a reasoning-intensive benchmark for multimodal retrieval.
arXiv Detail & Related papers (2025-09-30T15:09:14Z) - DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval [36.38599923075882]
DIVER is a retrieval pipeline designed for reasoning-intensive information retrieval. It consists of four stages: document preprocessing, query expansion, retrieval, and reranking. On the BRIGHT benchmark, DIVER achieves state-of-the-art nDCG@10 scores of 45.8 overall and 28.9 on original queries, consistently outperforming competitive reasoning-aware models.
arXiv Detail & Related papers (2025-08-11T13:57:49Z) - UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations [71.79210031338464]
We show how to unify dense retrieval and response generation for large language models in conversation. We conduct joint fine-tuning with different objectives and design two mechanisms to reduce the inconsistency risks. The evaluations on five conversational search datasets demonstrate that our unified model can mutually improve both tasks and outperform the existing baselines.
arXiv Detail & Related papers (2025-07-09T17:02:40Z) - DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs [54.4857963044859]
We propose DialogueReason, a reasoning paradigm that uncovers the lost roles in monologue-style reasoning models. Our work consists of an analysis of monologue reasoning patterns and the development of a dialogue-based reasoning approach.
arXiv Detail & Related papers (2025-05-11T16:39:58Z) - History-Aware Conversational Dense Retrieval [31.203399110612388]
We propose a History-Aware Conversational Dense Retrieval (HAConvDR) system, which incorporates two ideas: context-denoised query reformulation and automatic mining of supervision signals.
Experiments on two public conversational search datasets demonstrate the improved history modeling capability of HAConvDR.
arXiv Detail & Related papers (2024-01-30T01:24:18Z) - Dialogue Chain-of-Thought Distillation for Commonsense-aware
Conversational Agents [35.6393824052347]
We propose a framework for dialogue chain-of-thought (CoT) reasoning.
We present DOCTOR, a DialOgue Chain-of-ThOught Reasoner.
We conduct experiments to show that enhancing dialogue agents with high-quality rationales from DOCTOR significantly improves the quality of their responses.
arXiv Detail & Related papers (2023-10-13T18:17:23Z) - ZeQR: Zero-shot Query Reformulation for Conversational Search [11.644235288057123]
We introduce a novel Zero-shot Query Reformulation (or Query Rewriting) framework that reformulates queries based on previous dialogue contexts without requiring supervision from conversational search data.
Specifically, our framework utilizes language models designed for machine reading comprehension tasks to explicitly resolve two common ambiguities: coreference and omission, in raw queries.
It also provides greater explainability and effectively enhances query intent understanding because ambiguities are explicitly and proactively resolved.
arXiv Detail & Related papers (2023-07-18T16:05:25Z) - Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting [56.268862325167575]
We tackle conversational passage retrieval (ConvPR) with query reformulation integrated into a multi-stage ad-hoc IR system.
We propose two conversational query reformulation (CQR) methods: (1) term importance estimation and (2) neural query rewriting.
For the former, we expand conversational queries using important terms extracted from the conversational context with frequency-based signals.
For the latter, we reformulate conversational queries into natural, standalone, human-understandable queries with a pretrained sequence-to-sequence model.
arXiv Detail & Related papers (2020-05-05T14:30:20Z)
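The term-importance idea in the entry above (expanding the current query with salient terms mined from the conversational context via frequency signals) can be sketched as follows. The tokenizer, stopword list, and top-k cutoff are illustrative assumptions, not the paper's exact configuration:

```python
from collections import Counter

# Illustrative stopword list; a real system would use a proper one.
STOPWORDS = {"the", "a", "an", "is", "are", "what", "how", "does", "it", "of", "in"}

def tokenize(text):
    """Lowercase, split on whitespace, and strip trailing punctuation."""
    return [t.strip("?,.!").lower() for t in text.split()]

def expand_query(current_query, history, top_k=3):
    """Append the top-k most frequent non-stopword context terms
    that are not already present in the current query."""
    context_terms = [t for turn in history for t in tokenize(turn)
                     if t and t not in STOPWORDS]
    counts = Counter(context_terms)
    query_terms = set(tokenize(current_query))
    expansion = [t for t, _ in counts.most_common() if t not in query_terms][:top_k]
    if expansion:
        return current_query + " " + " ".join(expansion)
    return current_query

history = ["What is dense retrieval?",
           "How does dense retrieval handle conversational queries?"]
print(expand_query("Does it scale?", history))
```

The expanded query carries context terms such as "dense" and "retrieval" into an otherwise ambiguous follow-up, which is what lets a standard ad-hoc retriever handle the conversational turn.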
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.