Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy
- URL: http://arxiv.org/abs/2406.11290v2
- Date: Tue, 19 Aug 2025 09:59:05 GMT
- Title: Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy
- Authors: Hengran Zhang, Keping Bi, Jiafeng Guo, Xueqi Cheng,
- Abstract summary: We propose an Iterative utiliTy judgmEnt fraMework (ITEM) to promote each step in Retrieval-Augmented Generation (RAG). RAG's three core components -- relevance ranking derived from retrieval models, utility judgments, and answer generation -- align with Schutz's philosophical system of relevances. Experimental results demonstrate significant improvements of ITEM in utility judgments, ranking, and answer generation upon representative baselines.
- Score: 66.95501113584541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Relevance and utility are two frequently used measures to evaluate the effectiveness of an information retrieval (IR) system. Relevance emphasizes the aboutness of a result to a query, while utility refers to the result's usefulness or value to an information seeker. In Retrieval-Augmented Generation (RAG), high-utility results should be prioritized to feed to LLMs due to their limited input bandwidth. Re-examining RAG's three core components -- relevance ranking derived from retrieval models, utility judgments, and answer generation -- aligns with Schutz's philosophical system of relevances, which encompasses three types of relevance representing different levels of human cognition that enhance each other. These three RAG components also reflect three cognitive levels for LLMs in question-answering. Therefore, we propose an Iterative utiliTy judgmEnt fraMework (ITEM) to promote each step in RAG. We conducted extensive experiments on retrieval (TREC DL, WebAP), utility judgment task (GTI-NQ), and factoid question-answering (NQ) datasets. Experimental results demonstrate significant improvements of ITEM in utility judgments, ranking, and answer generation upon representative baselines.
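The iterative loop the abstract describes can be sketched roughly as follows: the LLM alternates between judging which retrieved passages are useful and regenerating an answer from the useful subset, until the answer stabilizes. This is a minimal illustration only; the method names (`judge_utility`, `generate`) and the stopping criterion are assumptions, not the paper's actual implementation:

```python
def iterative_utility_judgment(query, candidates, llm, max_iters=3):
    """Sketch of an ITEM-style loop: utility judgments and answer
    generation refine each other over successive iterations."""
    # Start from the retriever's relevance ranking.
    selected = candidates
    answer = None
    for _ in range(max_iters):
        # Ask the LLM which passages are useful for answering the query,
        # optionally conditioning on the current draft answer.
        judged = [p for p in selected if llm.judge_utility(query, p, answer)]
        if not judged:
            break
        # Regenerate the answer from the high-utility subset.
        new_answer = llm.generate(query, judged)
        # Stop once the answer stabilizes (an assumed criterion).
        if new_answer == answer:
            break
        answer, selected = new_answer, judged
    return answer, selected
```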
Related papers
- Character-R1: Enhancing Role-Aware Reasoning in Role-Playing Agents via RLVR [67.66592867046229]
Character-R1 is a framework designed to provide verifiable reward signals for effective role-aware reasoning. Our framework comprises three core designs: Cognitive Focus Reward, Reference-Guided Reward and Character-Conditioned Reward Normalization.
arXiv Detail & Related papers (2026-01-08T05:33:37Z) - Enhancing the Medical Context-Awareness Ability of LLMs via Multifaceted Self-Refinement Learning [49.559151128219725]
Large language models (LLMs) have shown great promise in the medical domain, achieving strong performance on several benchmarks. However, they continue to underperform in real-world medical scenarios, which often demand stronger context-awareness. We propose Multifaceted Self-Refinement (MuSeR), a data-driven approach that enhances LLMs' context-awareness along three key facets.
arXiv Detail & Related papers (2025-11-13T08:13:23Z) - Knowledge-Graph Based RAG System Evaluation Framework [27.082302648704708]
Large language models (LLMs) have become a significant research focus. Retrieval Augmented Generation (RAG) greatly enhances generated content's reliability and relevance. However, evaluating RAG systems remains a challenging task.
arXiv Detail & Related papers (2025-10-02T20:36:21Z) - TRUE: A Reproducible Framework for LLM-Driven Relevance Judgment in Information Retrieval [11.27206971411905]
We introduce Task-aware Evaluation (TRUE) for relevance judgment generation. TRUE was originally developed for usefulness evaluation in search sessions. We evaluate TRUE on the TREC DL 2019, 2020 and LLMJudge datasets.
arXiv Detail & Related papers (2025-09-29T23:58:47Z) - Decomposed Reasoning with Reinforcement Learning for Relevance Assessment in UGC Platforms [30.51899823655511]
Retrieval-augmented generation (RAG) plays a critical role in user-generated content platforms. These platforms present unique challenges: 1) ambiguous user intent due to sparse user feedback in RAG scenarios, and 2) substantial noise introduced by informal and unstructured language.
arXiv Detail & Related papers (2025-08-04T15:14:09Z) - Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation [77.07879255360342]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating retrieved information. In RAG, the emphasis has shifted to utility, which considers the usefulness of passages for generating accurate answers. Our approach focuses on utility-based selection rather than ranking, enabling dynamic passage selection tailored to specific queries without the need for fixed thresholds. Our experiments demonstrate that utility-based selection provides a flexible and cost-effective solution for RAG, significantly reducing computational costs while improving answer quality.
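The contrast this summary draws between fixed-cutoff ranking and dynamic utility-based selection can be sketched as follows. Both functions and the `judge` predicate are illustrative assumptions, not the paper's distilled selector:

```python
def topk_ranking(scores, k=5):
    """Fixed-cutoff baseline: always take the k highest-scoring passages,
    regardless of how many are actually useful for this query."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def utility_selection(passages, judge):
    """Utility-based selection: keep every passage the judge deems useful
    for answering; the size of the subset varies per query."""
    return [p for p in passages if judge(p)]
```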
arXiv Detail & Related papers (2025-07-25T09:32:29Z) - Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs [69.10441885629787]
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge. It falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts. This survey synthesizes both strands under a unified reasoning-retrieval perspective.
arXiv Detail & Related papers (2025-07-13T03:29:41Z) - Causal Retrieval with Semantic Consideration [6.967392207053045]
We propose CAWAI, a retrieval model that is trained with dual objectives: semantic and causal relations.
Our experiments demonstrate that CAWAI outperforms various models on diverse causal retrieval tasks.
We also show that CAWAI exhibits strong zero-shot generalization across scientific domain QA tasks.
arXiv Detail & Related papers (2025-04-07T03:04:31Z) - Training a Utility-based Retriever Through Shared Context Attribution for Retrieval-Augmented Language Models [51.608246558235166]
SCARLet is a framework for training utility-based retrievers in RALMs.
It incorporates two key factors, multi-task generalization and inter-passage interaction.
We evaluate our approach on ten datasets across various tasks, both in-domain and out-of-domain.
arXiv Detail & Related papers (2025-04-01T09:28:28Z) - SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction [20.6787276745193]
We introduce an automatic evaluation method that measures retrieval quality through the lens of information gain within the RAG framework. We quantify the utility of retrieval by the extent to which it reduces semantic perplexity post-retrieval.
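The core quantity behind this summary, perplexity reduction, can be sketched as the difference between the answer's perplexity without and with the retrieved context. SePer additionally clusters semantically equivalent answers; that step is omitted here, so this is only an assumed simplification of the metric:

```python
import math

def perplexity(token_logprobs):
    """Perplexity of an answer from its per-token log-probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def retrieval_utility(logprobs_no_ctx, logprobs_with_ctx):
    """SePer-style utility sketch: how much retrieval reduces the
    perplexity of the answer. Positive values mean retrieval helped."""
    return perplexity(logprobs_no_ctx) - perplexity(logprobs_with_ctx)
```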
arXiv Detail & Related papers (2025-03-03T12:37:34Z) - Is Relevance Propagated from Retriever to Generator in RAG? [21.82171240511567]
RAG is a framework for incorporating external knowledge, usually in the form of a set of documents retrieved from a collection.
We empirically investigate whether a RAG context comprised of topically relevant documents leads to improved downstream performance.
arXiv Detail & Related papers (2025-02-20T20:21:46Z) - Long Context vs. RAG for LLMs: An Evaluation and Revisits [41.27137478456755]
This paper revisits recent studies on this topic, highlighting their key insights and discrepancies.
We show that LC generally outperforms RAG in question-answering benchmarks, especially for Wikipedia-based questions.
We also provide an in-depth discussion on this topic, highlighting the overlooked importance of context relevance in existing studies.
arXiv Detail & Related papers (2024-12-27T14:34:37Z) - Toward Optimal Search and Retrieval for RAG [39.69494982983534]
Retrieval-augmented generation (RAG) is a promising method for addressing some of the memory-related challenges associated with Large Language Models (LLMs).
Here, we work towards the goal of understanding how retrievers can be optimized for RAG pipelines for common tasks such as Question Answering (QA).
arXiv Detail & Related papers (2024-11-11T22:06:51Z) - JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z) - CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation [68.81271028921647]
We introduce CORAL, a benchmark designed to assess RAG systems in realistic multi-turn conversational settings.
CORAL includes diverse information-seeking conversations automatically derived from Wikipedia.
It supports three core tasks of conversational RAG: passage retrieval, response generation, and citation labeling.
arXiv Detail & Related papers (2024-10-30T15:06:32Z) - Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage [74.70255719194819]
We introduce a novel framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question.
We use this framework to evaluate three commercial generative answer engines: You.com, Perplexity AI, and Bing Chat.
We find that while all answer engines cover core sub-questions more often than background or follow-up ones, they still miss around 50% of core sub-questions.
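The coverage measure this summary reports can be sketched as a simple fraction. The `addresses` predicate stands in for whatever judge (e.g. an LLM) decides whether an answer handles a sub-question; it is an assumption of this sketch, not the paper's protocol:

```python
def subquestion_coverage(answer, subquestions, addresses):
    """Fraction of sub-questions that the answer addresses,
    per the supplied `addresses(answer, subquestion)` predicate."""
    if not subquestions:
        return 0.0
    hits = sum(1 for sq in subquestions if addresses(answer, sq))
    return hits / len(subquestions)
```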
arXiv Detail & Related papers (2024-10-20T22:59:34Z) - Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs [64.9693406713216]
Internal mechanisms that contribute to the effectiveness of RAG systems remain underexplored.
Our experiments reveal that several core groups of experts are primarily responsible for RAG-related behaviors.
We propose several strategies to enhance RAG's efficiency and effectiveness through expert activation.
arXiv Detail & Related papers (2024-10-20T16:08:54Z) - A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems [67.52782366565658]
State-of-the-art recommender systems (RSs) depend on categorical features, which are encoded by embedding vectors, resulting in excessively large embedding tables. Despite the prosperity of lightweight embedding-based RSs, a wide diversity is seen in evaluation protocols. This study investigates various LERS' performance, efficiency, and cross-task transferability via a thorough benchmarking process.
arXiv Detail & Related papers (2024-06-25T07:45:00Z) - Identifying Key Terms in Prompts for Relevance Evaluation with GPT Models [1.1965844936801802]
The purpose of this paper is to identify which specific terms in prompts positively or negatively impact relevance evaluation with Large Language Models.
By comparing the performance of these prompts in both few-shot and zero-shot settings, we analyze the influence of specific terms in the prompts.
arXiv Detail & Related papers (2024-05-11T06:30:13Z) - Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy [52.426623750562335]
We introduce ToTER (Topical taxonomy Enhanced Retrieval) framework.
ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts.
As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
arXiv Detail & Related papers (2024-03-07T02:34:54Z) - Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z) - Joint Answering and Explanation for Visual Commonsense Reasoning [46.44588492897933]
Visual Commonsense Reasoning endeavors to pursue a more high-level visual comprehension.
It is composed of two indispensable processes: question answering over a given image and rationale inference for answer explanation.
We present a plug-and-play knowledge distillation enhanced framework to couple the question answering and rationale inference processes.
arXiv Detail & Related papers (2022-02-25T11:26:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.