Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy
- URL: http://arxiv.org/abs/2406.11290v1
- Date: Mon, 17 Jun 2024 07:52:42 GMT
- Title: Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy
- Authors: Hengran Zhang, Keping Bi, Jiafeng Guo, Xueqi Cheng
- Abstract summary: Utility and topical relevance are critical measures in information retrieval.
We propose an Iterative utiliTy judgmEnt fraMework to promote each step of the cycle of Retrieval-Augmented Generation.
- Score: 66.95501113584541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Utility and topical relevance are critical measures in information retrieval (IR), reflecting system and user perspectives, respectively. While topical relevance has long been emphasized, utility is a higher standard of relevance and is more useful for facilitating downstream tasks, e.g., in Retrieval-Augmented Generation (RAG). When we incorporate utility judgments into RAG, we realize that the topical relevance, utility, and answering in RAG are closely related to the three types of relevance that Schutz discussed from a philosophical perspective. They are topical relevance, interpretational relevance, and motivational relevance, respectively. Inspired by the dynamic iterations of the three types of relevance, we propose an Iterative utiliTy judgmEnt fraMework (ITEM) to promote each step of the cycle of RAG. We conducted extensive experiments on multi-grade passage retrieval and factoid question-answering datasets (i.e., TREC DL, WebAP, and NQ). Experimental results demonstrate significant improvements in utility judgments, ranking of topical relevance, and answer generation over representative baselines, including multiple single-shot utility judging approaches. Our code and benchmark can be found at https://anonymous.4open.science/r/ITEM-B486/.
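The abstract describes a cycle in which answer generation and utility judgment feed back into each other. The following is a minimal sketch of such a loop, inferred only from the abstract: the function names and the keyword-overlap "LLM" stubs are illustrative assumptions, not the authors' actual ITEM implementation.

```python
def llm_judge_utility(question, passage, answer):
    # Stand-in for an LLM utility judgment: a passage counts as "useful"
    # if it shares any word with the current answer. A real system would
    # prompt an LLM to judge utility for answering the question.
    return bool(set(passage.lower().split()) & set(answer.lower().split()))

def llm_generate_answer(question, passages):
    # Stand-in for LLM answer generation over the retained passages.
    return " ".join(passages) if passages else "no answer"

def item_loop(question, retrieved, max_iters=3):
    """Alternate between answer generation and utility judgment until the
    retained passage set stabilizes, mimicking the iterative RAG cycle
    the abstract describes."""
    retained = list(retrieved)
    for _ in range(max_iters):
        answer = llm_generate_answer(question, retained)
        kept = [p for p in retained
                if llm_judge_utility(question, p, answer)]
        if kept == retained:  # fixed point reached: judgments stable
            break
        retained = kept
    return retained, llm_generate_answer(question, retained)
```

Swapping the two stubs for real LLM calls yields the general shape of an iterative judge-then-generate pipeline; the paper's actual prompting and stopping criteria are described in the linked repository.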
Related papers
- ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning [22.825527641316192]
Large language models (LLMs) achieve remarkable performance on challenging benchmarks that are structured as multiple-choice question-answering (QA) tasks.
This paper introduces ARR, an intuitive and effective zero-shot prompting method that explicitly incorporates three key steps in QA solving: analyzing the intent of the question, retrieving relevant information, and reasoning step by step.
arXiv Detail & Related papers (2025-02-07T06:30:33Z)
- Long Context vs. RAG for LLMs: An Evaluation and Revisits [41.27137478456755]
This paper revisits recent studies on this topic, highlighting their key insights and discrepancies.
We show that LC generally outperforms RAG in question-answering benchmarks, especially for Wikipedia-based questions.
We also provide an in-depth discussion on this topic, highlighting the overlooked importance of context relevance in existing studies.
arXiv Detail & Related papers (2024-12-27T14:34:37Z)
- Unanswerability Evaluation for Retrieval Augmented Generation [74.3022365715597]
UAEval4RAG is a framework designed to evaluate whether RAG systems can handle unanswerable queries effectively.
We define a taxonomy with six unanswerable categories, and UAEval4RAG automatically synthesizes diverse and challenging queries.
arXiv Detail & Related papers (2024-12-16T19:11:55Z)
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
- Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage [74.70255719194819]
We introduce a novel framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question.
We use this framework to evaluate three commercial generative answer engines: You.com, Perplexity AI, and Bing Chat.
We find that while all answer engines cover core sub-questions more often than background or follow-up ones, they still miss around 50% of core sub-questions.
arXiv Detail & Related papers (2024-10-20T22:59:34Z)
- Identifying Key Terms in Prompts for Relevance Evaluation with GPT Models [1.1965844936801802]
The purpose of this paper is to identify which specific terms in prompts positively or negatively impact relevance evaluation with Large Language Models.
By comparing the performance of these prompts in both few-shot and zero-shot settings, we analyze the influence of specific terms in the prompts.
arXiv Detail & Related papers (2024-05-11T06:30:13Z)
- Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy [52.426623750562335]
We introduce ToTER (Topical taxonomy Enhanced Retrieval) framework.
ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts.
As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
arXiv Detail & Related papers (2024-03-07T02:34:54Z)
- Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z)
- Joint Answering and Explanation for Visual Commonsense Reasoning [46.44588492897933]
Visual Commonsense Reasoning pursues a higher level of visual comprehension.
It is composed of two indispensable processes: question answering over a given image and rationale inference for answer explanation.
We present a plug-and-play knowledge distillation enhanced framework to couple the question answering and rationale inference processes.
arXiv Detail & Related papers (2022-02-25T11:26:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.