Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness
- URL: http://arxiv.org/abs/2405.02714v1
- Date: Sat, 4 May 2024 17:10:00 GMT
- Title: Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness
- Authors: Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Tongshuang Wu,
- Abstract summary: retrievers are expected to not only rely on the semantic relevance between the documents and the queries but also recognize the nuanced intents or perspectives behind a user query.
In this work, we study whether retrievers can recognize and respond to different perspectives of the queries.
We show that current retrievers have limited awareness of subtly different perspectives in queries and can also be biased toward certain perspectives.
- Score: 56.42192735214931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of Information Retrieval (IR) requires a system to identify relevant documents based on users' information needs. In real-world scenarios, retrievers are expected to not only rely on the semantic relevance between the documents and the queries but also recognize the nuanced intents or perspectives behind a user query. For example, when asked to verify a claim, a retrieval system is expected to identify evidence from both supporting vs. contradicting perspectives, for the downstream system to make a fair judgment call. In this work, we study whether retrievers can recognize and respond to different perspectives of the queries -- beyond finding relevant documents for a claim, can retrievers distinguish supporting vs. opposing documents? We reform and extend six existing tasks to create a benchmark for retrieval, where we have diverse perspectives described in free-form text, besides root, neutral queries. We show that current retrievers covered in our experiments have limited awareness of subtly different perspectives in queries and can also be biased toward certain perspectives. Motivated by the observation, we further explore the potential to leverage geometric features of retriever representation space to improve the perspective awareness of retrievers in a zero-shot manner. We demonstrate the efficiency and effectiveness of our projection-based methods on the same set of tasks. Further analysis also shows how perspective awareness improves performance on various downstream tasks, with 4.2% higher accuracy on AmbigQA and 29.9% more correlation with designated viewpoints on essay writing, compared to non-perspective-aware baselines.
Related papers
- Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$2$)
GR$2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training.
Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z) - Open-World Evaluation for Retrieving Diverse Perspectives [39.22331280176582]
We curate a Benchmark for Retrieval Diversity for Subjective questions (BERDS)
Each example consists of a question and diverse perspectives associated with the question.
We build a language model based automatic evaluator that decides whether each retrieved document contains a perspective.
arXiv Detail & Related papers (2024-09-26T17:52:57Z) - Overview of PerpectiveArg2024: The First Shared Task on Perspective Argument Retrieval [56.66761232081188]
We present a novel dataset covering demographic and socio-cultural (socio) variables, such as age, gender, and political attitude, representing minority and majority groups in society.
We find substantial challenges in incorporating perspectivism, especially when aiming for personalization based solely on the text of arguments without explicitly providing socio profiles.
While we bootstrap perspective argument retrieval, further research is essential to optimize retrieval systems to facilitate personalization and reduce polarization.
arXiv Detail & Related papers (2024-07-29T03:14:57Z) - ExcluIR: Exclusionary Neural Information Retrieval [74.08276741093317]
We present ExcluIR, a set of resources for exclusionary retrieval.
evaluation benchmark includes 3,452 high-quality exclusionary queries.
training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document.
arXiv Detail & Related papers (2024-04-26T09:43:40Z) - INSTRUCTIR: A Benchmark for Instruction Following of Information
Retrieval Models [32.16908034520376]
retrievers often only prioritize query information without delving into the users' intended search context.
We propose a novel benchmark,INSTRUCTIR, specifically designed to evaluate instruction-following ability in information retrieval tasks.
We observe that retrievers fine-tuned to follow task-style instructions, such as INSTRUCTOR, can underperform compared to their non-instruction-tuned counterparts.
arXiv Detail & Related papers (2024-02-22T06:59:50Z) - Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
Large language models (LLMs) claim that they can assist with relevance judgments.
It is not clear whether automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z) - Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries, as well as conducting an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z) - End-to-End Training of Multi-Document Reader and Retriever for
Open-Domain Question Answering [36.80395759543162]
We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems.
We model retrieval decisions as latent variables over sets of relevant documents.
Our proposed method outperforms all existing approaches of comparable size by 2-3% exact match points.
arXiv Detail & Related papers (2021-06-09T19:25:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.