Related papers: Discovering Biases in Information Retrieval Models Using Relevance Thesaurus as Global Explanation

URL: http://arxiv.org/abs/2410.03584v1
Date: Fri, 04 Oct 2024 16:42:13 GMT
Title: Discovering Biases in Information Retrieval Models Using Relevance Thesaurus as Global Explanation
Authors: Youngwoo Kim, Razieh Rahimi, James Allan
Abstract summary: We propose a novel method to globally explain neural relevance models by constructing a "relevance thesaurus". This thesaurus is used to augment lexical matching models such as BM25 to approximate the neural model's predictions.
Score: 23.50629779375759
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Most efforts in interpreting neural relevance models have focused on local explanations, which explain the relevance of a document to a query but are not useful in predicting the model's behavior on unseen query-document pairs. We propose a novel method to globally explain neural relevance models by constructing a "relevance thesaurus" containing semantically relevant query and document term pairs. This thesaurus is used to augment lexical matching models such as BM25 to approximate the neural model's predictions. Our method involves training a neural relevance model to score the relevance of partial query and document segments, which is then used to identify relevant terms across the vocabulary space. We evaluate the obtained thesaurus explanation based on ranking effectiveness and fidelity to the target neural ranking model. Notably, our thesaurus reveals the existence of brand name bias in ranking models, demonstrating one advantage of our explanation method.
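The abstract's idea of augmenting BM25 with a relevance thesaurus can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `bm25_with_thesaurus`, the soft term-frequency formulation, the toy corpus, and the 0.8 thesaurus weight are all assumptions made for illustration. The sketch assumes the thesaurus maps (query term, document term) pairs to a weight in [0, 1], and that a thesaurus match contributes that weight to the term frequency where an exact match would contribute 1.

```python
import math
from collections import Counter


def bm25_with_thesaurus(query, doc, corpus, thesaurus, k1=1.2, b=0.75):
    """Score `doc` for `query` with BM25, counting thesaurus pairs as partial matches.

    query, doc: lists of tokens; corpus: list of tokenized documents.
    thesaurus: dict mapping (query_term, doc_term) to a weight (hypothetical format).
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    doc_tf = Counter(doc)
    # Document-length normalization term of BM25.
    norm = k1 * (1.0 - b + b * len(doc) / avg_len)
    score = 0.0
    for q in query:
        # Soft term frequency: exact matches count 1, thesaurus matches count their weight.
        soft_tf = sum(
            count * (1.0 if t == q else thesaurus.get((q, t), 0.0))
            for t, count in doc_tf.items()
        )
        if soft_tf == 0.0:
            continue
        # Inverse document frequency of the query term (always positive in this form).
        df = sum(1 for d in corpus if q in d)
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        score += idf * soft_tf * (k1 + 1.0) / (soft_tf + norm)
    return score


# Toy data; the thesaurus entry and its weight are made up for illustration.
corpus = [
    ["cheap", "automobile", "for", "sale"],
    ["car", "repair", "manual"],
    ["weather", "report", "today"],
]
thesaurus = {("car", "automobile"): 0.8}
query = ["car"]

plain = bm25_with_thesaurus(query, corpus[0], corpus, {})
augmented = bm25_with_thesaurus(query, corpus[0], corpus, thesaurus)
exact = bm25_with_thesaurus(query, corpus[1], corpus, thesaurus)
unrelated = bm25_with_thesaurus(query, corpus[2], corpus, thesaurus)
print(plain, augmented, exact, unrelated)
```

With an empty thesaurus the first document gets no credit for the query "car"; with the thesaurus entry it receives a partial score that stays below the score of the document containing the exact term. Inspecting which term pairs carry large weights is what would let a reader spot patterns such as the brand name bias mentioned in the abstract.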

Related papers

A Comprehensive Taxonomy of Negation for NLP and Neural Retrievers [61.086220009192424]
We introduce a taxonomy of negation that derives from philosophical, linguistic, and logical definitions. We generate two benchmark datasets that can be used to evaluate the performance of neural information retrieval models. We propose a logic-based classification mechanism that can be used to analyze the performance of retrieval models on existing datasets.
arXiv Detail & Related papers (2025-07-30T02:44:20Z)
Explainable Moral Values: a neuro-symbolic approach to value classification [1.4186974630564675]
This work explores the integration of ontology-based reasoning and Machine Learning techniques for explainable value classification. By relying on an ontological formalization of moral values as in the Moral Foundations Theory, the sandra neuro-symbolic reasoner is used to infer values that are satisfied by a certain sentence. We show that only relying on the reasoner's inference results in explainable classification comparable to other more complex approaches.
arXiv Detail & Related papers (2024-10-16T14:53:13Z)
Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence [0.0]
We propose a method that incorporates a deeper understanding of both sentence and document themes. This allows our model to detect latent topics that may include uncommon words or neologisms. We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task.
arXiv Detail & Related papers (2023-03-30T12:24:25Z)
Reconciliation of Pre-trained Models and Prototypical Neural Networks in Few-shot Named Entity Recognition [35.34238362639678]
We propose a one-line-code normalization method to reconcile such a mismatch with empirical and theoretical grounds. Our work also provides an analytical viewpoint for addressing the general problems in few-shot named entity recognition.
arXiv Detail & Related papers (2022-11-07T02:33:45Z)
Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant. To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z)
Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have long been devised to address context sparsity in n-gram LMs. In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z)
Hierarchical Interpretation of Neural Text Classification [31.95426448656938]
This paper proposes a novel Hierarchical INTerpretable neural text classifier, called Hint, which can automatically generate explanations of model predictions. Experimental results on both review datasets and news datasets show that our proposed approach achieves text classification results on par with existing state-of-the-art text classifiers.
arXiv Detail & Related papers (2022-02-20T11:15:03Z)
Tracing Origins: Coref-aware Machine Reading Comprehension [43.352833140317486]
We imitated the human reading process in connecting anaphoric expressions and leveraged the coreference information to enhance the word embeddings from the pre-trained model. We demonstrated that the explicit incorporation of the coreference information in the fine-tuning stage performed better than its incorporation in training a pre-trained language model.
arXiv Detail & Related papers (2021-10-15T09:28:35Z)
Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data. We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations. Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
Introducing Syntactic Structures into Target Opinion Word Extraction with Deep Learning [89.64620296557177]
We propose to incorporate the syntactic structures of the sentences into the deep learning models for targeted opinion word extraction. We also introduce a novel regularization technique to improve the performance of the deep learning models. The proposed model is extensively analyzed and achieves state-of-the-art performance on four benchmark datasets.
arXiv Detail & Related papers (2020-10-26T07:13:17Z)
High-order Semantic Role Labeling [86.29371274587146]
This paper introduces a high-order graph structure for the neural semantic role labeling model. It enables the model to explicitly consider not only the isolated predicate-argument pairs but also the interaction between the predicate-argument pairs. Experimental results on 7 languages of the CoNLL-2009 benchmark show that the high-order structural learning techniques are beneficial to the strong performing SRL models.
arXiv Detail & Related papers (2020-10-09T15:33:54Z)
Learning from Context or Names? An Empirical Study on Neural Relation Extraction [112.06614505580501]
We study the effect of two main information sources in text: textual context and entity mentions (names). We propose an entity-masked contrastive pre-training framework for relation extraction (RE). Our framework can improve the effectiveness and robustness of neural models in different RE scenarios.
arXiv Detail & Related papers (2020-10-05T11:21:59Z)
Leveraging Cognitive Search Patterns to Enhance Automated Natural Language Retrieval Performance [0.0]
We show that cognitive reformulation patterns that mimic user search behaviour are highlighted. We formalize the application of these patterns by considering a query conceptual representation. A genetic algorithm-based weighting process allows placing emphasis on terms according to their conceptual role-type.
arXiv Detail & Related papers (2020-04-21T14:13:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.