T-Retrievability: A Topic-Focused Approach to Measure Fair Document Exposure in Information Retrieval
- URL: http://arxiv.org/abs/2508.21704v1
- Date: Fri, 29 Aug 2025 15:14:16 GMT
- Title: T-Retrievability: A Topic-Focused Approach to Measure Fair Document Exposure in Information Retrieval
- Authors: Xuejun Chang, Zaiqiao Meng, Debasis Ganguly
- Abstract summary: We propose a topic-focused localised retrievability measure, which first computes retrievability scores over multiple groups of topically-related documents. Our analysis uncovers new insights into the exposure characteristics of various neural ranking models.
- Score: 22.953432572278597
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The retrievability of a document is a collection-based statistic that measures the expected (reciprocal) rank at which the document is retrieved within a specific rank cut-off. A collection with uniformly distributed retrievability scores across documents is an indicator of fair document exposure. While retrievability scores have been used to quantify the fairness of exposure for a collection, in our work we use the distribution of retrievability scores to measure the exposure bias of retrieval models. We hypothesise that an uneven distribution of retrievability scores across the entire collection may not accurately reflect exposure bias, but may instead indicate variations in topical relevance. As a solution, we propose a topic-focused localised retrievability measure, called T-Retrievability (topic-retrievability), which first computes retrievability scores over multiple groups of topically related documents and then aggregates these localised values to obtain a collection-level statistic. Our analysis using the proposed T-Retrievability measure uncovers new insights into the exposure characteristics of various neural ranking models. The findings suggest that this localised measure provides a more nuanced understanding of exposure fairness, offering a more reliable approach for assessing document accessibility in IR systems.
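To make the computation concrete, here is a minimal sketch of the pipeline the abstract describes: gravity-style retrievability scores accumulated from per-query rankings, documents grouped by topic, and the localised values aggregated into a collection-level statistic. The Gini-based aggregation, the function names, and the data layout are assumptions for illustration, not the paper's exact procedure.

```python
from collections import defaultdict

def retrievability(rankings, cutoff=100):
    """Gravity-based retrievability: each query in which a document appears
    within the rank cut-off contributes the reciprocal of its rank."""
    r = defaultdict(float)
    for ranked_docs in rankings:                      # one ranked list per query
        for rank, doc in enumerate(ranked_docs[:cutoff], start=1):
            r[doc] += 1.0 / rank
    return r

def gini(values):
    """Gini coefficient of a score distribution (0 = perfectly even exposure)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * cum / (n * total) - (n + 1.0) / n

def t_retrievability(rankings, topic_of, cutoff=100):
    """Localised measure: compute retrievability per topical group of
    documents, then aggregate (here: mean per-group Gini, an assumption)."""
    r = retrievability(rankings, cutoff)
    groups = defaultdict(list)
    for doc, score in r.items():
        groups[topic_of[doc]].append(score)
    return sum(gini(v) for v in groups.values()) / len(groups)
```

For example, `t_retrievability([["d1", "d2", "d3"], ["d2", "d1", "d4"]], {"d1": 0, "d2": 0, "d3": 1, "d4": 1})` measures exposure inequality within each topical group separately, so a topic whose documents are simply less relevant overall no longer inflates the collection-level bias estimate.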
Related papers
- Contextual Relevance and Adaptive Sampling for LLM-Based Document Reranking [16.036042734987024]
We propose contextual relevance, which we define as the probability that a document is relevant to a given query. To efficiently estimate contextual relevance, we propose TS-SetRank, a sampling-based, uncertainty-aware reranking algorithm. Empirically, TS-SetRank improves nDCG@10 over retrieval and reranking baselines by 15-25% on BRIGHT and 6-21% on BEIR.
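The summary gives only the high-level idea, so the following is a generic Thompson-sampling sketch of sampling-based, uncertainty-aware reranking: Beta posteriors over each document's contextual relevance, updated from a hypothetical binary relevance signal. TS-SetRank's actual set-wise formulation and feedback mechanism differ.

```python
import random

def ts_rerank(doc_ids, feedback, rounds=100):
    """Thompson-sampling sketch of uncertainty-aware reranking.
    `feedback(doc)` is a hypothetical oracle returning a binary relevance
    signal (e.g. an LLM judgement); this is not TS-SetRank's exact update."""
    alpha = {d: 1.0 for d in doc_ids}   # Beta(1, 1) priors over relevance
    beta = {d: 1.0 for d in doc_ids}
    for _ in range(rounds):
        # Sample a plausible relevance value for every document.
        sampled = {d: random.betavariate(alpha[d], beta[d]) for d in doc_ids}
        # Probe the document whose sampled relevance is highest.
        d = max(sampled, key=sampled.get)
        if feedback(d):
            alpha[d] += 1.0
        else:
            beta[d] += 1.0
    # Final ranking by posterior mean relevance.
    return sorted(doc_ids, key=lambda d: alpha[d] / (alpha[d] + beta[d]),
                  reverse=True)
```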
arXiv Detail & Related papers (2025-11-03T04:03:32Z)
- Improving Document Retrieval Coherence for Semantically Equivalent Queries [63.97649988164166]
We propose a variation of the Multi-Negative Ranking loss for training DR models that improves their coherence in retrieving the same documents. The loss penalizes discrepancies between the top-k ranked documents retrieved for diverse but semantically equivalent queries.
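A rough sketch of such a loss, assuming a PyTorch setup with in-batch negatives: the standard multi-negative ranking term plus a coherence penalty between the score distributions that two semantically equivalent query variants induce over the same documents. The KL-based penalty stands in for the paper's top-k discrepancy term and is an assumption.

```python
import torch
import torch.nn.functional as F

def coherence_mnr_loss(q1, q2, docs, lam=0.1):
    """q1, q2: embeddings of two semantically equivalent query variants,
    shape (B, D); docs: embeddings of their positive documents, shape (B, D).
    In-batch documents act as negatives for each other."""
    s1 = q1 @ docs.T                      # (B, B) query-document similarities
    s2 = q2 @ docs.T
    labels = torch.arange(q1.size(0), device=q1.device)
    mnr = F.cross_entropy(s1, labels) + F.cross_entropy(s2, labels)
    # Penalise the two paraphrases for scoring the documents differently.
    coherence = F.kl_div(F.log_softmax(s1, dim=-1),
                         F.softmax(s2, dim=-1), reduction="batchmean")
    return mnr + lam * coherence
```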
arXiv Detail & Related papers (2025-08-11T13:34:59Z)
- Balancing Tails when Comparing Distributions: Comprehensive Equity Index (CEI) with Application to Bias Evaluation in Operational Face Biometrics [47.762333925222926]
The Comprehensive Equity Index (CEI) is a novel metric designed to detect demographic bias in face recognition systems. Our experiments confirm CEI's superior ability to detect nuanced biases where previous methods fall short. CEI provides a robust and sensitive tool for operational fairness assessment.
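Without the paper's exact definition at hand, a tail-sensitive comparison of two score distributions might look like the sketch below, which mixes a central statistic with an explicit upper-tail statistic so that rare extreme scores are not averaged away. Everything here, including the quantile choice and weighting, is illustrative rather than CEI's actual formula.

```python
import numpy as np

def tail_gap(scores_a, scores_b, q=0.05, w_tail=0.5):
    """Illustrative tail-sensitive comparison of two score distributions
    (e.g. impostor similarity scores for two demographic groups)."""
    center = abs(np.median(scores_a) - np.median(scores_b))
    tail = abs(np.quantile(scores_a, 1 - q) - np.quantile(scores_b, 1 - q))
    return (1 - w_tail) * center + w_tail * tail
```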
arXiv Detail & Related papers (2025-06-12T10:43:31Z)
- FAIR-QR: Enhancing Fairness-aware Information Retrieval through Query Refinement [1.8577028544235155]
We propose a novel framework that refines query keywords to retrieve documents from underrepresented groups and achieve group fairness. Our method not only shows promising retrieval results regarding relevance and fairness but also offers interpretability by exposing the refined keywords used at each iteration.
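One way to picture the refinement loop: after each retrieval round, inspect group exposure in the results and append a keyword targeting the least-exposed group, keeping the keyword history for interpretability. `search`, `group_of`, and the naive keyword heuristic below are hypothetical stand-ins for the paper's components.

```python
def fair_qr(query, search, group_of, target_groups, iterations=3):
    """Sketch of fairness-aware query refinement. `search(q)` returns a
    ranked list of documents; `group_of(doc)` returns a group label."""
    refined = query
    history = [refined]
    for _ in range(iterations):
        results = search(refined)
        counts = {g: 0 for g in target_groups}
        for doc in results:
            g = group_of(doc)
            if g in counts:
                counts[g] += 1
        # Target the group with the fewest retrieved documents so far.
        target = min(counts, key=counts.get)
        refined = f"{refined} {target}"   # naive: use the group label as keyword
        history.append(refined)
    return search(refined), history
```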
arXiv Detail & Related papers (2025-03-27T02:10:19Z)
- Exploring Information Retrieval Landscapes: An Investigation of a Novel Evaluation Techniques and Comparative Document Splitting Methods [0.0]
In this study, the structured nature of textbooks, the conciseness of articles, and the narrative complexity of novels are shown to require distinct retrieval strategies.
A novel evaluation technique is introduced, utilizing an open-source model to generate a comprehensive dataset of question-and-answer pairs.
The evaluation employs weighted scoring metrics, including SequenceMatcher, BLEU, METEOR, and BERT Score, to assess the system's accuracy and relevance.
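A sketch of such a weighted composite, using the four named metrics via `difflib`, NLTK, and the `bert-score` package; the weights are illustrative, not those used in the study.

```python
from difflib import SequenceMatcher
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.meteor_score import meteor_score  # needs nltk wordnet data
from bert_score import score as bert_score

def weighted_answer_score(candidate, reference,
                          weights=(0.1, 0.2, 0.2, 0.5)):
    """Weighted combination of SequenceMatcher, BLEU, METEOR, and BERTScore.
    The weights are an assumption for illustration."""
    seq = SequenceMatcher(None, candidate, reference).ratio()
    cand_tok, ref_tok = candidate.split(), reference.split()
    bleu = sentence_bleu([ref_tok], cand_tok)
    meteor = meteor_score([ref_tok], cand_tok)
    _, _, f1 = bert_score([candidate], [reference], lang="en")
    parts = (seq, bleu, meteor, float(f1[0]))
    return sum(w * p for w, p in zip(weights, parts))
```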
arXiv Detail & Related papers (2024-09-13T02:08:47Z)
- Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations [63.52709761339949]
We first contribute a dedicated dataset called the Fair Forgery Detection (FairFD) dataset, on which we demonstrate the racial bias of public state-of-the-art (SOTA) methods. We design novel metrics, including the Approach Averaged Metric and the Utility Regularized Metric, which can avoid deceptive results. We also present an effective and robust post-processing technique, Bias Pruning with Fair Activations (BPFA), which improves fairness without requiring retraining or weight updates.
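BPFA is described as improving fairness without retraining or weight updates; a plausible reading is a post-hoc activation mask. The sketch below ranks hidden units by how much their mean activation differs between demographic groups and masks the most group-sensitive ones; this ranking rule is an assumption, not the paper's definition.

```python
import numpy as np

def bias_pruning_mask(acts_group_a, acts_group_b, prune_frac=0.05):
    """acts_*: (num_samples, num_units) activation matrices for two groups.
    Returns a 0/1 mask to multiply into the layer's activations at inference."""
    gap = np.abs(acts_group_a.mean(axis=0) - acts_group_b.mean(axis=0))
    k = int(prune_frac * gap.size)
    mask = np.ones(gap.size)
    mask[np.argsort(gap)[-k:]] = 0.0      # zero out the k most group-sensitive units
    return mask
```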
arXiv Detail & Related papers (2024-07-19T14:53:18Z)
- Investigating Distributions of Telecom Adapted Sentence Embeddings for Document Retrieval [12.135498957287004]
We evaluate embeddings obtained from publicly available models and their domain-adapted variants. We establish a systematic method to obtain thresholds for similarity scores for different embeddings. We show that embeddings for domain-specific sentences have little overlap with those for domain-agnostic ones.
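As an illustration of deriving similarity-score thresholds systematically, the sketch below sweeps candidate thresholds and keeps the one that best separates similarities of matching pairs from non-matching ones (maximising balanced accuracy); the paper's own procedure may differ.

```python
import numpy as np

def similarity_threshold(pos_sims, neg_sims):
    """pos_sims / neg_sims: 1-D arrays of cosine similarities for matching
    and non-matching sentence pairs under a given embedding model."""
    candidates = np.unique(np.concatenate([pos_sims, neg_sims]))
    def balanced_acc(t):
        tpr = (pos_sims >= t).mean()   # matching pairs accepted
        tnr = (neg_sims < t).mean()    # non-matching pairs rejected
        return (tpr + tnr) / 2.0
    return max(candidates, key=balanced_acc)
```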
arXiv Detail & Related papers (2024-06-18T07:03:34Z)
- Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness [56.42192735214931]
Retrievers are expected not only to rely on the semantic relevance between documents and queries but also to recognize the nuanced intents or perspectives behind a user query.
In this work, we study whether retrievers can recognize and respond to different perspectives of the queries.
We show that current retrievers have limited awareness of subtly different perspectives in queries and can also be biased toward certain perspectives.
arXiv Detail & Related papers (2024-05-04T17:10:00Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries, and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
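Reversing the roles of queries and documents can be pictured as follows: given a target document and a pool of candidate queries, score each query by how prominently it exposes the document. `search` is a hypothetical ranked-retrieval function, and reciprocal rank is one plausible exposure weight, not necessarily the paper's metric.

```python
def exposing_queries(doc, queries, search, cutoff=10):
    """Rank candidate queries by how prominently they expose `doc`.
    `search(q)` is assumed to return a ranked list of document ids."""
    scored = []
    for q in queries:
        ranked = search(q)[:cutoff]
        if doc in ranked:
            scored.append((q, 1.0 / (ranked.index(doc) + 1)))  # reciprocal rank
    return sorted(scored, key=lambda x: x[1], reverse=True)
```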
arXiv Detail & Related papers (2021-10-14T20:19:27Z)
- A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance and Self-referenced Redundancy [60.419107377879925]
We propose a training-free and reference-free summarization evaluation metric.
Our metric consists of a centrality-weighted relevance score and a self-referenced redundancy score.
Our metric significantly outperforms existing methods on both multi-document and single-document summarization evaluation.
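A compact sketch of the two components under stated assumptions: source sentences are weighted by centrality (similarity to the document centroid) for the relevance term, while redundancy is the mean pairwise similarity among summary sentences. The combination rule and the `embed` helper are illustrative, not the paper's exact formulation.

```python
import numpy as np

def summary_score(doc_sents, sum_sents, embed, lam=0.6):
    """`embed` maps a list of sentences to unit-normalised row vectors."""
    D, S = embed(doc_sents), embed(sum_sents)
    centroid = D.mean(axis=0)
    centrality = D @ centroid
    weights = centrality / centrality.sum()
    # Centrality-weighted relevance: coverage of central source content.
    relevance = float(weights @ (D @ S.T).max(axis=1))
    # Self-referenced redundancy: mean off-diagonal similarity within the summary.
    n = len(sum_sents)
    sims = S @ S.T
    redundancy = float((sims.sum() - np.trace(sims)) / max(n * (n - 1), 1))
    return lam * relevance - (1 - lam) * redundancy
```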
arXiv Detail & Related papers (2021-06-26T05:11:27Z)
- Societal Biases in Retrieved Contents: Measurement Framework and Adversarial Mitigation for BERT Rankers [9.811131801693856]
We provide a novel framework to measure fairness in the retrieved text content of ranking models. We propose an adversarial bias mitigation approach applied to state-of-the-art BERT rankers. Our results on the MS MARCO benchmark show that, while the fairness of all ranking models is lower than that of ranker-agnostic baselines, the fairness of retrieved contents improves significantly when the proposed adversarial training is applied.
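A common realisation of adversarial bias mitigation is a gradient-reversal setup, sketched below: a shared encoder feeds both a relevance head and an adversary that predicts a protected attribute, and the reversed gradient pushes the encoder to discard that signal. The paper's exact architecture and losses may differ; the layer sizes and encoder here are illustrative.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad):
        return -grad

class AdversarialRanker(nn.Module):
    def __init__(self, encoder, hidden=768):
        super().__init__()
        self.encoder = encoder                 # e.g. a BERT-style text encoder
        self.relevance = nn.Linear(hidden, 1)  # query-document relevance head
        self.adversary = nn.Linear(hidden, 2)  # protected-attribute classifier

    def forward(self, x):
        h = self.encoder(x)                    # assumed shape: (batch, hidden)
        rel = self.relevance(h)
        adv = self.adversary(GradReverse.apply(h))
        # Train with a ranking loss on `rel` plus a cross-entropy loss on `adv`;
        # the reversed gradient makes the encoder hide the protected attribute.
        return rel, adv
```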
arXiv Detail & Related papers (2021-04-28T08:53:54Z)