A Case Study of Balanced Query Recommendation on Wikipedia
- URL: http://arxiv.org/abs/2508.20399v1
- Date: Thu, 28 Aug 2025 03:52:31 GMT
- Title: A Case Study of Balanced Query Recommendation on Wikipedia
- Authors: Harshit Mishra, Sucheta Soundarajan
- Abstract summary: We present a case study of BalancedQR using an extension of BalancedQR that handles biases in multiple dimensions. We evaluate the extended version of BalancedQR on a Wikipedia dataset.
- Score: 1.143020642249583
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Modern IR systems are an extremely important tool for seeking information. In addition to search, such systems include a number of query reformulation methods, such as query expansion and query recommendations, to provide high quality results. However, results returned by such methods sometimes exhibit undesirable or wrongful bias with respect to protected categories such as gender or race. Our earlier work considered the problem of balanced query recommendation, where instead of re-ranking a list of results based on fairness measures, the goal was to suggest queries that are relevant to a user's search query but exhibit less bias than the original query. In this work, we present a case study of BalancedQR using an extension of BalancedQR that handles biases in multiple dimensions. It employs a Pareto front approach that finds balanced queries, optimizing for multiple objectives such as gender bias and regional bias, along with the relevance of returned results. We evaluate the extended version of BalancedQR on a Wikipedia dataset. Our results demonstrate the effectiveness of our extension to the BalancedQR framework and highlight the significant impact of subtle query wording and linguistic choice on retrieval.
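The Pareto front idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` class, its field names, the convention that bias scores are non-negative magnitudes to be minimized, and the example values are all assumptions made for illustration. A candidate query is kept if no other candidate is at least as good on every objective (relevance, gender bias, regional bias) and strictly better on at least one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate query with three objective scores."""
    query: str
    relevance: float      # higher is better (assumed scale 0..1)
    gender_bias: float    # bias magnitude, lower is better (assumed)
    regional_bias: float  # bias magnitude, lower is better (assumed)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on all objectives and better on one."""
    no_worse = (
        a.relevance >= b.relevance
        and a.gender_bias <= b.gender_bias
        and a.regional_bias <= b.regional_bias
    )
    strictly_better = (
        a.relevance > b.relevance
        or a.gender_bias < b.gender_bias
        or a.regional_bias < b.regional_bias
    )
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up scores, purely for illustration.
candidates = [
    Candidate("q1", relevance=0.9, gender_bias=0.6, regional_bias=0.5),
    Candidate("q2", relevance=0.8, gender_bias=0.2, regional_bias=0.3),
    Candidate("q3", relevance=0.7, gender_bias=0.3, regional_bias=0.4),
    Candidate("q4", relevance=0.6, gender_bias=0.1, regional_bias=0.6),
]
front = pareto_front(candidates)
print([c.query for c in front])
```

In this toy data, `q3` is dropped because `q2` is better on all three objectives, while `q1`, `q2`, and `q4` each represent a different trade-off between relevance and the two bias dimensions. The pairwise check is quadratic in the number of candidates, which is typically acceptable for a short list of recommended queries.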
Related papers
- Decomposed Reasoning with Reinforcement Learning for Relevance Assessment in UGC Platforms [30.51899823655511]
Retrieval-augmented generation (RAG) plays a critical role in user-generated content platforms. These platforms present unique challenges: 1) ambiguous user intent due to sparse user feedback in RAG scenarios, and 2) substantial noise introduced by informal and unstructured language.
arXiv Detail & Related papers (2025-08-04T15:14:09Z) - Investigating the Robustness of Retrieval-Augmented Generation at the Query Level [4.3028340012580975]
Retrieval-augmented generation (RAG) has been proposed as a solution that dynamically incorporates external knowledge during inference. Despite its promise, RAG systems face practical challenges, most notably a strong dependence on the quality of the input query for accurate retrieval.
arXiv Detail & Related papers (2025-07-09T15:39:17Z) - QE-RAG: A Robust Retrieval-Augmented Generation Benchmark for Query Entry Errors [23.225358970952197]
Retriever-augmented generation (RAG) has become a widely adopted approach for enhancing the factual accuracy of large language models (LLMs). QE-RAG is the first robust RAG benchmark designed specifically to evaluate performance against query entry errors. We propose a contrastive learning-based robust retriever training method and a retrieval-augmented query correction method.
arXiv Detail & Related papers (2025-04-05T05:24:08Z) - Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence [56.09494651178128]
Retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG). We quantify the impact of biases, such as a preference for shorter documents, on retrievers like Dragon+ and Contriever. We uncover major vulnerabilities, showing retrievers favor shorter documents, early positions, repeated entities, and literal matches, all while ignoring the answer's presence!
arXiv Detail & Related papers (2025-03-06T23:23:13Z) - Mitigating Bias for Question Answering Models by Tracking Bias Influence [84.66462028537475]
We propose BMBI, an approach to mitigate the bias of multiple-choice QA models.
Based on the intuition that a model would lean to be more biased if it learns from a biased example, we measure the bias level of a query instance.
We show that our method could be applied to multiple QA formulations across multiple bias categories.
arXiv Detail & Related papers (2023-10-13T00:49:09Z) - ReFIT: Relevance Feedback from a Reranker during Inference [109.33278799999582]
Retrieve-and-rerank is a prevalent framework in neural information retrieval.
We propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time.
arXiv Detail & Related papers (2023-05-19T15:30:33Z) - LoL: A Comparative Regularization Loss over Query Reformulation Losses for Pseudo-Relevance Feedback [70.44530794897861]
Pseudo-relevance feedback (PRF) has proven to be an effective query reformulation technique to improve retrieval accuracy.
Existing PRF methods independently treat revised queries originating from the same query but using different numbers of feedback documents.
We propose the Loss-over-Loss (LoL) framework to compare the reformulation losses between different revisions of the same query during training.
arXiv Detail & Related papers (2022-04-25T10:42:50Z) - Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in DocRED [60.39125850987604]
We show that a recommend-revise scheme results in false negative samples and an obvious bias towards popular entities and relations.
The relabeled dataset is released to serve as a more reliable test set of document RE models.
arXiv Detail & Related papers (2022-04-17T11:29:01Z) - Surprise: Result List Truncation via Extreme Value Theory [92.5817701697342]
We propose a statistical method that produces interpretable and calibrated relevance scores at query time using nothing more than the ranked scores.
We demonstrate its effectiveness on the result list truncation task across image, text, and IR datasets.
arXiv Detail & Related papers (2020-10-19T19:15:50Z)