Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2409.11598v1
- Date: Tue, 17 Sep 2024 23:10:04 GMT
- Title: Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation
- Authors: To Eun Kim, Fernando Diaz
- Abstract summary: This paper presents the first systematic evaluation of RAG systems integrated with fair rankings.
We focus specifically on measuring the fair exposure of each relevant item across the rankings utilized by RAG systems.
Our findings indicate that RAG systems with fair rankings can maintain a high level of generation quality and, in many cases, even outperform traditional RAG systems.
- Score: 53.285436927963865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many language models now enhance their responses with retrieval capabilities, leading to the widespread adoption of retrieval-augmented generation (RAG) systems. However, despite retrieval being a core component of RAG, much of the research in this area overlooks the extensive body of work on fair ranking, neglecting the importance of considering all stakeholders involved. This paper presents the first systematic evaluation of RAG systems integrated with fair rankings. We focus specifically on measuring the fair exposure of each relevant item across the rankings utilized by RAG systems (i.e., item-side fairness), aiming to promote equitable growth for relevant item providers. To gain a deep understanding of the relationship between item-fairness, ranking quality, and generation quality in the context of RAG, we analyze nine different RAG systems that incorporate fair rankings across seven distinct datasets. Our findings indicate that RAG systems with fair rankings can maintain a high level of generation quality and, in many cases, even outperform traditional RAG systems, despite the general trend of a tradeoff between ensuring fairness and maintaining system-effectiveness. We believe our insights lay the groundwork for responsible and equitable RAG systems and open new avenues for future research. We publicly release our codebase and dataset at https://github.com/kimdanny/Fair-RAG.
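For intuition, the item-side fairness notion above is about exposure: a deterministic ranker always shows equally relevant items in the same order, while a stochastic ranker spreads exposure across them. The sketch below illustrates an expected-exposure style computation under an assumed Plackett-Luce sampler and a DCG-style position discount; it is a minimal illustration, not the authors' released implementation (see their repository for that).

```python
import math
import random
from collections import defaultdict

def sample_ranking(scores, rng):
    """Sample one ranking from a Plackett-Luce model over item scores."""
    items = list(scores)
    weights = [math.exp(scores[i]) for i in items]
    ranking = []
    while items:
        pick = rng.choices(range(len(items)), weights=weights, k=1)[0]
        ranking.append(items.pop(pick))
        weights.pop(pick)
    return ranking

def expected_exposure(scores, n_samples=1000, seed=0):
    """Average position-discounted exposure per item across sampled rankings."""
    rng = random.Random(seed)
    exposure = defaultdict(float)
    for _ in range(n_samples):
        for rank, item in enumerate(sample_ranking(scores, rng)):
            exposure[item] += 1.0 / math.log2(rank + 2)  # DCG-style discount
    return {item: e / n_samples for item, e in exposure.items()}

# Two equally relevant documents should receive near-equal expected exposure.
print(expected_exposure({"doc_a": 1.0, "doc_b": 1.0, "doc_c": -2.0}))
```

In a RAG pipeline, the generator would then consume the top-k of each sampled ranking, trading a little ranking determinism for more equitable exposure of relevant item providers.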
Related papers
- Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage [74.70255719194819]
We introduce a novel framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question.
We use this framework to evaluate three commercial generative answer engines: You.com, Perplexity AI, and Bing Chat.
We find that while all answer engines cover core sub-questions more often than background or follow-up ones, they still miss around 50% of core sub-questions.
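As a rough illustration of the idea (a toy sketch, not the paper's actual framework), per-type coverage can be computed once each sub-question has been labeled as addressed or not:

```python
from collections import defaultdict

def coverage_by_type(subquestions):
    """subquestions: list of (type, addressed) pairs, e.g. ("core", True).
    Returns the fraction of addressed sub-questions per type."""
    hit, total = defaultdict(int), defaultdict(int)
    for qtype, addressed in subquestions:
        total[qtype] += 1
        hit[qtype] += int(addressed)
    return {qtype: hit[qtype] / total[qtype] for qtype in total}

print(coverage_by_type([("core", True), ("core", False),
                        ("background", True), ("follow-up", False)]))
```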
arXiv Detail & Related papers (2024-10-20T22:59:34Z)
- Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems [18.926129063000264]
Retrieval-Augmented Generation (RAG) systems have recently gained significant attention for their enhanced ability to integrate external knowledge sources.
We propose a fairness evaluation framework tailored to RAG methods, using scenario-based questions and analyzing disparities across demographic attributes.
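A minimal version of such a disparity measure (a hypothetical formulation, not the paper's exact metric) compares per-group quality scores and reports the largest gap:

```python
def max_disparity(scores_by_group):
    """scores_by_group: dict mapping a demographic attribute value to a list
    of per-question quality scores. Returns (gap, best_group, worst_group)."""
    means = {g: sum(v) / len(v) for g, v in scores_by_group.items()}
    best = max(means, key=means.get)
    worst = min(means, key=means.get)
    return means[best] - means[worst], best, worst

gap, best, worst = max_disparity({"group_a": [0.9, 0.8], "group_b": [0.6, 0.7]})
print(f"gap={gap:.2f} between {best} and {worst}")
```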
arXiv Detail & Related papers (2024-09-29T22:04:26Z)
- Trustworthiness in Retrieval-Augmented Generation Systems: A Survey [59.26328612791924]
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs).
We propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy.
arXiv Detail & Related papers (2024-09-16T09:06:44Z)
- RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation [61.14660526363607]
We propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for both the retrieval and generation modules.
RAGChecker has significantly better correlations with human judgments than other evaluation metrics.
The metrics of RAGChecker can guide researchers and practitioners in developing more effective RAG systems.
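The diagnostics operate at the level of extracted claims; the sketch below shows the general shape of claim-level precision and recall under an assumed entailment oracle (`entails` is a hypothetical stand-in for a claim-checking model, not RAGChecker's actual API):

```python
def claim_precision_recall(answer_claims, gold_claims, entails):
    """entails(claim, claims) -> True if `claim` is supported by `claims`.
    Precision: answer claims supported by the gold answer.
    Recall: gold claims recovered in the generated answer."""
    supported = sum(entails(c, gold_claims) for c in answer_claims)
    recovered = sum(entails(c, answer_claims) for c in gold_claims)
    precision = supported / len(answer_claims) if answer_claims else 0.0
    recall = recovered / len(gold_claims) if gold_claims else 0.0
    return precision, recall

# Toy oracle: exact string match stands in for an entailment model.
match = lambda claim, claims: claim in claims
print(claim_precision_recall(["a", "b"], ["a", "c"], match))
```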
arXiv Detail & Related papers (2024-08-15T10:20:54Z)
- RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [69.4501863547618]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.
With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance.
Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
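One plausible reading of the three metrics (an assumption for illustration, not the paper's exact definitions) is as shares of ground-truth key points that the generated answer covers, contradicts, or misses:

```python
from collections import Counter

def rageval_style_metrics(labels):
    """labels: per ground-truth key point, one of "covered", "contradicted",
    or "missing". Each metric is the share of key points in that state."""
    counts, n = Counter(labels), len(labels)
    return {
        "completeness": counts["covered"] / n,
        "hallucination": counts["contradicted"] / n,
        "irrelevance": counts["missing"] / n,
    }

print(rageval_style_metrics(["covered", "covered", "contradicted", "missing"]))
```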
arXiv Detail & Related papers (2024-08-02T13:35:11Z)
- RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems [0.0]
Retrieval-Augmented Generation (RAG) has become a standard architectural pattern for incorporating domain-specific knowledge into user-facing chat applications.
We introduce RAGBench: the first comprehensive, large-scale RAG benchmark dataset of 100k examples.
We formalize the TRACe evaluation framework: a set of explainable and actionable RAG evaluation metrics applicable across all RAG domains.
arXiv Detail & Related papers (2024-06-25T20:23:15Z)
- Evaluation of Retrieval-Augmented Generation: A Survey [13.633909177683462]
We provide a comprehensive overview of the evaluation and benchmarks of Retrieval-Augmented Generation (RAG) systems.
Specifically, we examine and compare several quantifiable metrics of the Retrieval and Generation components, such as relevance, accuracy, and faithfulness.
We then analyze the various datasets and metrics, discuss the limitations of current benchmarks, and suggest potential directions to advance the field of RAG benchmarks.
arXiv Detail & Related papers (2024-05-13T02:33:25Z)
- Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models [21.115495457454365]
uRAG is a framework with a unified retrieval engine that serves multiple downstream retrieval-augmented generation (RAG) systems.
We build a large-scale experimentation ecosystem consisting of 18 RAG systems that participate in training and 18 unseen RAG systems that use uRAG as new users of the search engine.
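The unified-engine idea is essentially one retrieval service behind a stable interface that many downstream RAG systems share; a skeletal version (hypothetical API and toy scorer, not uRAG's actual code) looks like:

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    doc_id: str
    text: str
    score: float

class UnifiedRetrievalEngine:
    """One shared index and ranker serving many downstream RAG systems."""

    def __init__(self, corpus):
        self.corpus = corpus  # doc_id -> text

    def search(self, query, k=5):
        # Toy lexical scorer; a real engine would use a trained ranker.
        def score(text):
            return sum(term in text.lower() for term in query.lower().split())
        ranked = sorted(self.corpus.items(), key=lambda kv: -score(kv[1]))
        return [SearchResult(d, t, float(score(t))) for d, t in ranked[:k]]

# Multiple RAG "users" call the same engine instance.
engine = UnifiedRetrievalEngine({"d1": "fair ranking in RAG", "d2": "LLM survey"})
for system in ("qa-system", "summarizer"):
    print(system, [r.doc_id for r in engine.search("fair ranking")])
```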
arXiv Detail & Related papers (2024-04-30T19:51:37Z)
- ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems [46.522527144802076]
We introduce ARES, an Automated RAG Evaluation System.
ARES finetunes lightweight LM judges to assess the quality of individual RAG components.
We make our code and datasets publicly available on GitHub.
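In spirit, evaluation reduces to scoring each (query, context, answer) triple with small trained judges; the loop below is a schematic sketch (the `judge_*` callables are hypothetical stand-ins for ARES's finetuned judge models):

```python
def evaluate_rag(examples, judge_context, judge_answer):
    """examples: list of (query, retrieved_context, generated_answer) triples.
    Each judge maps its inputs to 1 (good) or 0 (bad); we report mean scores."""
    ctx_scores, ans_scores = [], []
    for query, context, answer in examples:
        ctx_scores.append(judge_context(query, context))
        ans_scores.append(judge_answer(query, context, answer))
    n = len(examples)
    return {"context_relevance": sum(ctx_scores) / n,
            "answer_quality": sum(ans_scores) / n}

# Trivial stand-in judges for demonstration.
print(evaluate_rag(
    [("q1", "ctx", "ans"), ("q2", "ctx", "ans")],
    judge_context=lambda q, c: 1,
    judge_answer=lambda q, c, a: 1,
))
```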
arXiv Detail & Related papers (2023-11-16T00:39:39Z)