Improving Fairness of Large Language Models in Multi-document Summarization
- URL: http://arxiv.org/abs/2506.07479v2
- Date: Thu, 12 Jun 2025 05:36:59 GMT
- Title: Improving Fairness of Large Language Models in Multi-document Summarization
- Authors: Haoyuan Li, Rui Zhang, Snigdha Chaturvedi
- Abstract summary: Fairness in multi-document summarization (MDS) is crucial for providing comprehensive views across documents with diverse social attribute values. We propose FairPO, a preference tuning method that focuses on both summary-level and corpus-level fairness in MDS. Our experiments show that FairPO outperforms strong baselines while maintaining the critical qualities of summaries.
- Score: 26.505839239378183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fairness in multi-document summarization (MDS) is crucial for providing comprehensive views across documents with diverse social attribute values, which can significantly impact decision-making. For example, a summarization system that tends to overrepresent negative reviews of products can mislead customers into disregarding good products. Previous works measure fairness in MDS at two levels: summary-level and corpus-level. While summary-level fairness focuses on individual summaries, corpus-level fairness focuses on a corpus of summaries. Recent methods primarily focus on summary-level fairness. We propose FairPO, a preference tuning method that focuses on both summary-level and corpus-level fairness in MDS. To improve summary-level fairness, we propose to generate preference pairs by perturbing document sets. To improve corpus-level fairness, we propose fairness-aware preference tuning by dynamically adjusting the weights of preference pairs. Our experiments show that FairPO outperforms strong baselines while maintaining the critical qualities of summaries. The code is available at https://github.com/leehaoyuan/coverage_fairnes.
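The abstract describes two mechanisms: preference pairs built by perturbing document sets (for summary-level fairness) and dynamically reweighted preference tuning (for corpus-level fairness). A minimal sketch of how such weighting could plug into a DPO-style preference loss is below; the weighting rule, function names, and the `beta` parameter are illustrative assumptions, not the paper's actual implementation.

```python
import math

def weighted_dpo_loss(logp_chosen, logp_rejected, beta=0.1, weight=1.0):
    """DPO-style loss for one preference pair, scaled by a per-pair weight.

    Reference-model log-probability terms are omitted for brevity:
    loss = -weight * log(sigmoid(beta * (logp_chosen - logp_rejected))).
    """
    margin = beta * (logp_chosen - logp_rejected)
    return -weight * math.log(1.0 / (1.0 + math.exp(-margin)))

def corpus_fairness_weights(pair_groups, group_gap):
    """Hypothetical corpus-level weighting: upweight pairs whose social-attribute
    group is currently most underrepresented across the corpus of summaries.

    pair_groups: group label for each preference pair.
    group_gap:   per-group underrepresentation gap (larger = more underrepresented).
    Returns one weight per pair, 1 + gap normalized by the largest gap.
    """
    max_gap = max(group_gap.values()) or 1.0
    return [1.0 + group_gap[g] / max_gap for g in pair_groups]
```

Pairs from the currently underrepresented group then contribute more to the tuning objective, nudging the corpus-level balance without changing the per-pair loss form.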
Related papers
- FedFACT: A Provable Framework for Controllable Group-Fairness Calibration in Federated Learning [13.575259448363557]
We propose a controllable group-fairness calibration framework, named FedFACT. FedFACT identifies the Bayes-optimal classifiers under both global and local fairness constraints. Experiments on multiple datasets demonstrate that FedFACT consistently outperforms baselines in balancing accuracy and global-local fairness.
arXiv Detail & Related papers (2025-06-04T09:39:57Z) - Estimating Commonsense Plausibility through Semantic Shifts [66.06254418551737]
We propose ComPaSS, a novel discriminative framework that quantifies commonsense plausibility by measuring semantic shifts. Evaluations on two types of fine-grained commonsense plausibility estimation tasks show that ComPaSS consistently outperforms baselines.
arXiv Detail & Related papers (2025-02-19T06:31:06Z) - Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models [7.808926474503611]
We propose Fair-MoE, a model specifically designed to ensure both fairness and effectiveness. Fair-MoE comprises two key components: the Fairness-Oriented Mixture of Experts (FO-MoE) and the Fairness-Oriented Loss (FOL).
arXiv Detail & Related papers (2025-02-10T01:45:26Z) - Coverage-based Fairness in Multi-document Summarization [26.215433658613485]
We propose a new summary-level fairness measure, Equal Coverage, based on coverage of documents with different social attribute values. We also propose a new corpus-level measure, Coverage Parity, to detect corpus-level unfairness. We find that Claude3-sonnet is the fairest among all evaluated LLMs.
arXiv Detail & Related papers (2024-12-11T22:01:30Z) - Fair Summarization: Bridging Quality and Diversity in Extractive Summaries [4.214129657411282]
We introduce two novel methods for fair extractive summarization: FairExtract and FairGPT. We evaluate these methods using the Divsumm summarization dataset of White-aligned, Hispanic, and African-American dialect tweets.
arXiv Detail & Related papers (2024-11-12T03:37:53Z) - Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation [53.285436927963865]
We present the first comprehensive study of RAG systems that incorporate fairness-aware rankings. We find that fairness-aware retrieval frequently retains or even improves ranking effectiveness and generation quality. Our results underscore the importance of item-side fairness throughout both retrieval and generation phases.
arXiv Detail & Related papers (2024-09-17T23:10:04Z) - Fair Abstractive Summarization of Diverse Perspectives [103.08300574459783]
A fair summary should provide a comprehensive coverage of diverse perspectives without underrepresenting certain groups.
We first formally define fairness in abstractive summarization as not underrepresenting perspectives of any groups of people.
We propose four reference-free automatic metrics by measuring the differences between target and source perspectives.
arXiv Detail & Related papers (2023-11-14T03:38:55Z) - FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods [84.1077756698332]
This paper introduces the Fair Fairness Benchmark (FFB), a benchmarking framework for in-processing group fairness methods.
We provide a comprehensive analysis of state-of-the-art methods to ensure different notions of group fairness.
arXiv Detail & Related papers (2023-06-15T19:51:28Z) - Evaluating and Improving Factuality in Multimodal Abstractive Summarization [91.46015013816083]
We propose CLIPBERTScore to leverage the robustness and strong factuality detection performance of image-summary and document-summary matching.
We show that this simple combination of two metrics in the zero-shot setting achieves higher correlations than existing factuality metrics for document summarization.
Our analysis demonstrates the robustness and high correlation of CLIPBERTScore and its components on four factuality metric-evaluation benchmarks.
arXiv Detail & Related papers (2022-11-04T16:50:40Z) - fairlib: A Unified Framework for Assessing and Improving Classification Fairness [66.27822109651757]
fairlib is an open-source framework for assessing and improving classification fairness.
We implement 14 debiasing methods, including pre-processing, at-training-time, and post-processing approaches.
The built-in metrics cover the most commonly used fairness criteria and can be further generalized and customized for fairness evaluation.
arXiv Detail & Related papers (2022-05-04T03:50:23Z) - Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z) - Societal Biases in Retrieved Contents: Measurement Framework and Adversarial Mitigation for BERT Rankers [9.811131801693856]
We provide a novel framework to measure the fairness in the retrieved text contents of ranking models.
We propose an adversarial bias mitigation approach applied to the state-of-the-art BERT rankers.
Our results on the MS MARCO benchmark show that, while the fairness of all ranking models is lower than that of ranker-agnostic baselines, the fairness in retrieved contents significantly improves when applying the proposed adversarial training.
arXiv Detail & Related papers (2021-04-28T08:53:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.