Related papers: Coverage-based Fairness in Multi-document Summarization

URL: http://arxiv.org/abs/2412.08795v1
Date: Wed, 11 Dec 2024 22:01:30 GMT
Title: Coverage-based Fairness in Multi-document Summarization
Authors: Haoyuan Li, Yusen Zhang, Rui Zhang, Snigdha Chaturvedi
Abstract summary: We propose a new summary-level fairness measure, Equal Coverage, based on coverage of documents with different social attribute values. We also propose a new corpus-level measure, Coverage Parity, to detect corpus-level unfairness. We find that Claude3-sonnet is the fairest among all evaluated LLMs.
Score: 26.215433658613485
Abstract: Fairness in multi-document summarization (MDS) measures whether a system can generate a summary fairly representing information from documents with different social attribute values. Fairness in MDS is crucial since a fair summary can offer readers a comprehensive view. Previous works focus on quantifying summary-level fairness using Proportional Representation, a fairness measure based on Statistical Parity. However, Proportional Representation does not consider redundancy in input documents and overlooks corpus-level unfairness. In this work, we propose a new summary-level fairness measure, Equal Coverage, which is based on coverage of documents with different social attribute values and considers the redundancy within documents. To detect the corpus-level unfairness, we propose a new corpus-level measure, Coverage Parity. Our human evaluations show that our measures align more with our definition of fairness. Using our measures, we evaluate the fairness of thirteen different LLMs. We find that Claude3-sonnet is the fairest among all evaluated LLMs. We also find that almost all LLMs overrepresent different social attribute values.

Related papers

Fair Summarization: Bridging Quality and Diversity in Extractive Summaries [4.214129657411282]
We introduce two novel methods for fair extractive summarization: FairExtract and FairGPT. We evaluate these methods on the Divsumm summarization dataset of White-aligned, Hispanic, and African-American dialect tweets.
arXiv Detail & Related papers (2024-11-12T03:37:53Z)
Fair Abstractive Summarization of Diverse Perspectives [103.08300574459783]
A fair summary should provide comprehensive coverage of diverse perspectives without underrepresenting certain groups. We first formally define fairness in abstractive summarization as not underrepresenting the perspectives of any group of people. We propose four reference-free automatic metrics that measure the differences between target and source perspectives.
arXiv Detail & Related papers (2023-11-14T03:38:55Z)
Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP). We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy. Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs). We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
Evaluating and Improving Factuality in Multimodal Abstractive Summarization [91.46015013816083]
We propose CLIPBERTScore to leverage the robustness and strong factuality detection performance between image-summary and document-summary. We show that this simple combination of two metrics in the zero-shot setting achieves higher correlations than existing factuality metrics for document summarization. Our analysis demonstrates the robustness and high correlation of CLIPBERTScore and its components on four factuality metric-evaluation benchmarks.
arXiv Detail & Related papers (2022-11-04T16:50:40Z)
Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
Accumulated Prediction Sensitivity measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features. We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
Fairness for Whom? Understanding the Reader's Perception of Fairness in Text Summarization [9.136419921943235]
We study the interplay between the fairness notions and how readers perceive them in textual summaries. Standard ROUGE evaluation metrics are unable to quantify the perceived (un)fairness of the summaries.
arXiv Detail & Related papers (2021-01-29T05:14:34Z)
Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning [66.30909748400023]
We propose to evaluate the summary qualities without reference summaries by unsupervised contrastive learning. Specifically, we design a new metric which covers both linguistic qualities and semantic informativeness based on BERT. Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries.
arXiv Detail & Related papers (2020-10-05T05:04:14Z)
Machine learning fairness notions: Bridging the gap with real-world applications [4.157415305926584]
Fairness emerged as an important requirement to guarantee that Machine Learning predictive systems do not discriminate against specific individuals or entire sub-populations. This paper is a survey that illustrates the subtleties between fairness notions through a large number of examples and scenarios.
arXiv Detail & Related papers (2020-06-30T13:01:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.