Fair Abstractive Summarization of Diverse Perspectives
- URL: http://arxiv.org/abs/2311.07884v2
- Date: Sat, 30 Mar 2024 03:54:06 GMT
- Title: Fair Abstractive Summarization of Diverse Perspectives
- Authors: Yusen Zhang, Nan Zhang, Yixin Liu, Alexander Fabbri, Junru Liu, Ryo Kamoi, Xiaoxin Lu, Caiming Xiong, Jieyu Zhao, Dragomir Radev, Kathleen McKeown, Rui Zhang,
- Abstract summary: A fair summary should provide a comprehensive coverage of diverse perspectives without underrepresenting certain groups.
We first formally define fairness in abstractive summarization as not underrepresenting perspectives of any groups of people.
We propose four reference-free automatic metrics by measuring the differences between target and source perspectives.
- Score: 103.08300574459783
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: People from different social and demographic groups express diverse perspectives and conflicting opinions on a broad set of topics such as product reviews, healthcare, law, and politics. A fair summary should provide a comprehensive coverage of diverse perspectives without underrepresenting certain groups. However, current work in summarization metrics and Large Language Models (LLMs) evaluation has not explored fair abstractive summarization. In this paper, we systematically investigate fair abstractive summarization for user-generated data. We first formally define fairness in abstractive summarization as not underrepresenting perspectives of any groups of people, and we propose four reference-free automatic metrics by measuring the differences between target and source perspectives. We evaluate nine LLMs, including three GPT models, four LLaMA models, PaLM 2, and Claude, on six datasets collected from social media, online reviews, and recorded transcripts. Experiments show that both the model-generated and the human-written reference summaries suffer from low fairness. We conduct a comprehensive analysis of the common factors influencing fairness and propose three simple but effective methods to alleviate unfair summarization. Our dataset and code are available at https://github.com/psunlpgroup/FairSumm.
Related papers
- Summarization of Opinionated Political Documents with Varied Perspectives [11.399915001583059]
Models capable of generating accurate summaries of diverse perspectives can help reduce such polarization by exposing users to alternative perspectives.
We introduce a novel dataset and task for independently summarizing each political perspective in a set of passages from opinionated news articles.
We benchmark 10 models of varying sizes and architectures through both automatic and human evaluation.
arXiv Detail & Related papers (2024-11-06T18:14:48Z) - P^3SUM: Preserving Author's Perspective in News Summarization with Diffusion Language Models [57.571395694391654]
We find that existing approaches alter the political opinions and stances of news articles in more than 50% of summaries.
We propose P3SUM, a diffusion model-based summarization approach controlled by political perspective classifiers.
Experiments on three news summarization datasets demonstrate that P3SUM outperforms state-of-the-art summarization systems.
arXiv Detail & Related papers (2023-11-16T10:14:28Z) - Bias in News Summarization: Measures, Pitfalls and Corpora [4.917075909999548]
We introduce definitions for biased behaviours in summarization models, along with practical operationalizations.
We measure gender bias in English summaries generated by both purpose-built summarization models and general purpose chat models.
We find content selection in single document summarization to be largely unaffected by gender bias, while hallucinations exhibit evidence of bias.
arXiv Detail & Related papers (2023-09-14T22:20:27Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs)
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Prompted Opinion Summarization with GPT-3.5 [115.95460650578678]
We show that GPT-3.5 models achieve very strong performance in human evaluation.
We argue that standard evaluation metrics do not reflect this, and introduce three new metrics targeting faithfulness, factuality, and genericity.
arXiv Detail & Related papers (2022-11-29T04:06:21Z) - Template-based Abstractive Microblog Opinion Summarisation [26.777997436856076]
We introduce the task of microblog opinion summarisation (MOS) and share a dataset of 3100 gold-standard opinion summaries.
The dataset contains summaries of tweets spanning a 2-year period and covers more topics than any other public Twitter summarisation dataset.
arXiv Detail & Related papers (2022-08-08T12:16:01Z) - Fair Group-Shared Representations with Normalizing Flows [68.29997072804537]
We develop a fair representation learning algorithm which is able to map individuals belonging to different groups in a single group.
We show experimentally that our methodology is competitive with other fair representation learning algorithms.
arXiv Detail & Related papers (2022-01-17T10:49:49Z) - Fairness for Whom? Understanding the Reader's Perception of Fairness in
Text Summarization [9.136419921943235]
We study the interplay between the fairness notions and how readers perceive them in textual summaries.
Standard ROUGE evaluation metrics are unable to quantify the perceived (un)fairness of the summaries.
arXiv Detail & Related papers (2021-01-29T05:14:34Z) - Few-Shot Learning for Opinion Summarization [117.70510762845338]
Opinion summarization is the automatic creation of text reflecting subjective information expressed in multiple documents.
In this work, we show that even a handful of summaries is sufficient to bootstrap generation of the summary text.
Our approach substantially outperforms previous extractive and abstractive methods in automatic and human evaluation.
arXiv Detail & Related papers (2020-04-30T15:37:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.