Related papers: Fair Abstractive Summarization of Diverse Perspectives

Fair Abstractive Summarization of Diverse Perspectives

URL: http://arxiv.org/abs/2311.07884v2
Date: Sat, 30 Mar 2024 03:54:06 GMT
Title: Fair Abstractive Summarization of Diverse Perspectives
Authors: Yusen Zhang, Nan Zhang, Yixin Liu, Alexander Fabbri, Junru Liu, Ryo Kamoi, Xiaoxin Lu, Caiming Xiong, Jieyu Zhao, Dragomir Radev, Kathleen McKeown, Rui Zhang,
Abstract summary: A fair summary should provide a comprehensive coverage of diverse perspectives without underrepresenting certain groups. We first formally define fairness in abstractive summarization as not underrepresenting perspectives of any groups of people. We propose four reference-free automatic metrics by measuring the differences between target and source perspectives.
Score: 103.08300574459783
License: http://creativecommons.org/licenses/by/4.0/
Abstract: People from different social and demographic groups express diverse perspectives and conflicting opinions on a broad set of topics such as product reviews, healthcare, law, and politics. A fair summary should provide a comprehensive coverage of diverse perspectives without underrepresenting certain groups. However, current work in summarization metrics and Large Language Models (LLMs) evaluation has not explored fair abstractive summarization. In this paper, we systematically investigate fair abstractive summarization for user-generated data. We first formally define fairness in abstractive summarization as not underrepresenting perspectives of any groups of people, and we propose four reference-free automatic metrics by measuring the differences between target and source perspectives. We evaluate nine LLMs, including three GPT models, four LLaMA models, PaLM 2, and Claude, on six datasets collected from social media, online reviews, and recorded transcripts. Experiments show that both the model-generated and the human-written reference summaries suffer from low fairness. We conduct a comprehensive analysis of the common factors influencing fairness and propose three simple but effective methods to alleviate unfair summarization. Our dataset and code are available at https://github.com/psunlpgroup/FairSumm.

Related papers

Coverage-based Fairness in Multi-document Summarization [26.215433658613485]
We propose a new summary-level fairness measure, Equal Coverage, based on coverage of documents with different social attribute values. We also propose a new corpus-level measure, Coverage Parity, to detect corpus-level unfairness. We find that Claude3-sonnet is the fairest among all evaluated LLMs.
arXiv Detail & Related papers (2024-12-11T22:01:30Z)
Fair Summarization: Bridging Quality and Diversity in Extractive Summaries [4.214129657411282]
We introduce two novel methods for fair extractive summarization: FairExtract and FairGPT. We evaluate these methods using Divsumm summarization dataset of White-aligned, Hispanic, and African-American dialect tweets.
arXiv Detail & Related papers (2024-11-12T03:37:53Z)
P^3SUM: Preserving Author's Perspective in News Summarization with Diffusion Language Models [57.571395694391654]
We find that existing approaches alter the political opinions and stances of news articles in more than 50% of summaries. We propose P3SUM, a diffusion model-based summarization approach controlled by political perspective classifiers. Experiments on three news summarization datasets demonstrate that P3SUM outperforms state-of-the-art summarization systems.
arXiv Detail & Related papers (2023-11-16T10:14:28Z)
Bias in News Summarization: Measures, Pitfalls and Corpora [4.917075909999548]
We introduce definitions for biased behaviours in summarization models, along with practical operationalizations. We measure gender bias in English summaries generated by both purpose-built summarization models and general purpose chat models. We find content selection in single document summarization to be largely unaffected by gender bias, while hallucinations exhibit evidence of bias.
arXiv Detail & Related papers (2023-09-14T22:20:27Z)
Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs) We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing. We then unify the literature by proposing three intuitive, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
Prompted Opinion Summarization with GPT-3.5 [115.95460650578678]
We show that GPT-3.5 models achieve very strong performance in human evaluation. We argue that standard evaluation metrics do not reflect this, and introduce three new metrics targeting faithfulness, factuality, and genericity.
arXiv Detail & Related papers (2022-11-29T04:06:21Z)
Template-based Abstractive Microblog Opinion Summarisation [26.777997436856076]
We introduce the task of microblog opinion summarisation (MOS) and share a dataset of 3100 gold-standard opinion summaries. The dataset contains summaries of tweets spanning a 2-year period and covers more topics than any other public Twitter summarisation dataset.
arXiv Detail & Related papers (2022-08-08T12:16:01Z)
Fair Group-Shared Representations with Normalizing Flows [68.29997072804537]
We develop a fair representation learning algorithm which is able to map individuals belonging to different groups in a single group. We show experimentally that our methodology is competitive with other fair representation learning algorithms.
arXiv Detail & Related papers (2022-01-17T10:49:49Z)
Fairness for Whom? Understanding the Reader's Perception of Fairness in Text Summarization [9.136419921943235]
We study the interplay between the fairness notions and how readers perceive them in textual summaries. Standard ROUGE evaluation metrics are unable to quantify the perceived (un)fairness of the summaries.
arXiv Detail & Related papers (2021-01-29T05:14:34Z)
Few-Shot Learning for Opinion Summarization [117.70510762845338]
Opinion summarization is the automatic creation of text reflecting subjective information expressed in multiple documents. In this work, we show that even a handful of summaries is sufficient to bootstrap generation of the summary text. Our approach substantially outperforms previous extractive and abstractive methods in automatic and human evaluation.
arXiv Detail & Related papers (2020-04-30T15:37:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.