Prompted Opinion Summarization with GPT-3.5
- URL: http://arxiv.org/abs/2211.15914v2
- Date: Tue, 23 May 2023 06:19:47 GMT
- Title: Prompted Opinion Summarization with GPT-3.5
- Authors: Adithya Bhaskar, Alexander R. Fabbri and Greg Durrett
- Abstract summary: We show that GPT-3.5 models achieve very strong performance in human evaluation.
We argue that standard evaluation metrics do not reflect this, and introduce three new metrics targeting faithfulness, factuality, and genericity.
- Score: 115.95460650578678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models have shown impressive performance across a wide variety
of tasks, including text summarization. In this paper, we show that this strong
performance extends to opinion summarization. We explore several pipeline
methods for applying GPT-3.5 to summarize a large collection of user reviews in
a prompted fashion. To handle arbitrarily large numbers of user reviews, we
explore recursive summarization as well as methods for selecting salient
content to summarize through supervised clustering or extraction. On two
datasets, an aspect-oriented summarization dataset of hotel reviews (SPACE) and
a generic summarization dataset of Amazon and Yelp reviews (FewSum), we show
that GPT-3.5 models achieve very strong performance in human evaluation. We
argue that standard evaluation metrics do not reflect this, and introduce three
new metrics targeting faithfulness, factuality, and genericity to contrast
these different methods.
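To make the prompted pipeline concrete, here is a minimal sketch of the recursive summarization idea in Python. It assumes the OpenAI chat client; the prompt wording, model name, and chunk size are illustrative placeholders rather than the paper's exact settings.

```python
# Minimal sketch of recursive opinion summarization, assuming the OpenAI
# Python client. The prompt, model name, and chunk size are illustrative
# choices, not the exact configuration used in the paper.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROMPT = "Summarize the opinions expressed in the following reviews:\n\n{reviews}"

def summarize(reviews: list[str]) -> str:
    """One prompted summarization call over a batch of reviews."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": PROMPT.format(reviews="\n\n".join(reviews))}],
    )
    return response.choices[0].message.content

def recursive_summarize(reviews: list[str], chunk_size: int = 10) -> str:
    """Summarize reviews in chunks, then summarize the chunk summaries,
    repeating until everything fits into a single prompt."""
    if len(reviews) <= chunk_size:
        return summarize(reviews)
    chunk_summaries = [
        summarize(reviews[i:i + chunk_size])
        for i in range(0, len(reviews), chunk_size)
    ]
    return recursive_summarize(chunk_summaries, chunk_size)
```

The content-selection variants described in the abstract would replace the simple chunking above with a clustering or extraction step that picks salient review sentences before prompting.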
Related papers
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% ROUGE-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
- Fair Abstractive Summarization of Diverse Perspectives [103.08300574459783]
A fair summary should provide a comprehensive coverage of diverse perspectives without underrepresenting certain groups.
We first formally define fairness in abstractive summarization as not underrepresenting perspectives of any groups of people.
We propose four reference-free automatic metrics by measuring the differences between target and source perspectives.
arXiv Detail & Related papers (2023-11-14T03:38:55Z)
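A rough sketch of what a reference-free "perspective gap" measure could look like: compare how often each perspective label appears in the source reviews versus the generated summary. The label set and the use of total variation distance are assumptions made for illustration, not the four metrics proposed in the cited paper.

```python
# Illustrative reference-free metric: distance between the perspective-label
# distribution of the source reviews and that of the summary. The labeling
# function and total-variation distance are assumptions for this sketch.
from collections import Counter

def label_distribution(texts: list[str], label_fn) -> dict[str, float]:
    """Fraction of texts assigned to each perspective label."""
    counts = Counter(label_fn(t) for t in texts)
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}

def perspective_gap(source_reviews: list[str],
                    summary_sentences: list[str], label_fn) -> float:
    """Total variation distance between source and summary label distributions."""
    p = label_distribution(source_reviews, label_fn)
    q = label_distribution(summary_sentences, label_fn)
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)
```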
- Simple Yet Effective Synthetic Dataset Construction for Unsupervised Opinion Summarization [28.52201592634964]
We propose two simple yet effective unsupervised approaches to generate both aspect-specific and general opinion summaries.
Our first approach, Seed Words Based Leave-One-Out (SW-LOO), identifies aspect-related portions of reviews simply by exact-matching aspect seed words.
Our second approach, Natural Language Inference Based Leave-One-Out (NLI-LOO), identifies aspect-related sentences utilizing an NLI model in a more general setting without using seed words.
arXiv Detail & Related papers (2023-03-21T08:08:04Z)
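The seed-word matching step behind SW-LOO is easy to picture. The sketch below shows only that selection step; the seed words are made-up examples and the leave-one-out construction of synthetic training pairs is omitted.

```python
# Sketch of the SW-LOO selection step: keep review sentences that contain an
# exact match for any seed word of the target aspect. Seed words are
# hypothetical; the leave-one-out training construction is not shown.
import re

ASPECT_SEED_WORDS = {
    "cleanliness": {"clean", "dirty", "dust", "spotless"},
    "location": {"location", "downtown", "walkable", "nearby"},
}

def aspect_sentences(review: str, aspect: str) -> list[str]:
    """Return sentences of `review` that exactly match a seed word for `aspect`."""
    seeds = ASPECT_SEED_WORDS[aspect]
    sentences = re.split(r"(?<=[.!?])\s+", review)
    return [
        s for s in sentences
        if any(word in seeds for word in re.findall(r"[a-z']+", s.lower()))
    ]
```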
- News Summarization and Evaluation in the Era of GPT-3 [73.48220043216087]
We study how GPT-3 compares against fine-tuned models trained on large summarization datasets.
We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these also do not suffer from common dataset-specific issues such as poor factuality.
arXiv Detail & Related papers (2022-09-26T01:04:52Z)
- Efficient Few-Shot Fine-Tuning for Opinion Summarization [83.76460801568092]
Abstractive summarization models are typically pre-trained on large amounts of generic texts, then fine-tuned on tens or hundreds of thousands of annotated samples.
We show that a few-shot method based on adapters can easily store in-domain knowledge.
We show that this self-supervised adapter pre-training improves summary quality over standard fine-tuning by 2.0 and 1.3 ROUGE-L points on the Amazon and Yelp datasets.
arXiv Detail & Related papers (2022-05-04T16:38:37Z)
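A generic bottleneck adapter makes the few-shot idea concrete: only a small residual module is trained while the pre-trained model stays frozen. The layer sizes below are arbitrary and this is a standard adapter sketch, not the configuration used in that paper.

```python
# Generic bottleneck adapter in PyTorch: a small down/up projection with a
# residual connection, inserted into a frozen pre-trained model so that only
# these few parameters are updated. Sizes are illustrative.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's behavior recoverable.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```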
- SummEval: Re-evaluating Summarization Evaluation [169.622515287256]
We re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion.
We benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics.
We assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset.
arXiv Detail & Related papers (2020-07-24T16:25:19Z)
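For reference, ROUGE-L scores like those quoted above can be computed with the rouge-score package; the reference and candidate summaries below are toy examples.

```python
# Computing ROUGE-L with Google's rouge-score package (pip install rouge-score).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
scores = scorer.score(
    "The hotel was clean and the staff were friendly.",   # reference
    "Guests found the hotel clean with friendly staff.",  # candidate
)
print(scores["rougeL"].fmeasure)  # F1 of the longest-common-subsequence overlap
```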