Extractive Summarization via ChatGPT for Faithful Summary Generation
- URL: http://arxiv.org/abs/2304.04193v2
- Date: Mon, 9 Oct 2023 23:40:26 GMT
- Title: Extractive Summarization via ChatGPT for Faithful Summary Generation
- Authors: Haopeng Zhang, Xiao Liu, Jiawei Zhang
- Abstract summary: This paper presents a thorough evaluation of ChatGPT's performance on extractive summarization.
We find that ChatGPT exhibits inferior extractive summarization performance in terms of ROUGE scores compared to existing supervised systems.
Applying an extract-then-generate pipeline with ChatGPT yields significant performance improvements over abstractive baselines in terms of summary faithfulness.
- Score: 12.966825834765814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extractive summarization is a crucial task in natural language processing
that aims to condense long documents into shorter versions by directly
extracting sentences. The recent introduction of large language models has
attracted significant interest in the NLP community due to their remarkable
performance on a wide range of downstream tasks. This paper first presents a
thorough evaluation of ChatGPT's performance on extractive summarization and
compares it with traditional fine-tuning methods on various benchmark datasets.
Our experimental analysis reveals that ChatGPT exhibits inferior extractive
summarization performance in terms of ROUGE scores compared to existing
supervised systems, while achieving higher performance based on LLM-based
evaluation metrics. In addition, we explore the effectiveness of in-context
learning and chain-of-thought reasoning for enhancing its performance.
Furthermore, we find that applying an extract-then-generate pipeline with
ChatGPT yields significant performance improvements over abstractive baselines
in terms of summary faithfulness. These observations highlight potential
directions for enhancing ChatGPT's capabilities in faithful summarization using
two-stage approaches.
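The two-stage extract-then-generate idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the prompt wording, and the word-frequency extractor (a simple stand-in for the ChatGPT extraction prompt) are all hypothetical, and `llm` is any callable that maps a prompt string to generated text.

```python
import re
from collections import Counter
from typing import Callable, List


def split_sentences(document: str) -> List[str]:
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def extract_sentences(document: str, k: int = 2) -> List[str]:
    """Stage 1 (stand-in): keep the k sentences with the highest average
    word frequency, returned in their original document order.
    In the paper's setting this step would be an LLM extraction prompt."""
    sentences = split_sentences(document)
    freq = Counter(re.findall(r"\w+", document.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]


def extract_then_generate(document: str,
                          llm: Callable[[str], str],
                          k: int = 2) -> str:
    """Stage 2: ask the model for a summary guided by the extracted
    sentences. The prompt text is hypothetical, not taken from the paper."""
    extracted = extract_sentences(document, k)
    prompt = (
        "Write a short summary of the document. Stay consistent with "
        "the extracted sentences.\n"
        "Extracted sentences:\n"
        + "\n".join(f"- {s}" for s in extracted)
        + "\n\nDocument:\n"
        + document
    )
    return llm(prompt)
```

To use it, pass a function that wraps whichever chat model is available as `llm`. Passing `lambda prompt: prompt` instead returns the assembled prompt, which is handy for inspecting what the second stage would see.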
Related papers
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% Rouge-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
- Information-Theoretic Distillation for Reference-less Summarization [67.51150817011617]
We present a novel framework to distill a powerful summarizer based on the information-theoretic objective for summarization.
We start off from Pythia-2.8B as the teacher model, which is not yet capable of summarization.
We arrive at a compact but powerful summarizer with only 568M parameters that performs competitively against ChatGPT.
arXiv Detail & Related papers (2024-03-20T17:42:08Z)
- Chatbots Are Not Reliable Text Annotators [0.0]
ChatGPT is a closed-source product, which has major drawbacks with regard to transparency, cost, and data protection.
Recent advances in open-source (OS) large language models (LLMs) offer alternatives which remedy these challenges.
arXiv Detail & Related papers (2023-11-09T22:28:14Z)
- Pushing the Limits of ChatGPT on NLP Tasks [79.17291002710517]
Despite the success of ChatGPT, its performance on most NLP tasks is still well below the supervised baselines.
In this work, we looked into the causes, and discovered that its subpar performance was caused by the following factors.
We propose a collection of general modules to address these issues, in an attempt to push the limits of ChatGPT on NLP tasks.
arXiv Detail & Related papers (2023-06-16T09:40:05Z)
- ChatGPT as a Factual Inconsistency Evaluator for Text Summarization [17.166794984161964]
We show that ChatGPT can evaluate factual inconsistency under a zero-shot setting.
It generally outperforms previous evaluation metrics on binary entailment inference, summary ranking, and consistency rating.
However, a closer inspection of ChatGPT's output reveals certain limitations including its preference for more lexically similar candidates, false reasoning, and inadequate understanding of instructions.
arXiv Detail & Related papers (2023-03-27T22:30:39Z)
- Is ChatGPT a Good NLG Evaluator? A Preliminary Study [121.77986688862302]
We provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric.
Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments.
We hope our preliminary study could prompt the emergence of a general-purpose, reliable NLG metric.
arXiv Detail & Related papers (2023-03-07T16:57:20Z)
- Exploring the Limits of ChatGPT for Query or Aspect-based Text Summarization [28.104696513516117]
Large language models (LLMs) like GPT3 and ChatGPT have recently created significant interest in using these models for text summarization tasks.
Recent studies (Goyal et al., 2022; Zhang et al., 2023) have shown that LLM-generated news summaries are already on par with humans.
Our experiments reveal that ChatGPT's performance is comparable to traditional fine-tuning methods in terms of Rouge scores.
arXiv Detail & Related papers (2023-02-16T04:41:30Z)
- Comparing Methods for Extractive Summarization of Call Centre Dialogue [77.34726150561087]
We experimentally compare several such methods by using them to produce summaries of calls, and evaluating these summaries objectively.
We found that TopicSum and Lead-N outperform the other summarisation methods, whilst BERTSum received comparatively lower scores in both subjective and objective evaluations.
arXiv Detail & Related papers (2022-09-06T13:16:02Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- Dynamic Sliding Window for Meeting Summarization [25.805553277418813]
We analyze the linguistic characteristics of meeting transcripts on a representative corpus, and find that the sentences comprising the summary correlate with the meeting agenda.
We propose a dynamic sliding window strategy for meeting summarization.
arXiv Detail & Related papers (2021-08-31T05:39:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.