ECTSum: A New Benchmark Dataset For Bullet Point Summarization of Long
Earnings Call Transcripts
- URL: http://arxiv.org/abs/2210.12467v2
- Date: Wed, 26 Oct 2022 16:21:37 GMT
- Title: ECTSum: A New Benchmark Dataset For Bullet Point Summarization of Long
Earnings Call Transcripts
- Authors: Rajdeep Mukherjee, Abhinav Bohra, Akash Banerjee, Soumya Sharma,
Manjunath Hegde, Afreen Shaikh, Shivani Shrivastava, Koustuv Dasgupta, Niloy
Ganguly, Saptarshi Ghosh, Pawan Goyal
- Abstract summary: We present a new dataset with transcripts of earnings calls (ECTs), hosted by publicly traded companies, as documents.
We also present a simple-yet-effective approach, ECT-BPS, to generate a set of bullet points that precisely capture the important facts discussed in the calls.
- Score: 19.974530405492885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite tremendous progress in automatic summarization, state-of-the-art
methods are predominantly trained to excel in summarizing short newswire
articles, or documents with strong layout biases such as scientific articles or
government reports. Efficient techniques to summarize financial documents,
including facts and figures, have largely been unexplored, majorly due to the
unavailability of suitable datasets. In this work, we present ECTSum, a new
dataset with transcripts of earnings calls (ECTs), hosted by publicly traded
companies, as documents, and short experts-written telegram-style bullet point
summaries derived from corresponding Reuters articles. ECTs are long
unstructured documents without any prescribed length limit or format. We
benchmark our dataset with state-of-the-art summarizers across various metrics
evaluating the content quality and factual consistency of the generated
summaries. Finally, we present a simple-yet-effective approach, ECT-BPS, to
generate a set of bullet points that precisely capture the important facts
discussed in the calls.
Related papers
- DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing [53.85037373860246]
We introduce DeepSynth-Eval, a benchmark designed to objectively evaluate information consolidation capabilities.
We propose a fine-grained evaluation protocol using General Checklists (for factual coverage) and Constraint Checklists (for structural organization).
Our results demonstrate that agentic plan-and-write significantly outperforms single-turn generation.
arXiv Detail & Related papers (2026-01-07T03:07:52Z)
- Enhancing Business Analytics through Hybrid Summarization of Financial Reports [0.152292571922932]
Financial reports and earnings communications contain large volumes of structured and semi-structured information.
We present a hybrid summarization framework that combines extractive and abstractive techniques to produce concise and factually reliable summaries.
These findings support the development of practical summarization systems for distilling lengthy financial texts into usable business insights.
arXiv Detail & Related papers (2025-12-28T16:25:12Z)
- ABCD-LINK: Annotation Bootstrapping for Cross-Document Fine-Grained Links [57.514511353084565]
We introduce a new domain-agnostic framework for selecting a best-performing approach and annotating cross-document links.
We apply our framework in two distinct domains -- peer review and news.
The resulting novel datasets lay the foundation for numerous cross-document tasks like media framing and peer review.
arXiv Detail & Related papers (2025-09-01T11:32:24Z)
- Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization [93.56166917491487]
This paper proposes CHRONOS - Causal Headline Retrieval for Open-domain News Timeline SummarizatiOn via Iterative Self-Questioning.
Our experiments indicate that CHRONOS is not only adept at open-domain timeline summarization, but it also rivals the performance of existing state-of-the-art systems designed for closed-domain applications.
arXiv Detail & Related papers (2025-01-01T16:28:21Z)
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
arXiv Detail & Related papers (2024-10-28T15:56:49Z)
- Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long-form content.
Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning.
Our experiments on two datasets from different domains demonstrate that LLMs fine-tuned with the auxiliary task generate higher-quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z)
- Instruction-Guided Bullet Point Summarization of Long Financial Earnings Call Transcripts [25.4439290862464]
We study the problem of bullet point summarization of Earnings Call Transcripts (ECTs) using the recently released dataset.
We leverage an unsupervised question-based extractive module followed by a parameter-efficient instruction-tuned abstractive module to solve this task.
Our proposed model FLAN-FinBPS achieves new state-of-the-art performance, outperforming the strongest baseline with a 14.88% average ROUGE score gain.
arXiv Detail & Related papers (2024-05-03T16:33:16Z)
- All Data on the Table: Novel Dataset and Benchmark for Cross-Modality Scientific Information Extraction [39.05577374775964]
We propose a semi-supervised pipeline for annotating entities in text, as well as entities and relations in tables, in an iterative procedure.
We release novel resources for the scientific community, including a high-quality benchmark, a large-scale corpus, and a semi-supervised annotation pipeline.
arXiv Detail & Related papers (2023-11-14T14:22:47Z)
- Scientific Paper Extractive Summarization Enhanced by Citation Graphs [50.19266650000948]
We focus on leveraging citation graphs to improve scientific paper extractive summarization under different settings.
Preliminary results demonstrate that the citation graph is helpful even in a simple unsupervised framework.
Motivated by this, we propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available.
arXiv Detail & Related papers (2022-12-08T11:53:12Z)
- Evaluating and Improving Factuality in Multimodal Abstractive Summarization [91.46015013816083]
We propose CLIPBERTScore, a simple weighted combination of CLIPScore and BERTScore, to leverage the robustness and strong factuality detection performance between image-summary and document-summary, respectively.
We show that this simple combination of two metrics in the zero-shot setting achieves higher correlations than existing factuality metrics for document summarization.
Our analysis demonstrates the robustness and high correlation of CLIPBERTScore and its components on four factuality metric-evaluation benchmarks.
arXiv Detail & Related papers (2022-11-04T16:50:40Z)
- SQuALITY: Building a Long-Document Summarization Dataset the Hard Way [31.832673451018543]
We hire highly qualified contractors to read stories and write original summaries from scratch.
To amortize reading time, we collect five summaries per document, with the first giving an overview and the subsequent four addressing specific questions.
Experiments with state-of-the-art summarization systems show that our dataset is challenging and that existing automatic evaluation metrics are weak indicators of quality.
arXiv Detail & Related papers (2022-05-23T17:02:07Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document, where the top level captures long-range dependencies.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents [30.09742243490895]
FacetSum is a faceted summarization benchmark built on Emerald journal articles.
Analyses and empirical results on our dataset reveal the importance of bringing structure into summaries.
We believe FacetSum will spur further advances in summarization research and foster the development of NLP systems.
arXiv Detail & Related papers (2021-05-31T22:58:38Z)
- Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks [21.379555672973975]
This paper proposes a graph neural network (GNN)-based extractive summarization model.
Our model integrates a joint neural topic model (NTM) to discover latent topics, which can provide document-level features for sentence selection.
The experimental results demonstrate that our model achieves state-of-the-art results on the CNN/DM and NYT datasets.
arXiv Detail & Related papers (2020-10-13T09:30:04Z)
- Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
arXiv Detail & Related papers (2020-05-20T13:39:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.