Long Text and Multi-Table Summarization: Dataset and Method
- URL: http://arxiv.org/abs/2302.03815v1
- Date: Wed, 8 Feb 2023 00:46:55 GMT
- Title: Long Text and Multi-Table Summarization: Dataset and Method
- Authors: Shuaiqi Liu, Jiannong Cao, Ruosong Yang, Zhiyuan Wen
- Abstract summary: FINDSum is built on 21,125 annual reports from 3,794 companies.
It has two subsets for summarizing each company's results of operations and liquidity.
We propose a set of evaluation metrics to assess the usage of numerical information in produced summaries.
- Score: 20.90939310713561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic document summarization aims to produce a concise summary covering
the input document's salient information. Within a report document, the salient
information can be scattered in the textual and non-textual content. However,
existing document summarization datasets and methods usually focus on the text
and filter out the non-textual content. Missing tabular data can limit produced
summaries' informativeness, especially when summaries require covering
quantitative descriptions of critical metrics in tables. Existing datasets and
methods cannot meet the requirements of summarizing long text and multiple
tables in each report. To deal with the scarcity of available data, we propose
FINDSum, the first large-scale dataset for long text and multi-table
summarization. Built on 21,125 annual reports from 3,794 companies, it has two
subsets for summarizing each company's results of operations and liquidity. To
summarize the long text and dozens of tables in each report, we present three
types of summarization methods. Besides, we propose a set of evaluation metrics
to assess the usage of numerical information in produced summaries. Dataset
analyses and experimental results indicate the importance of jointly
considering input textual and tabular data when summarizing report documents.
Related papers
- The Power of Summary-Source Alignments [62.76959473193149]
Multi-document summarization (MDS) is a challenging task, often decomposed into subtasks of salience and redundancy detection.
Alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data.
This paper proposes extending the summary-source alignment framework by applying it at the more fine-grained proposition span level.
arXiv Detail & Related papers (2024-06-02T19:35:19Z)
- QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs [63.98556480088152]
Table summarization is a crucial task aimed at condensing information into concise and comprehensible textual summaries.
We propose a novel method to address these limitations by introducing query-focused multi-table summarization.
Our approach, which comprises a table serialization module, a summarization controller, and a large language model, generates query-dependent table summaries tailored to users' information needs.
arXiv Detail & Related papers (2024-05-08T15:05:55Z)
- Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles [136.84278943588652]
We propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event.
To facilitate this task, we outlined a data collection schema for identifying diverse information and curated a dataset named DiverseSumm.
The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference.
arXiv Detail & Related papers (2023-09-17T20:28:17Z)
- MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos [106.06278332186106]
Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction.
Numerous limitations exist within existing public MSMO datasets.
We have meticulously curated the MMSum dataset.
arXiv Detail & Related papers (2023-06-07T07:43:11Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- Generating a Structured Summary of Numerous Academic Papers: Dataset and Method [20.90939310713561]
We propose BigSurvey, the first large-scale dataset for generating comprehensive summaries of numerous academic papers on each topic.
We collect target summaries from more than seven thousand survey papers and utilize their 430 thousand reference papers' abstracts as input documents.
To organize the diverse content from dozens of input documents, we propose a summarization method named category-based alignment and sparse transformer (CAST).
arXiv Detail & Related papers (2023-02-09T11:42:07Z)
- CTE: A Dataset for Contextualized Table Extraction [1.1859913430860336]
The dataset comprises 75k fully annotated pages of scientific papers, including more than 35k tables.
Data are gathered from PubMed Central, merging the information provided by annotations in the PubTables-1M and PubLayNet datasets.
The generated annotations can be used to develop end-to-end pipelines for various tasks, including document layout analysis, table detection, structure recognition, and functional analysis.
arXiv Detail & Related papers (2023-02-02T22:38:23Z)
- A Survey on Neural Abstractive Summarization Methods and Factual Consistency of Summarization [18.763290930749235]
Summarization is the process of computationally shortening a set of textual data to create a subset (a summary).
Existing summarization methods can be roughly divided into two types: extractive and abstractive.
An extractive summarizer explicitly selects text snippets from the source document, while an abstractive summarizer generates novel text snippets to convey the most salient concepts prevalent in the source.
arXiv Detail & Related papers (2022-04-20T14:56:36Z)
- Topic Modeling Based Extractive Text Summarization [0.0]
We propose a novel method to summarize a text document by clustering its contents based on latent topics.
We use the lesser-used and challenging WikiHow dataset in our approach to text summarization.
arXiv Detail & Related papers (2021-06-29T12:28:19Z)
- Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.