Related papers: Generating a Structured Summary of Numerous Academic Papers: Dataset and Method

Generating a Structured Summary of Numerous Academic Papers: Dataset and Method

URL: http://arxiv.org/abs/2302.04580v1
Date: Thu, 9 Feb 2023 11:42:07 GMT
Title: Generating a Structured Summary of Numerous Academic Papers: Dataset and Method
Authors: Shuaiqi Liu, Jiannong Cao, Ruosong Yang, Zhiyuan Wen
Abstract summary: We propose BigSurvey, the first large-scale dataset for generating comprehensive summaries of numerous academic papers on each topic. We collect target summaries from more than seven thousand survey papers and utilize their 430 thousand reference papers' abstracts as input documents. To organize the diverse content from dozens of input documents, we propose a summarization method named category-based alignment and sparse transformer (CAST)
Score: 20.90939310713561
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Writing a survey paper on one research topic usually needs to cover the salient content from numerous related papers, which can be modeled as a multi-document summarization (MDS) task. Existing MDS datasets usually focus on producing the structureless summary covering a few input documents. Meanwhile, previous structured summary generation works focus on summarizing a single document into a multi-section summary. These existing datasets and methods cannot meet the requirements of summarizing numerous academic papers into a structured summary. To deal with the scarcity of available data, we propose BigSurvey, the first large-scale dataset for generating comprehensive summaries of numerous academic papers on each topic. We collect target summaries from more than seven thousand survey papers and utilize their 430 thousand reference papers' abstracts as input documents. To organize the diverse content from dozens of input documents and ensure the efficiency of processing long text sequences, we propose a summarization method named category-based alignment and sparse transformer (CAST). The experimental results show that our CAST method outperforms various advanced summarization methods.

Related papers

PRISM: Fine-Grained Paper-to-Paper Retrieval with Multi-Aspect-Aware Query Optimization [61.783280234747394]
PRISM is a document-to-document retrieval method that introduces multiple, fine-grained representations for both the query and candidate papers.<n>We present SciFullBench, a novel benchmark in which the complete and segmented context of full papers for both queries and candidates is available.<n>Experiments show that PRISM improves performance by an average of 4.3% over existing retrieval baselines.
arXiv Detail & Related papers (2025-07-14T08:41:53Z)
Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long form content. Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning. Our experiments demonstrate on two datasets from different domains, that LLMs fine-tuned with the auxiliary task generate higher quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z)
On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval [59.25292920967197]
Few-shot document entity retrieval (VDER) is an important topic in industrial NLP applications. FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER. We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
arXiv Detail & Related papers (2023-11-01T17:51:43Z)
Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles [136.84278943588652]
We propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event. To facilitate this task, we outlined a data collection schema for identifying diverse information and curated a dataset named DiverseSumm. The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference.
arXiv Detail & Related papers (2023-09-17T20:28:17Z)
Mining both Commonality and Specificity from Multiple Documents for Multi-Document Summarization [1.4629756274247374]
The multi-document summarization task requires the designed summarizer to generate a short text that covers the important information of original documents. This paper proposes a multi-document summarization approach based on hierarchical clustering of documents.
arXiv Detail & Related papers (2023-03-05T14:25:05Z)
Unsupervised Summarization with Customized Granularities [76.26899748972423]
We propose the first unsupervised multi-granularity summarization framework, GranuSum. By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
arXiv Detail & Related papers (2022-01-29T05:56:35Z)
Modeling Endorsement for Multi-Document Abstractive Summarization [10.166639983949887]
A crucial difference between single- and multi-document summarization is how salient content manifests itself in the document(s) In this paper, we model the cross-document endorsement effect and its utilization in multiple document summarization. Our method generates a synopsis from each document, which serves as an endorser to identify salient content from other documents.
arXiv Detail & Related papers (2021-10-15T03:55:42Z)
Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents [30.09742243490895]
FacetSum is a faceted summarization benchmark built on Emerald journal articles. Analyses and empirical results on our dataset reveal the importance of bringing structure into summaries. We believe FacetSum will spur further advances in summarization research and foster the development of NLP systems.
arXiv Detail & Related papers (2021-05-31T22:58:38Z)
Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms. Under a deep generative framework, our system jointly optimize a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time. Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
WSL-DS: Weakly Supervised Learning with Distant Supervision for Query Focused Multi-Document Abstractive Summarization [16.048329028104643]
In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of documents and a query are given where the goal is to generate a summary from these documents. One major challenge for this task is the lack of availability of labeled training datasets. We propose a novel weakly supervised learning approach via utilizing distant supervision.
arXiv Detail & Related papers (2020-11-03T02:02:55Z)
A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal [10.553314461761968]
Multi-document summarization (MDS) aims to compress the content in large document collections into short summaries. This work presents a new dataset for MDS that is large both in the total number of document clusters and in the size of individual clusters.
arXiv Detail & Related papers (2020-05-20T14:33:33Z)
From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [77.89755281215079]
Text summarization is the research area aiming at creating a short and condensed version of the original document. In real-world applications, most of the data is not in a plain text format. This paper focuses on the survey of these new summarization tasks and approaches in the real-world application.
arXiv Detail & Related papers (2020-05-10T14:59:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.