SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline
- URL: http://arxiv.org/abs/2010.09190v1
- Date: Mon, 19 Oct 2020 03:29:21 GMT
- Title: SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline
- Authors: Jiaxin Ju, Ming Liu, Longxiang Gao and Shirui Pan
- Abstract summary: We describe our text summarization system, SciSummPip, inspired by SummPip (Zhao et al., 2020).
Our SciSummPip includes a transformer-based language model SciBERT for contextual sentence representation.
Our work differs from previous methods in that content selection and a summary length constraint are applied to adapt to the scientific domain.
- Score: 39.46301416663324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Scholarly Document Processing (SDP) workshop aims to encourage more
effort on natural language understanding of scientific text. It comprises three shared
tasks, and we participate in the LongSumm shared task. In this paper, we
describe our text summarization system, SciSummPip, inspired by SummPip (Zhao
et al., 2020), an unsupervised multi-document text summarization system for the
news domain. Our SciSummPip includes a transformer-based
language model, SciBERT (Beltagy et al., 2019), for contextual sentence
representation, content selection with PageRank (Page et al., 1999), sentence
graph construction with both deep and linguistic information, sentence graph
clustering, and within-graph summary generation. Our work differs from previous
methods in that content selection and a summary length constraint are applied to
adapt to the scientific domain. Experimental results on both the training dataset
and the blind test dataset show the effectiveness of our method, and we empirically
verify the robustness of the modules used in SciSummPip with BERTScore (Zhang et
al., 2019a).
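To make the pipeline stages concrete, here is a minimal sketch that wires them together in Python. The mean pooling, the similarity clipping, the `top_k` and `max_words` values, and the pick-one-sentence-per-cluster generation step are illustrative assumptions, not the authors' implementation; only the overall stage order follows the abstract.

```python
# Minimal sketch of a SciSummPip-style pipeline (illustrative, not the
# authors' code). Assumes HuggingFace transformers for SciBERT, networkx
# for PageRank, and scikit-learn for spectral clustering.
import networkx as nx
import numpy as np
import torch
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

def embed(sentences):
    """Contextual sentence vectors: mean-pooled SciBERT token states."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state         # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

def summarize(sentences, n_clusters=5, top_k=30, max_words=600):
    sim = np.clip(cosine_similarity(embed(sentences)), 0.0, None)
    # Content selection: rank sentences with PageRank on the similarity graph.
    rank = nx.pagerank(nx.from_numpy_array(sim))
    keep = sorted(sorted(rank, key=rank.get, reverse=True)[:top_k])
    # Sentence-graph clustering (the paper combines deep and linguistic
    # information; only deep similarity is used in this sketch).
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed"
                                ).fit_predict(sim[np.ix_(keep, keep)])
    # Within-graph generation, simplified here to taking each cluster's
    # top-ranked sentence while enforcing the summary length constraint.
    picked, words = [], 0
    for c in range(n_clusters):
        members = [keep[i] for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        best = max(members, key=rank.get)
        cost = len(sentences[best].split())
        if words + cost <= max_words:
            picked.append(best)
            words += cost
    return " ".join(sentences[i] for i in sorted(picked))
```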
Related papers
- Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long-form content.
Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning.
Our experiments on two datasets from different domains demonstrate that LLMs fine-tuned with the auxiliary task generate higher-quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z)
- Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization [48.57273563299046]
We propose the task of Stepwise Summarization, which aims to generate a new appended summary each time a new document arrives.
The appended summary should not only summarize the newly added content but also be coherent with the previous summary.
We show that SSG achieves state-of-the-art performance in terms of both automatic metrics and human evaluations.
arXiv Detail & Related papers (2024-06-08T05:37:26Z)
- Classification and Clustering of Sentence-Level Embeddings of Scientific Articles Generated by Contrastive Learning [1.104960878651584]
Our approach consists of fine-tuning transformer language models to generate sentence-level embeddings from scientific articles.
We trained our models on three datasets with contrastive learning.
We show that fine-tuning sentence transformers with contrastive learning and using the generated embeddings in downstream tasks is a feasible approach to sentence classification in scientific articles.
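For readers unfamiliar with this setup, a minimal sketch of contrastive fine-tuning for sentence-level embeddings with the sentence-transformers library follows; the base checkpoint, the loss choice, and the toy training pair are assumptions for illustration, not the paper's configuration.

```python
# Illustrative contrastive fine-tuning of sentence embeddings with the
# sentence-transformers library; the base checkpoint and the toy training
# pair are assumptions, not the paper's configuration.
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("all-MiniLM-L6-v2")
train_examples = [
    # Each example pairs an anchor with a semantically similar sentence;
    # the other sentences in the batch serve as in-batch negatives.
    InputExample(texts=["We propose a new summarization model.",
                        "A novel model for summarization is introduced."]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=16)
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)

# Downstream use: sentence-level embeddings for classification or clustering.
embeddings = model.encode(["Results improve over the baseline."])
```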
arXiv Detail & Related papers (2024-03-30T02:52:14Z)
- Lay Text Summarisation Using Natural Language Processing: A Narrative Literature Review [1.8899300124593648]
The aim of this literature review is to describe and compare the different text summarisation approaches used to generate lay summaries.
We screened 82 articles and included eight relevant papers, published between 2020 and 2021, that used the same dataset.
A combination of extractive and abstractive summarisation methods in a hybrid approach was found to be most effective.
arXiv Detail & Related papers (2023-03-24T18:30:50Z)
- Summ^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents [13.755637074366813]
SummN is a simple, flexible, and effective multi-stage framework for input texts longer than the maximum context lengths of typical pretrained LMs.
It can process input text of arbitrary length by adjusting the number of stages while keeping the LM context size fixed.
Our experiments demonstrate that SummN significantly outperforms previous state-of-the-art methods.
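The coarse-to-fine idea behind such multi-stage frameworks can be sketched as a short loop, shown below under two simplifying assumptions: word counts approximate token counts, and `lm_summarize` (a hypothetical callable) stands in for any fixed-context pretrained summarizer. Summ^N's actual stage construction is more involved than this.

```python
# Sketch of a SummN-style multi-stage loop. `lm_summarize` is a hypothetical
# stand-in for any pretrained summarizer with a fixed context window, and
# word counts approximate token counts. Assumes each stage shortens the
# text, which is what a summarizer does in practice.
def multi_stage_summarize(text: str, lm_summarize, max_words: int = 1024) -> str:
    words = text.split()
    while len(words) > max_words:  # add coarse stages until the input fits
        chunks = [" ".join(words[i:i + max_words])
                  for i in range(0, len(words), max_words)]
        words = " ".join(lm_summarize(c) for c in chunks).split()
    return lm_summarize(" ".join(words))  # final fine-grained stage
```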
arXiv Detail & Related papers (2021-10-16T06:19:54Z)
- KnowGraph@IITK at SemEval-2021 Task 11: Building KnowledgeGraph for NLP Research [2.1012672709024294]
We develop a system that builds a knowledge graph of research paper contributions over the Natural Language Processing literature.
The proposed system is agnostic to the subject domain and can be applied for building a knowledge graph for any area.
Our system achieved F1 scores of 0.38, 0.63, and 0.76 in end-to-end pipeline testing, phrase extraction testing, and triplet extraction testing, respectively.
arXiv Detail & Related papers (2021-04-04T14:33:21Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this being integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representations into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
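As a toy illustration of the compression step, the sketch below fuses one cluster's sentences in a shared word graph and emits the cheapest start-to-end path. This is a drastic simplification of multi-sentence compression, and the edge-weighting scheme is an assumption, not SummPip's actual method.

```python
# Toy word-graph compression of one sentence cluster (illustrative only;
# the edge weighting is an assumption, not SummPip's actual method).
import networkx as nx
from collections import Counter

def compress_cluster(sentences):
    graph = nx.DiGraph()
    freq = Counter(w for s in sentences for w in s.lower().split())
    for s in sentences:
        words = ["<s>"] + s.lower().split() + ["</s>"]
        for a, b in zip(words, words[1:]):
            # Cheaper edges into frequent words, so shared phrasing wins.
            weight = 1.0 / (freq[b] + 1)
            if not graph.has_edge(a, b) or weight < graph[a][b]["weight"]:
                graph.add_edge(a, b, weight=weight)
    path = nx.shortest_path(graph, "<s>", "</s>", weight="weight")
    return " ".join(path[1:-1])

# Example: two overlapping sentences fuse through their shared words.
print(compress_cluster(["the model improves rouge scores on long documents",
                        "the model improves summarization of long documents"]))
```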
arXiv Detail & Related papers (2020-07-17T13:01:15Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)