SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline
- URL: http://arxiv.org/abs/2010.09190v1
- Date: Mon, 19 Oct 2020 03:29:21 GMT
- Title: SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline
- Authors: Jiaxin Ju, Ming Liu, Longxiang Gao and Shirui Pan
- Abstract summary: We describe our text summarization system, SciSummPip, inspired by SummPip (Zhao et al., 2020).
Our SciSummPip includes a transformer-based language model, SciBERT, for contextual sentence representation.
Our work differs from previous methods in that content selection and a summary-length constraint are applied to adapt to the scientific domain.
- Score: 39.46301416663324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Scholarly Document Processing (SDP) workshop aims to encourage more
effort on natural language understanding of scientific text. It contains three shared
tasks and we participate in the LongSumm shared task. In this paper, we
describe our text summarization system, SciSummPip, inspired by SummPip (Zhao
et al., 2020), an unsupervised multi-document summarization system for the
news domain. Our SciSummPip includes a transformer-based
language model SciBERT (Beltagy et al., 2019) for contextual sentence
representation, content selection with PageRank (Page et al., 1999), sentence
graph construction with both deep and linguistic information, sentence graph
clustering and within-graph summary generation. Our work differs from previous
methods in that content selection and a summary-length constraint are applied to
adapt to the scientific domain. Experimental results on both the training and
blind test datasets show the effectiveness of our method, and we empirically
verify the robustness of the modules used in SciSummPip with BERTScore (Zhang et
al., 2019a).
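The content-selection step described above can be sketched in miniature. This is a hedged illustration only: the real system uses SciBERT embeddings for sentence representation, whereas here bag-of-words cosine similarity stands in for contextual vectors, and `pagerank_select` is a hypothetical helper name, not part of the released code.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    num = sum(a[t] * b[t] for t in a if t in b)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def pagerank_select(sentences, k=2, damping=0.85, iters=50):
    """Rank sentences by PageRank over a similarity graph; keep the top k."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(sentences)
    sim = [[0.0 if i == j else cosine(vecs[i], vecs[j]) for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Standard damped PageRank update over the row-normalised graph.
        scores = [(1 - damping) / n
                  + damping * sum(scores[i] * sim[i][j] / row_sums[i]
                                  for i in range(n) if row_sums[i])
                  for j in range(n)]
    top = sorted(range(n), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]  # preserve document order
```

Sentences well connected to the rest of the graph accumulate score mass, which is the intuition behind using PageRank for content selection before graph clustering.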
Related papers
- Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization [48.57273563299046]
We propose the task of Stepwise Summarization, which aims to generate a new appended summary each time a new document is proposed.
The appended summary should not only summarize the newly added content but also be coherent with the previous summary.
We show that SSG achieves state-of-the-art performance in terms of both automatic metrics and human evaluations.
arXiv Detail & Related papers (2024-06-08T05:37:26Z)
- Classification and Clustering of Sentence-Level Embeddings of Scientific Articles Generated by Contrastive Learning [1.104960878651584]
Our approach consists of fine-tuning transformer language models to generate sentence-level embeddings from scientific articles.
We trained our models on three datasets with contrastive learning.
We show that fine-tuning sentence transformers with contrastive learning and using the generated embeddings in downstream tasks is a feasible approach to sentence classification in scientific articles.
arXiv Detail & Related papers (2024-03-30T02:52:14Z)
- SKT5SciSumm - A Hybrid Generative Approach for Multi-Document Scientific Summarization [24.706753105042463]
We propose SKT5SciSumm, a hybrid framework for multi-document scientific summarization (MDSS).
We leverage the Sentence-Transformer version of Scientific Paper Embeddings using Citation-Informed Transformers (SPECTER) to encode and represent textual sentences.
We employ the T5 family of models to generate abstractive summaries using extracted sentences.
arXiv Detail & Related papers (2024-02-27T08:33:31Z)
- Lay Text Summarisation Using Natural Language Processing: A Narrative Literature Review [1.8899300124593648]
The aim of this literature review is to describe and compare the different text summarisation approaches used to generate lay summaries.
We screened 82 articles and included eight relevant papers published between 2020 and 2021, using the same dataset.
A combination of extractive and abstractive summarisation methods in a hybrid approach was found to be most effective.
arXiv Detail & Related papers (2023-03-24T18:30:50Z)
- Summ^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents [13.755637074366813]
SummN is a simple, flexible, and effective multi-stage framework for input texts longer than the maximum context lengths of typical pretrained LMs.
It can process input text of arbitrary length by adjusting the number of stages while keeping the LM context size fixed.
Our experiments demonstrate that SummN significantly outperforms previous state-of-the-art methods.
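The multi-stage idea above can be illustrated with a toy loop: split the input into chunks that fit a fixed context window, summarize each chunk, and repeat on the concatenation until the whole text fits in one window. This is a simplified sketch, not the Summ^N implementation; `summarize_chunk` is a hypothetical placeholder for the backbone language model.

```python
def chunk(words, size):
    # Split a word list into consecutive windows of at most `size` words.
    return [words[i:i + size] for i in range(0, len(words), size)]

def summarize_chunk(words, ratio=0.5):
    # Placeholder "summarizer": keep the first half of the chunk.
    # A real system would call a pretrained LM here.
    return words[: max(1, int(len(words) * ratio))]

def multi_stage_summarize(text, context_size=8):
    """Coarse-to-fine loop: summarize stage by stage until the text fits."""
    words = text.split()
    stages = 0
    while len(words) > context_size:
        words = [w for c in chunk(words, context_size)
                 for w in summarize_chunk(c)]
        stages += 1
    return " ".join(words), stages
```

Because each stage shrinks every chunk, the loop terminates for any input length while the model's context size stays fixed, which is the key property the framework relies on.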
arXiv Detail & Related papers (2021-10-16T06:19:54Z)
- KnowGraph@IITK at SemEval-2021 Task 11: Building KnowledgeGraph for NLP Research [2.1012672709024294]
We develop a system for a research paper contributions-focused knowledge graph over Natural Language Processing literature.
The proposed system is agnostic to the subject domain and can be applied for building a knowledge graph for any area.
Our system achieved F1 score of 0.38, 0.63 and 0.76 in end-to-end pipeline testing, phrase extraction testing and triplet extraction testing respectively.
arXiv Detail & Related papers (2021-04-04T14:33:21Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
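The graph-then-cluster-then-compress flow can be sketched in a few lines. This is an illustrative simplification, not the SummPip code: Jaccard word overlap stands in for the deep and linguistic similarity measures, connected components stand in for spectral clustering, and "compression" is reduced to picking the shortest sentence per cluster.

```python
def jaccard(a, b):
    # Word-overlap similarity between two sentences.
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_and_compress(sentences, threshold=0.3):
    """Build a sentence graph, find clusters, emit one sentence per cluster."""
    n = len(sentences)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(sentences[i], sentences[j]) >= threshold:
                adj[i].append(j)
                adj[j].append(i)
    seen, summary = set(), []
    for i in range(n):  # connected components via DFS
        if i in seen:
            continue
        stack, comp = [i], []
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(u)
            stack.extend(adj[u])
        # "Compress" the cluster: keep its shortest sentence.
        summary.append(min((sentences[c] for c in comp), key=len))
    return summary
```

Each cluster of mutually similar sentences contributes one line to the summary, mirroring how SummPip compresses each spectral cluster into a summary sentence.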
arXiv Detail & Related papers (2020-07-17T13:01:15Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset in the same writing style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)