Segmenting Scientific Abstracts into Discourse Categories: A Deep
Learning-Based Approach for Sparse Labeled Data
- URL: http://arxiv.org/abs/2005.05414v2
- Date: Wed, 27 May 2020 08:35:08 GMT
- Title: Segmenting Scientific Abstracts into Discourse Categories: A Deep
Learning-Based Approach for Sparse Labeled Data
- Authors: Soumya Banerjee, Debarshi Kumar Sanyal, Samiran Chattopadhyay, Plaban
Kumar Bhowmick and Parthapratim Das
- Abstract summary: We train a deep neural network on structured abstracts from PubMed, then fine-tune it on a small hand-labeled corpus of computer science papers.
Our method appears to be a promising solution to the automatic segmentation of abstracts where labeled data is sparse.
- Score: 8.635930195821265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The abstract of a scientific paper distills the contents of the paper into a
short paragraph. In the biomedical literature, it is customary to structure an
abstract into discourse categories like BACKGROUND, OBJECTIVE, METHOD, RESULT,
and CONCLUSION, but this segmentation is uncommon in other fields like computer
science. Explicit categories could be helpful for more granular, that is,
discourse-level search and recommendation. The sparsity of labeled data makes
it challenging to construct supervised machine learning solutions for automatic
discourse-level segmentation of abstracts in non-bio domains. In this paper, we
address this problem using transfer learning. In particular, we define three
discourse categories (BACKGROUND, TECHNIQUE, and OBSERVATION) for an abstract because
these three categories are the most common. We train a deep neural network on
structured abstracts from PubMed, then fine-tune it on a small hand-labeled
corpus of computer science papers. We observe an accuracy of 75% on the test
corpus. We perform an ablation study to highlight the roles of the different
parts of the model. Our method appears to be a promising solution to the
automatic segmentation of abstracts, where the labeled data is sparse.
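
The pretrain-then-fine-tune recipe described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it swaps in a tiny one-hidden-layer network written in plain Python, and the data is synthetic (each "sentence" is reduced to a bias term and a position feature). The names `SentenceClassifier`, `make_toy_corpus`, and `fine_tune_from` are hypothetical. The only ideas taken from the abstract are that the source task has five discourse categories, the target task has three, and the weights learned on the large source corpus initialize the model that is then trained on the small target corpus.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class SentenceClassifier:
    """Toy model (assumption): features -> tanh encoder -> softmax head."""

    def __init__(self, n_in, n_hidden, n_classes, rng):
        self.enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                    for _ in range(n_hidden)]
        self.head = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                     for _ in range(n_classes)]

    def forward(self, x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.enc]
        p = softmax([sum(w * v for w, v in zip(row, h)) for row in self.head])
        return h, p

    def predict(self, x):
        p = self.forward(x)[1]
        return p.index(max(p))

    def loss(self, data):
        # Mean cross-entropy over (features, label) pairs.
        return -sum(math.log(self.forward(x)[1][y]) for x, y in data) / len(data)

    def train(self, data, epochs, lr):
        # Plain stochastic gradient descent; every layer is updated.
        for _ in range(epochs):
            for x, y in data:
                h, p = self.forward(x)
                dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
                # Backpropagate into the encoder before the head changes.
                dh = [sum(dz[k] * self.head[k][j] for k in range(len(dz)))
                      * (1.0 - h[j] ** 2) for j in range(len(h))]
                for k, row in enumerate(self.head):
                    for j in range(len(row)):
                        row[j] -= lr * dz[k] * h[j]
                for j, row in enumerate(self.enc):
                    for i in range(len(row)):
                        row[i] -= lr * dh[j] * x[i]


def make_toy_corpus(n_classes, n_per_class, rng):
    """Synthetic stand-in data: [bias, noisy position in the abstract]."""
    data = []
    for label in range(n_classes):
        centre = -1.0 + (2 * label + 1) / n_classes
        for _ in range(n_per_class):
            data.append(([1.0, centre + rng.uniform(-0.1, 0.1)], label))
    rng.shuffle(data)
    return data


def fine_tune_from(pretrained, n_classes, rng):
    """Copy the pretrained encoder and attach a fresh head for the new labels."""
    model = SentenceClassifier(len(pretrained.enc[0]), len(pretrained.enc),
                               n_classes, rng)
    model.enc = [row[:] for row in pretrained.enc]
    return model


rng = random.Random(0)
source = make_toy_corpus(5, 40, rng)  # stands in for the large structured corpus
target = make_toy_corpus(3, 5, rng)   # stands in for the small hand-labeled corpus

pretrained = SentenceClassifier(2, 8, 5, rng)
pretrained.train(source, epochs=50, lr=0.1)

model = fine_tune_from(pretrained, 3, rng)
loss_before = model.loss(target)
model.train(target, epochs=200, lr=0.1)
loss_after = model.loss(target)
print("target loss before and after fine-tuning:", loss_before, loss_after)
```

The design choice worth noting is that only the output head is re-initialized, because the label sets differ between the two corpora, while the encoder starts from the source-trained weights and continues to be updated during fine-tuning.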
Related papers
- Data-driven Coreference-based Ontology Building [48.995395445597225]
Coreference resolution is traditionally used as a component in individual document understanding.
We take a more global view and explore what we can learn about a domain from the set of all document-level coreference relations.
We release the resulting coreference chains under a Creative Commons license, along with the code.
arXiv Detail & Related papers (2024-10-22T14:30:40Z)
- Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and metadata.
One straightforward solution is to integrate statistical analysis and machine learning.
Numerous papers have been published on embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field.
arXiv Detail & Related papers (2024-06-16T14:49:19Z)
- Empowering Interdisciplinary Research with BERT-Based Models: An Approach Through SciBERT-CNN with Topic Modeling [0.0]
This paper introduces a novel approach using the SciBERT model and CNNs to systematically categorize academic abstracts.
The CNN uses convolution and pooling to enhance feature extraction and reduce dimensionality.
arXiv Detail & Related papers (2024-04-16T05:21:47Z)
- From Text Segmentation to Smart Chaptering: A Novel Benchmark for Structuring Video Transcriptions [63.11097464396147]
We introduce a novel benchmark, YTSeg, focusing on spoken content that is inherently more unstructured and both topically and structurally diverse.
We also introduce an efficient hierarchical segmentation model, MiniSeg, that outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-27T15:59:37Z)
- Bloom-epistemic and sentiment analysis hierarchical classification in course discussion forums [0.0]
Our proposed method is called the hierarchical approach of Bloom-Epistemic and Sentiment Analysis (BE-Sent).
This research has succeeded in producing a course learning subsystem that assesses opinions based on text reviews of discussion forums.
arXiv Detail & Related papers (2024-01-26T08:20:13Z)
- Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification [60.233529926965836]
We propose a new method called SHINE, which is based on a graph neural network (GNN), for short text classification.
First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs.
Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
arXiv Detail & Related papers (2021-10-30T05:33:05Z)
- Extractive and Abstractive Sentence Labelling of Sentiment-bearing Topics [5.014332673843021]
This paper tackles the problem of automatically labelling sentiment-bearing topics with descriptive sentence labels.
We propose two approaches to the problem, one extractive and the other abstractive.
We conclude that abstractive methods can effectively synthesise the rich information contained in sentiment-bearing topics.
arXiv Detail & Related papers (2021-08-29T11:08:39Z)
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that an informational approach to representing the meaning of a text has offered a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z)
- BATS: A Spectral Biclustering Approach to Single Document Topic Modeling and Segmentation [17.003488045214972]
Existing topic modeling and text segmentation methodologies generally require large datasets for training, limiting their capabilities when only small collections of text are available.
In developing a methodology to handle single documents, we face two major challenges.
First is sparse information: with access to only one document, we cannot train traditional topic models or deep learning algorithms.
Second is significant noise: a considerable portion of words in any single document will produce only noise and not help discern topics or segments.
arXiv Detail & Related papers (2020-08-05T16:34:33Z)
- Large Scale Subject Category Classification of Scholarly Papers with Deep Attentive Neural Networks [15.241086410108512]
We propose a deep attentive neural network (DANN) that classifies scholarly papers using only their abstracts.
The proposed network consists of two bi-directional recurrent neural networks followed by an attention layer.
Our best model achieves a micro-F1 measure of 0.76, with F1 of individual subject categories ranging from 0.50 to 0.95.
arXiv Detail & Related papers (2020-07-27T19:42:42Z)
- Heterogeneous Graph Neural Networks for Extractive Document Summarization [101.17980994606836]
Learning cross-sentence relations is a crucial step in extractive document summarization.
We present a graph-based neural network for extractive summarization (HeterSumGraph).
We introduce different types of nodes into graph-based neural networks for extractive document summarization.
arXiv Detail & Related papers (2020-04-26T14:38:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.