Unsupervised Scientific Abstract Segmentation with Normalized Mutual
Information
- URL: http://arxiv.org/abs/2305.11553v1
- Date: Fri, 19 May 2023 09:53:45 GMT
- Title: Unsupervised Scientific Abstract Segmentation with Normalized Mutual
Information
- Authors: Yingqiang Gao, Jessica Lam, Nianlong Gu, Richard H.R. Hahnloser
- Abstract summary: We empirically explore using Normalized Mutual Information (NMI) for abstract segmentation.
On non-structured abstracts, our proposed unsupervised approach GreedyCAS achieves the best performance across all evaluation metrics.
The strong correlation of NMI with our evaluation metrics reveals the effectiveness of NMI for abstract segmentation.
- Score: 4.129225533930966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The abstracts of scientific papers consist of premises and conclusions.
Structured abstracts explicitly highlight the conclusion sentences, whereas
non-structured abstracts may have conclusion sentences at uncertain positions.
This implicit nature of conclusion positions makes the automatic segmentation
of scientific abstracts into premises and conclusions a challenging task. In
this work, we empirically explore using Normalized Mutual Information (NMI) for
abstract segmentation. We consider each abstract as a recurrent cycle of
sentences and place segmentation boundaries by greedily optimizing the NMI
score between premises and conclusions. On non-structured abstracts, our
proposed unsupervised approach GreedyCAS achieves the best performance across
all evaluation metrics; on structured abstracts, GreedyCAS outperforms all
baseline methods measured by $P_k$. The strong correlation of NMI with our
evaluation metrics reveals the effectiveness of NMI for abstract segmentation.
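A worked sketch may make the segmentation objective concrete. One common definition is $\mathrm{NMI}(X;Y) = I(X;Y) / \sqrt{H(X)\,H(Y)}$; the abstract does not state which estimator or normalization the authors use, so the code below is a minimal, hypothetical illustration rather than the GreedyCAS implementation. It estimates NMI from a word-versus-segment contingency table and exhaustively scores every cyclic boundary pair (standing in for the paper's greedy search), returning the premise/conclusion split with the highest score; all function names are placeholders.

# Minimal sketch of NMI-driven cyclic abstract segmentation, in the spirit of
# GreedyCAS. Assumptions (not the authors' implementation): NMI is estimated
# from a word-vs-segment contingency table and normalized by sqrt(H(W) * H(S));
# the exhaustive scan over cyclic boundary pairs stands in for the paper's
# greedy procedure and is feasible for short abstracts.
from collections import Counter
from math import log, sqrt

def nmi(tokens_a, tokens_b):
    """NMI between word identity and segment membership, from raw token counts."""
    if not tokens_a or not tokens_b:
        return 0.0
    counts = {0: Counter(tokens_a), 1: Counter(tokens_b)}
    total = len(tokens_a) + len(tokens_b)
    p_seg = {s: sum(c.values()) / total for s, c in counts.items()}
    p_word = {w: n / total for w, n in (counts[0] + counts[1]).items()}
    mi = sum(
        (n / total) * log((n / total) / (p_seg[s] * p_word[w]))
        for s, c in counts.items()
        for w, n in c.items()
    )
    h_seg = -sum(p * log(p) for p in p_seg.values())
    h_word = -sum(p * log(p) for p in p_word.values())
    return mi / sqrt(h_seg * h_word) if h_seg > 0 and h_word > 0 else 0.0

def segment_cyclic(sentences):
    """Treat the abstract as a cycle of sentences and return the boundary pair
    (i, j) whose conclusion span [i, j) has the highest NMI against the
    remaining premise sentences."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    best_score, best_split = float("-inf"), None
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            span = {(i + k) % n for k in range((j - i) % n)}
            conclusion = [w for k in span for w in tokens[k]]
            premise = [w for k in range(n) if k not in span for w in tokens[k]]
            score = nmi(premise, conclusion)
            if score > best_score:
                best_score, best_split = score, (i, j)
    return best_split, best_score

For example, segment_cyclic(["We collect a corpus of abstracts.", "We train a segmenter on it.", "Therefore, segmentation accuracy improves."]) returns the cyclic boundary pair whose conclusion span is most distinguishable from the premises under this toy score. The $P_k$ metric mentioned in the abstract is the standard probabilistic segmentation error (Beeferman et al., 1999): the probability that two sentences a fixed number of positions apart are incorrectly judged to lie in the same or in different segments.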
Related papers
- Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z)
- Hierarchical State Abstraction Based on Structural Information Principles [70.24495170921075]
We propose a novel mathematical Structural Information principles-based State Abstraction framework, namely SISA, from the information-theoretic perspective.
SISA is a general framework that can be flexibly integrated with different representation-learning objectives to improve their performances further.
arXiv Detail & Related papers (2023-04-24T11:06:52Z)
- Simple Yet Effective Synthetic Dataset Construction for Unsupervised Opinion Summarization [28.52201592634964]
We propose two simple yet effective unsupervised approaches to generate both aspect-specific and general opinion summaries.
Our first approach, Seed Words Based Leave-One-Out (SW-LOO), identifies aspect-related portions of reviews simply by exact-matching aspect seed words.
Our second approach, Natural Language Inference Based Leave-One-Out (NLI-LOO), identifies aspect-related sentences utilizing an NLI model in a more general setting without using seed words.
arXiv Detail & Related papers (2023-03-21T08:08:04Z)
- Document Summarization with Text Segmentation [7.954814600961461]
We exploit the innate document segment structure for improving the extractive summarization task.
We build two text segmentation models and find the optimal strategy for incorporating their output predictions.
arXiv Detail & Related papers (2023-01-20T22:24:22Z)
- PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition [63.51569687229681]
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters.
Our dataset structure resembles the tasks of (1) segmenting sentences within a document into a set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document.
arXiv Detail & Related papers (2022-12-21T04:03:33Z)
- Nested Named Entity Recognition as Holistic Structure Parsing [92.8397338250383]
This work models all nested NEs in a sentence as a holistic structure and proposes a holistic structure parsing algorithm to recover them all at once.
Experiments show that our model yields promising results on widely used benchmarks, approaching or even achieving the state of the art.
arXiv Detail & Related papers (2022-04-17T12:48:20Z)
- Automatic evaluation of scientific abstracts through natural language processing [0.0]
This paper proposes natural language processing algorithms to classify, segment and evaluate scientific work.
The proposed framework categorizes abstract texts according to the problems they are intended to solve, using a text classification approach.
The methodology of the abstract is ranked based on the sentiment analysis of its results.
arXiv Detail & Related papers (2021-11-14T12:55:29Z)
- Abstract, Rationale, Stance: A Joint Model for Scientific Claim Verification [18.330265729989843]
We propose an approach, named ARSJoint, that jointly learns the modules for the three tasks within a machine reading comprehension framework.
The experimental results on the benchmark dataset SciFact show that our approach outperforms the existing works.
arXiv Detail & Related papers (2021-09-13T10:07:26Z)
- Topic-Centric Unsupervised Multi-Document Summarization of Scientific and News Articles [3.0504782036247438]
We propose a topic-centric unsupervised multi-document summarization framework to generate abstractive summaries.
The proposed algorithm generates an abstractive summary by developing salient language unit selection and text generation techniques.
Our approach matches the state-of-the-art when evaluated on automated extractive evaluation metrics and performs better for abstractive summarization on five human evaluation metrics.
arXiv Detail & Related papers (2020-11-03T04:04:21Z)
- Introducing Syntactic Structures into Target Opinion Word Extraction with Deep Learning [89.64620296557177]
We propose to incorporate the syntactic structures of the sentences into the deep learning models for targeted opinion word extraction.
We also introduce a novel regularization technique to improve the performance of the deep learning models.
The proposed model is extensively analyzed and achieves state-of-the-art performance on four benchmark datasets.
arXiv Detail & Related papers (2020-10-26T07:13:17Z)
- Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation [93.87095877617968]
We propose Constrained Abstractive Summarization (CAS), a general setup that preserves the factual consistency of abstractive summarization.
We adopt lexically constrained decoding, a technique generally applicable to autoregressive generative models, to fulfill CAS.
We observe ROUGE-2 gains of up to 13.8 points when only one manual constraint is used in interactive summarization.
arXiv Detail & Related papers (2020-10-24T00:27:44Z)
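As a brief illustration of the lexically constrained decoding mentioned in the last entry, the sketch below is a hypothetical example using the force_words_ids argument of Hugging Face Transformers' generate(); it is not the CAS authors' setup, and the model checkpoint, input document, and constraint phrase are placeholders.

# Hypothetical illustration of lexically constrained decoding with beam search.
# Not the CAS authors' code; model, document, and constraint are placeholders.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/bart-large-cnn"  # placeholder summarization checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

document = "The trial enrolled 120 patients and reported a significant reduction in symptoms."
constraint = "120 patients"  # a manual lexical constraint the summary must contain

inputs = tokenizer(document, return_tensors="pt", truncation=True)
force_words_ids = [tokenizer(constraint, add_special_tokens=False).input_ids]

# force_words_ids requires beam search; every returned beam contains the phrase.
summary_ids = model.generate(
    **inputs,
    force_words_ids=force_words_ids,
    num_beams=4,
    max_new_tokens=60,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))

Constrained beam search of this kind trades some decoding freedom for a guarantee that manually specified facts survive into the summary, which is the factual-consistency setting the CAS entry describes.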