Topic Segmentation Model Focusing on Local Context
- URL: http://arxiv.org/abs/2301.01935v1
- Date: Thu, 5 Jan 2023 06:57:42 GMT
- Title: Topic Segmentation Model Focusing on Local Context
- Authors: Jeonghwan Lee, Jiyeong Han, Sunghoon Baek and Min Song
- Abstract summary: We propose siamese sentence embedding layers that process two input sentences independently to obtain an appropriate amount of information.
Also, we adopt multi-task learning techniques including Same Topic Prediction (STP), Topic Classification (TC), and Next Sentence Prediction (NSP).
- Score: 1.9871897882042773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topic segmentation is important in understanding scientific documents since
it can not only provide better readability but also facilitate downstream tasks
such as information retrieval and question answering by creating appropriate
sections or paragraphs. In the topic segmentation task, topic coherence is
critical in predicting segmentation boundaries. Most of the existing models
have tried to exploit as much context as possible to extract useful
topic-related information. However, additional context does not always bring
promising results, because the local context between sentences can become
incoherent as more sentences are supplemented. To alleviate this issue, we
propose siamese sentence embedding layers that process two input sentences
independently to obtain an appropriate amount of information without being
hampered by excessive information. Also, we adopt multi-task learning techniques
including Same Topic Prediction (STP), Topic Classification (TC) and Next
Sentence Prediction (NSP). When these three classification layers are combined
in a multi-task manner, they can make up for each other's limitations,
improving performance in all three tasks. We experiment with different
combinations of the three layers and report how each layer affects the other
layers in the same combination, as well as the overall segmentation
performance. Our proposed model achieves state-of-the-art results on the
WikiSection dataset.
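
To make the described architecture concrete, below is a minimal sketch of a siamese sentence encoder feeding three task heads (STP, TC, NSP) trained in a multi-task manner. It uses PyTorch and a Hugging Face encoder; the encoder name, mean pooling, head sizes, and the number of topic labels are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SiameseMultiTaskSegmenter(nn.Module):
    """Sketch: one shared (siamese) sentence encoder feeding three heads --
    Same Topic Prediction (STP), Topic Classification (TC), and
    Next Sentence Prediction (NSP)."""

    def __init__(self, encoder_name="bert-base-uncased", num_topics=27):
        super().__init__()
        # Both input sentences pass through the same encoder weights.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.stp_head = nn.Linear(hidden * 2, 2)      # same topic vs. different topic
        self.tc_head = nn.Linear(hidden, num_topics)  # topic label of a sentence
        self.nsp_head = nn.Linear(hidden * 2, 2)      # adjacent vs. not adjacent

    def encode(self, enc):
        # Mean-pool token states into a single sentence vector.
        states = self.encoder(**enc).last_hidden_state
        mask = enc["attention_mask"].unsqueeze(-1).float()
        return (states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

    def forward(self, enc_a, enc_b):
        # Each sentence is encoded independently, so its representation is
        # not diluted by excessive surrounding context.
        vec_a, vec_b = self.encode(enc_a), self.encode(enc_b)
        pair = torch.cat([vec_a, vec_b], dim=-1)
        return {
            "stp": self.stp_head(pair),
            "tc": self.tc_head(vec_a),
            "nsp": self.nsp_head(pair),
        }

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SiameseMultiTaskSegmenter(num_topics=27)  # topic count depends on the dataset subset
enc_a = tok("Symptoms include fever and fatigue.", return_tensors="pt")
enc_b = tok("Treatment usually starts with antibiotics.", return_tensors="pt")
logits = model(enc_a, enc_b)
# A multi-task objective would sum cross-entropy losses over the three heads;
# at inference, a segment boundary is placed where STP predicts "different topic".
```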
Related papers
- Putting Context in Context: the Impact of Discussion Structure on Text
Classification [13.15873889847739]
We propose a series of experiments on a large dataset for stance detection in English.
We evaluate the contribution of different types of contextual information.
We show that structural information can be highly beneficial to text classification but only under certain circumstances.
arXiv Detail & Related papers (2024-02-05T12:56:22Z) - Improving Long Context Document-Level Machine Translation [51.359400776242786]
Document-level context for neural machine translation (NMT) is crucial to improve translation consistency and cohesion.
Many works have been published on the topic of document-level NMT, but most restrict the system to just local context.
We propose a constrained attention variant that focuses the attention on the most relevant parts of the sequence, while simultaneously reducing the memory consumption.
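As a rough illustration of attention constrained to the most relevant (here, nearby) positions, the sketch below masks scores outside a local window. This is a generic windowed-attention stand-in, not necessarily the authors' exact variant, and the mask is materialised only for clarity; a memory-saving implementation would avoid building the full n-by-n score matrix.

```python
import torch

def windowed_attention(q, k, v, window=4):
    """Single-head attention where each query only attends to keys within
    `window` positions; scores for distant positions are masked out."""
    n, d = q.shape
    scores = q @ k.T / d ** 0.5                       # (n, n) similarity scores
    idx = torch.arange(n)
    too_far = (idx[None, :] - idx[:, None]).abs() > window
    scores = scores.masked_fill(too_far, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(12, 32)
out = windowed_attention(q, k, v, window=3)           # shape (12, 32)
```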
arXiv Detail & Related papers (2023-06-08T13:28:48Z) - AIMS: All-Inclusive Multi-Level Segmentation [93.5041381700744]
We propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation.
We also build a unified AIMS model through multi-dataset multi-task training to address the two major challenges of annotation inconsistency and task correlation.
arXiv Detail & Related papers (2023-05-28T16:28:49Z) - Topics in the Haystack: Extracting and Evaluating Topics beyond
Coherence [0.0]
We propose a method that incorporates a deeper understanding of both sentence and document themes.
This allows our model to detect latent topics that may include uncommon words or neologisms.
We present correlation coefficients with human identification of intruder words and achieve near-human-level results on the word-intrusion task.
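The word-intrusion evaluation referenced here can be automated by checking whether the odd word in a topic's top-word list is detectable. The snippet below is a generic embedding-similarity illustration of that check, not the authors' scoring protocol.

```python
import numpy as np

def detect_intruder(words, vectors):
    """Return the word least similar, on average, to the others."""
    vecs = np.stack([vectors[w] / np.linalg.norm(vectors[w]) for w in words])
    sims = vecs @ vecs.T                  # pairwise cosine similarities
    np.fill_diagonal(sims, 0.0)
    return words[int(sims.mean(axis=1).argmin())]

# Toy embeddings: four related words clustered around one direction, plus an intruder.
rng = np.random.default_rng(0)
base = rng.normal(size=8)
vectors = {w: base + 0.1 * rng.normal(size=8)
           for w in ["cell", "gene", "protein", "enzyme"]}
vectors["guitar"] = rng.normal(size=8)
print(detect_intruder(["cell", "gene", "protein", "enzyme", "guitar"], vectors))
```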
arXiv Detail & Related papers (2023-03-30T12:24:25Z) - PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and
Entailment Recognition [63.51569687229681]
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters.
Our dataset structure resembles the tasks of (1) segmenting sentences within a document into the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document.
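The two tasks described, segmentation into propositions and per-proposition entailment, could be represented with a structure roughly like the one below; the field names and the three-way label set are illustrative guesses, not the corpus's actual schema.

```python
from dataclasses import dataclass
from typing import List, Literal

@dataclass
class Proposition:
    text: str  # one proposition segmented out of the source sentence
    # Entailment relation with respect to a topically-aligned target document;
    # the label inventory here is a guess at a typical NLI-style scheme.
    label: Literal["entailment", "neutral", "contradiction"]

@dataclass
class AnnotatedSentence:
    sentence: str
    propositions: List[Proposition]

example = AnnotatedSentence(
    sentence="The bridge opened in 1937 and remains the city's main crossing.",
    propositions=[
        Proposition("The bridge opened in 1937.", "entailment"),
        Proposition("The bridge remains the city's main crossing.", "neutral"),
    ],
)
```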
arXiv Detail & Related papers (2022-12-21T04:03:33Z) - Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a 3-stage training framework where the noisy level in the data used for finetuning decreases over different stages.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
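A schedule in which each stage finetunes on progressively cleaner labels might look roughly like the sketch below; the stage names, noise thresholds, and helper callables are placeholders rather than the authors' pipeline.

```python
def staged_finetuning(model, load_data, finetune):
    """Finetune in three stages, moving from noisy silver labels to clean gold labels."""
    stages = [
        {"name": "distant", "max_noise": 0.5},   # silver labels mined from raw text
        {"name": "filtered", "max_noise": 0.2},  # heuristically filtered subset
        {"name": "gold", "max_noise": 0.0},      # manually labelled task data
    ]
    for stage in stages:
        data = load_data(max_noise=stage["max_noise"])
        model = finetune(model, data)            # each stage starts from the previous weights
    return model

# Minimal demo with stand-in callables.
trained = staged_finetuning(
    model={"updates": 0},
    load_data=lambda max_noise: f"examples(noise<={max_noise})",
    finetune=lambda m, d: {"updates": m["updates"] + 1},
)
```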
arXiv Detail & Related papers (2022-04-27T04:24:35Z) - A Survey of Implicit Discourse Relation Recognition [9.57170901247685]
Implicit discourse relation recognition (IDRR) aims to detect the implicit relation between two text segments and classify its sense when no connective is present.
This article provides a comprehensive and up-to-date survey of the IDRR task.
arXiv Detail & Related papers (2022-03-06T15:12:53Z) - Consistency and Coherence from Points of Contextual Similarity [0.0]
The ESTIME measure, recently proposed specifically for factual consistency, achieves high correlations with human expert scores.
Its applicability is limited, however: this is not a problem for current styles of summarization, but it may become an obstacle for future summarization systems.
arXiv Detail & Related papers (2021-12-22T03:04:20Z) - Weakly-Supervised Aspect-Based Sentiment Analysis via Joint
Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
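One simple way to picture joint <sentiment, aspect> topic embeddings is to seed each pair with a few keywords, average their word vectors, and assign other words to the nearest joint topic; this is only a generic illustration of the idea, not the paper's training objective.

```python
import numpy as np

def build_joint_topics(seed_words, vectors):
    """Average seed-word vectors into one embedding per <sentiment, aspect> pair."""
    return {pair: np.mean([vectors[w] for w in words], axis=0)
            for pair, words in seed_words.items()}

def nearest_topic(word, topics, vectors):
    """Assign a word to the closest joint topic by cosine similarity."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(topics, key=lambda pair: cos(vectors[word], topics[pair]))

# Toy random vectors; a real system would use pretrained word embeddings.
rng = np.random.default_rng(1)
vectors = {w: rng.normal(size=16)
           for w in ["tasty", "delicious", "bland", "stale", "fresh"]}
seed_words = {
    ("positive", "food"): ["tasty", "delicious"],
    ("negative", "food"): ["bland", "stale"],
}
topics = build_joint_topics(seed_words, vectors)
print(nearest_topic("fresh", topics, vectors))
```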
arXiv Detail & Related papers (2020-10-13T21:33:24Z) - Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z) - BATS: A Spectral Biclustering Approach to Single Document Topic Modeling
and Segmentation [17.003488045214972]
Existing topic modeling and text segmentation methodologies generally require large datasets for training, limiting their capabilities when only small collections of text are available.
In developing a methodology to handle single documents, we face two major challenges.
First is sparse information: with access to only one document, we cannot train traditional topic models or deep learning algorithms.
Second is significant noise: a considerable portion of words in any single document will produce only noise and not help discern topics or segments.
arXiv Detail & Related papers (2020-08-05T16:34:33Z)
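
For a rough sense of the spectral route on a single document: build a sentence-term count matrix and co-cluster its rows (sentences) and columns (terms). The snippet below uses scikit-learn's spectral co-clustering as a simplified stand-in for BATS, not the paper's actual algorithm.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import SpectralCoclustering

sentences = [
    "The mitochondrion produces ATP for the cell.",
    "ATP synthesis depends on the electron transport chain.",
    "The city council approved a new transit budget.",
    "Bus and rail fares will change under the budget.",
]

# Sentence-term count matrix built from one short document.
X = CountVectorizer().fit_transform(sentences)

# Jointly cluster sentences (rows) and terms (columns).
model = SpectralCoclustering(n_clusters=2, random_state=0)
model.fit(X)

print(model.row_labels_)     # cluster per sentence; contiguous runs suggest topic segments
print(model.column_labels_)  # cluster per vocabulary term, usable as topic word lists
```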