Neural Sequence Segmentation as Determining the Leftmost Segments
- URL: http://arxiv.org/abs/2104.07217v1
- Date: Thu, 15 Apr 2021 03:35:03 GMT
- Title: Neural Sequence Segmentation as Determining the Leftmost Segments
- Authors: Yangming Li, Lemao Liu, Kaisheng Yao
- Abstract summary: We propose a novel framework that incrementally segments natural language sentences at the segment level.
For every step in segmentation, it recognizes the leftmost segment of the remaining sequence.
We have conducted extensive experiments on syntactic chunking and Chinese part-of-speech tagging across 3 datasets.
- Score: 25.378188980430256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior methods for text segmentation mostly operate at the token level.
Although adequate, this nature limits their ability to capture long-term
dependencies among segments. In this work, we propose a novel framework that
incrementally segments natural language sentences at the segment level. At every
step of segmentation, it recognizes the leftmost segment of the remaining
sequence. The implementation uses the LSTM-minus technique to construct phrase
representations and recurrent neural networks (RNNs) to model the iterations of
determining the leftmost segments. We have conducted extensive experiments on
syntactic chunking and Chinese part-of-speech (POS) tagging across 3 datasets,
demonstrating that our method significantly outperforms all previous baselines
and achieves new state-of-the-art results. Moreover, qualitative analysis and a
study on segmenting long sentences verify its effectiveness in modeling
long-term dependencies.
Related papers
- Image Segmentation in Foundation Model Era: A Survey [99.19456390358211]
Current research in image segmentation lacks a detailed analysis of distinct characteristics, challenges, and solutions associated with these advancements.
This survey seeks to fill this gap by providing a thorough review of cutting-edge research centered around FM-driven image segmentation.
An exhaustive overview of over 300 segmentation approaches is provided to encapsulate the breadth of current research efforts.
arXiv Detail & Related papers (2024-08-23T10:07:59Z)
- From Text Segmentation to Smart Chaptering: A Novel Benchmark for Structuring Video Transcriptions [63.11097464396147]
We introduce YTSeg, a novel benchmark focusing on spoken content that is inherently more unstructured and both topically and structurally diverse.
We also introduce MiniSeg, an efficient hierarchical segmentation model that outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-27T15:59:37Z)
- LISA: Reasoning Segmentation via Large Language Model [68.24075852136761]
We propose a new segmentation task -- reasoning segmentation.
The task is designed to output a segmentation mask given a complex and implicit query text.
We present LISA: Large Language Instructed Segmentation Assistant, which inherits the language generation capabilities of multimodal Large Language Models.
arXiv Detail & Related papers (2023-08-01T17:50:17Z)
- Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation [65.6736056006381]
We present a multilingual punctuation-agnostic sentence segmentation method covering 85 languages.
Our method outperforms all the prior best sentence-segmentation tools by an average of 6.1% F1 points.
By using our method to match sentence segmentation to the segmentation used during training of MT models, we achieve an average improvement of 2.3 BLEU points.
arXiv Detail & Related papers (2023-05-30T09:49:42Z)
- Ensembling Instance and Semantic Segmentation for Panoptic Segmentation [0.0]
Our method first performs instance segmentation and semantic segmentation separately, then combines the two to generate panoptic segmentation results.
For instance segmentation, we add several Mask R-CNN expert models to tackle the data imbalance problem in the training data.
For semantic segmentation, we train several models with various backbones and use an ensemble strategy that further boosts the segmentation results.
arXiv Detail & Related papers (2023-04-20T14:02:01Z)
- Context-aware Fine-tuning of Self-supervised Speech Models [56.95389222319555]
We study the use of context, i.e., surrounding segments, during fine-tuning.
We propose a new approach called context-aware fine-tuning.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks.
arXiv Detail & Related papers (2022-12-16T15:46:15Z)
- Neural Token Segmentation for High Token-Internal Complexity [7.569526565230962]
Tokenizing raw texts into word units is an essential pre-processing step for NLP pipelines.
We propose a novel neural segmentation model which combines contextualised token representation and char-level decoding.
Our model shows substantial improvements in segmentation accuracy on Hebrew and Arabic compared to the state-of-the-art.
arXiv Detail & Related papers (2022-03-21T10:07:17Z)
- Unsupervised Word Segmentation with Bi-directional Neural Language Model [11.269066294359138]
We present an unsupervised word segmentation model, in which the learning objective is to maximize the generation probability of a sentence.
In order to better capture the long- and short-term dependencies, we propose to use bi-directional neural language models.
Two decoding algorithms are also described to combine the context features from both directions to generate the final segmentation.
arXiv Detail & Related papers (2021-03-02T02:21:22Z)
- Segmenting Natural Language Sentences via Lexical Unit Analysis [47.273602658066196]
We present Lexical Unit Analysis (LUA), a framework for general sequence segmentation tasks.
LUA scores all the valid segmentation candidates and utilizes dynamic programming (DP) to extract the maximum scoring one.
We have conducted extensive experiments on 5 tasks, including syntactic chunking, named entity recognition (NER), slot filling, Chinese word segmentation, and Chinese part-of-speech (POS) tagging, across 15 datasets.
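The LUA summary describes scoring all valid segmentation candidates and extracting the maximum-scoring one with dynamic programming. A minimal sketch of that DP, with random span scores standing in for a trained scorer (the recurrence and the linear-chain decomposition are assumptions for illustration, not LUA's exact formulation):

```python
import numpy as np

rng = np.random.default_rng(1)
tokens = ["We", "propose", "a", "framework"]
T = len(tokens)

# Hypothetical span scores: score[i][k-1] rates the segment tokens[i:k].
# A trained LUA model would produce these from span representations.
score = rng.normal(size=(T, T))

# DP: best[k] = maximum total score over all segmentations of tokens[:k].
best = [0.0] + [float("-inf")] * T
back = [0] * (T + 1)
for k in range(1, T + 1):
    for i in range(k):  # i is the start of the last segment
        cand = best[i] + score[i][k - 1]
        if cand > best[k]:
            best[k], back[k] = cand, i

# Backtrack from position T to recover the maximum-scoring segmentation.
segs, k = [], T
while k > 0:
    i = back[k]
    segs.append(tokens[i:k])
    k = i
segs.reverse()
print(segs)
```

The DP visits O(T^2) spans, so exact search over the exponentially many candidate segmentations stays tractable.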
arXiv Detail & Related papers (2020-12-10T02:31:52Z)
- On Target Segmentation for Direct Speech Translation [20.456325305495966]
Subword-level segmentation has become the state of the art in neural machine translation.
We compare the two methods on three benchmarks covering 8 language directions and multilingual training.
Subword-level segmentation compares favorably in all settings, outperforming its character-level counterpart in a range of 1 to 3 BLEU points.
arXiv Detail & Related papers (2020-09-10T07:47:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.