Incorporating Uncertain Segmentation Information into Chinese NER for
Social Media Text
- URL: http://arxiv.org/abs/2004.06384v2
- Date: Mon, 15 Jun 2020 09:10:35 GMT
- Title: Incorporating Uncertain Segmentation Information into Chinese NER for
Social Media Text
- Authors: Shengbin Jia, Ling Ding, Xiaojun Chen, Shijia E, Yang Xiang
- Abstract summary: Segmentation error propagation is a challenge for Chinese named entity recognition systems.
We propose a model (UIcwsNN) that specializes in identifying entities from Chinese social media text.
- Score: 18.455836845989523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chinese word segmentation is necessary to provide word-level information for
Chinese named entity recognition (NER) systems. However, segmentation error
propagation is a challenge for Chinese NER when processing colloquial data
such as social media text. In this paper, we propose a model (UIcwsNN) that
specializes in identifying entities in Chinese social media text, especially
by leveraging the ambiguous information of word segmentation. Such uncertain
information contains all the potential segmentation states of a sentence, which
provides a channel for the model to infer deep word-level characteristics. We
propose a trilogy (i.e., candidate position embedding -> position selective
attention -> adaptive word convolution) to encode uncertain word segmentation
information and acquire appropriate word-level representations. Experimental
results on the social media corpus show that our model effectively alleviates
the cascading of segmentation errors and achieves a significant
performance improvement of more than 2% over previous state-of-the-art methods.
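The abstract describes encoding all potential segmentation states of a sentence and then selecting among them. The sketch below is a hypothetical, heavily simplified illustration of the first two steps of the trilogy, not the authors' implementation: `candidate_positions` collects, for each character, every position (B/M/E/S) it can occupy in some lexicon word, and `select_position` stands in for position selective attention as a plain softmax over dot-product scores. The lexicon `LEXICON`, the embeddings `POS_EMB`, and both function names are assumptions; the adaptive word convolution step is omitted.

```python
from math import exp

# Hypothetical toy lexicon; the source does not say which lexicon UIcwsNN uses.
LEXICON = {"南京", "南京市", "市长", "长江", "长江大桥", "江大桥", "大桥"}

# Hypothetical 2-d position embeddings; a real model would learn these.
POS_EMB = {"B": [1.0, 0.0], "M": [0.0, 1.0], "E": [-1.0, 0.0], "S": [0.0, -1.0]}


def candidate_positions(sentence, lexicon, max_len=4):
    """Return, for every character, the set of positions it can occupy across
    all lexicon matches (B=begin, M=middle, E=end, S=single character)."""
    n = len(sentence)
    cands = [{"S"} for _ in range(n)]  # assumption: any character may stand alone
    for i in range(n):
        for j in range(i + 2, min(n, i + max_len) + 1):
            if sentence[i:j] in lexicon:
                cands[i].add("B")
                cands[j - 1].add("E")
                for k in range(i + 1, j - 1):
                    cands[k].add("M")
    return cands


def select_position(char_vec, tags, pos_emb):
    """Softmax-weight the candidate position embeddings by their dot product
    with the character vector; return (weights per tag, fused embedding)."""
    tags = sorted(tags)
    scores = [sum(c * e for c, e in zip(char_vec, pos_emb[t])) for t in tags]
    top = max(scores)
    exps = [exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    weights = [x / total for x in exps]
    fused = [
        sum(w * pos_emb[t][d] for w, t in zip(weights, tags))
        for d in range(len(char_vec))
    ]
    return dict(zip(tags, weights)), fused


if __name__ == "__main__":
    sent = "南京市长江大桥"
    for ch, tags in zip(sent, candidate_positions(sent, LEXICON)):
        print(ch, sorted(tags))
    weights, fused = select_position([1.0, 0.0], {"B", "S"}, POS_EMB)
    print(weights, fused)
```

On the classic ambiguous string 南京市长江大桥, the character 江 keeps all four candidate positions (end of 长江, middle of 长江大桥, start of 江大桥, or a single character), which is exactly the kind of uncertainty a single-best segmenter would discard before NER sees it.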
Related papers
- Betrayed by Captions: Joint Caption Grounding and Generation for Open
Vocabulary Instance Segmentation [80.48979302400868]
We focus on open vocabulary instance segmentation to expand a segmentation model to classify and segment instance-level novel categories.
Previous approaches have relied on massive caption datasets and complex pipelines to establish one-to-one mappings between image regions and captions in nouns.
We devise a joint Caption Grounding and Generation (CGG) framework, which incorporates a novel grounding loss that focuses only on matching objects to improve learning efficiency.
(2023-01-02T18:52:12Z)
- ConNER: Consistency Training for Cross-lingual Named Entity Recognition [96.84391089120847]
Cross-lingual named entity recognition suffers from data scarcity in the target languages.
We propose ConNER as a novel consistency training framework for cross-lingual NER.
(2022-11-17T07:57:54Z)
- Chinese Word Segmentation with Heterogeneous Graph Neural Network [8.569804490994219]
We propose a framework to improve Chinese word segmentation, named HGNSeg.
It exploits multi-level external information with the pre-trained language model and heterogeneous graph neural network.
In cross-domain scenarios, our method also shows a strong ability to alleviate the out-of-vocabulary (OOV) problem.
(2022-01-22T06:25:56Z)
- Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-stage
Span Labeling [0.2624902795082451]
We propose a neural model named SpanSegTag for joint Chinese word segmentation and part-of-speech tagging.
Our experiments show that our BERT-based model SpanSegTag achieved competitive performances on the CTB5, CTB6, and UD datasets.
(2021-12-17T12:59:02Z)
- Multi-Modal Interaction Graph Convolutional Network for Temporal
Language Localization in Videos [55.52369116870822]
This paper focuses on tackling the problem of temporal language localization in videos.
It aims to identify the start and end points of a moment described by a natural language sentence in an untrimmed video.
(2021-10-12T14:59:25Z)
- Exploiting Global Contextual Information for Document-level Named Entity
Recognition [46.99922251839363]
We propose a model called Global Context enhanced Document-level NER (GCDoc).
At the word level, a document graph is constructed to model a wider range of dependencies between words.
At the sentence level, to appropriately model the wider context beyond a single sentence, we employ a cross-sentence module.
Our model reaches an F1 score of 92.22 (93.40 with BERT) on the CoNLL 2003 dataset and 88.32 (90.49 with BERT) on the OntoNotes 5.0 dataset.
(2021-06-02T01:52:07Z)
- R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic
Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
(2020-12-16T13:11:30Z)
- Named Entity Recognition for Social Media Texts with Semantic
Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when applied to short and informal texts.
We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
(2020-10-29T10:06:46Z)
- Improving Chinese Segmentation-free Word Embedding With Unsupervised
Association Measure [3.9435648520559177]
A segmentation-free word embedding model is proposed that collects an n-gram vocabulary via a novel unsupervised association measure called pointwise association with times information (PATI).
The proposed method leverages more latent information from the corpus and thus is able to collect more valid n-grams that have stronger cohesion as embedding targets in unsegmented language data, such as Chinese texts.
(2020-07-05T13:55:19Z)
- Integrating Boundary Assembling into a DNN Framework for Named Entity
Recognition in Chinese Social Media Text [3.7239227834407735]
Chinese word boundaries are also entity boundaries, so named entity recognition for Chinese text can benefit from word boundary detection.
In this paper, we integrate a boundary assembling method with the state-of-the-art deep neural network model, and incorporate the updated word boundary information into a conditional random field model for named entity recognition.
Our method shows a 2% absolute improvement over previous state-of-the-art results.
(2020-02-27T04:29:13Z)
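The last entry describes feeding word boundary information into a CRF for Chinese NER. A minimal sketch of one common way to do that, assuming the word boundaries have already been produced by some segmenter (the boundary assembling step itself is not shown): each character receives a B/M/E/S boundary tag alongside simple character-window features. The function names and feature keys are hypothetical; lists of per-token feature dicts like these are the input format that CRF toolkits such as sklearn-crfsuite accept.

```python
def boundary_tags(words):
    """Assign one boundary tag per character of a segmented sentence:
    B = word-initial, M = word-internal, E = word-final, S = single-char word.
    Assumes every word is non-empty."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def char_features(words):
    """Build one feature dict per character: the character itself, its
    boundary tag, and its left and right neighbouring characters."""
    chars = "".join(words)
    tags = boundary_tags(words)
    feats = []
    for i, (ch, tag) in enumerate(zip(chars, tags)):
        feats.append({
            "char": ch,
            "boundary": tag,
            "prev_char": chars[i - 1] if i > 0 else "<s>",
            "next_char": chars[i + 1] if i + 1 < len(chars) else "</s>",
        })
    return feats


if __name__ == "__main__":
    for f in char_features(["我", "喜欢", "北京大学"]):
        print(f)
```

Keeping the boundary tag as an ordinary feature, rather than hard-splitting the input into words, lets the CRF override a wrong boundary when the other features disagree with it.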
This list is automatically generated from the titles and abstracts of the papers on this site.