Integrating Boundary Assembling into a DNN Framework for Named Entity
Recognition in Chinese Social Media Text
- URL: http://arxiv.org/abs/2002.11910v1
- Date: Thu, 27 Feb 2020 04:29:13 GMT
- Authors: Zhaoheng Gong, Ping Chen, Jiang Zhou
- Abstract summary: Chinese word boundaries are also entity boundaries, so named entity recognition for Chinese text can benefit from word boundary detection.
In this paper, we integrate a boundary assembling method with the state-of-the-art deep neural network model, and incorporate the updated word boundary information into a conditional random field model for named entity recognition.
Our method shows a 2% absolute improvement over previous state-of-the-art results.
- Score: 3.7239227834407735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Named entity recognition is a challenging task in Natural Language
Processing, especially for informal and noisy social media text. Chinese word
boundaries are also entity boundaries; therefore, named entity recognition for
Chinese text can benefit from the word boundaries detected by Chinese word
segmentation. Yet Chinese word segmentation poses its own difficulties because
it is influenced by several factors, e.g., the segmentation criteria and the
algorithm employed. If handled improperly, segmentation errors can cascade and
degrade the quality of the subsequent named entity recognition. In this paper
we integrate a boundary assembling method with the state-of-the-art deep neural
network model, and incorporate the updated word boundary information into a
conditional random field model for named entity recognition. Our method shows a
2% absolute improvement over previous state-of-the-art results.
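The abstract's premise, that word boundaries coincide with entity boundaries, can be illustrated with a minimal sketch. This is not the authors' boundary assembling method or their model. It only assumes the common B/M/E/S character tagging scheme for segmentation output, and the function names and the example segmentation are hypothetical.

```python
from typing import List


def words_to_bmes(words: List[str]) -> List[str]:
    """Map a segmented sentence to per-character boundary tags.

    B = begin of a multi-character word, M = middle, E = end,
    S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def span_respects_boundaries(tags: List[str], start: int, end: int) -> bool:
    """Check that the character span [start, end) starts and ends on word boundaries."""
    starts_on_boundary = tags[start] in ("B", "S")
    ends_on_boundary = tags[end - 1] in ("E", "S")
    return starts_on_boundary and ends_on_boundary


# Hypothetical segmenter output for an eight-character sentence.
words = ["我", "在", "北京", "大学", "读书"]
tags = words_to_bmes(words)

# Pair each character with its boundary tag; pairs like these could serve as
# extra input features for a character-level sequence labeler such as a CRF.
chars = [ch for word in words for ch in word]
features = list(zip(chars, tags))

# The candidate entity "北京大学" (characters 2..5) aligns with word boundaries,
# while the span "京大" (characters 3..4) cuts through two words.
aligned = span_respects_boundaries(tags, 2, 6)
misaligned = span_respects_boundaries(tags, 3, 5)
```

In this sketch a segmentation error would shift the tags and could mark a true entity span as misaligned, which is the kind of error propagation the abstract warns about.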
Related papers
- Semantic Connectivity-Driven Pseudo-labeling for Cross-domain
Segmentation [89.41179071022121]
Self-training is a prevailing approach in cross-domain semantic segmentation.
We propose a novel approach called Semantic Connectivity-driven pseudo-labeling.
This approach formulates pseudo-labels at the connectivity level and thus can facilitate learning structured and low-noise semantics.
arXiv Detail & Related papers (2023-12-11T12:29:51Z)
- Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer [50.572974726351504]
We propose C-FNT, a novel E2E model that incorporates class-based LMs into FNT.
In C-FNT, the LM score of named entities can be associated with the name class instead of its surface form.
The experimental results show that our proposed C-FNT significantly reduces error in named entities without hurting performance in general word recognition.
arXiv Detail & Related papers (2023-09-14T12:14:49Z)
- IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named
Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z)
- Associating Spatially-Consistent Grouping with Text-supervised Semantic
Segmentation [117.36746226803993]
We introduce self-supervised spatially-consistent grouping with text-supervised semantic segmentation.
Considering the part-like grouped results, we further adapt a text-supervised model from image-level to region-level recognition.
Our method achieves 59.2% mIoU and 32.4% mIoU on Pascal VOC and Pascal Context benchmarks.
arXiv Detail & Related papers (2023-04-03T16:24:39Z)
- ConNER: Consistency Training for Cross-lingual Named Entity Recognition [96.84391089120847]
Cross-lingual named entity recognition suffers from data scarcity in the target languages.
We propose ConNER as a novel consistency training framework for cross-lingual NER.
arXiv Detail & Related papers (2022-11-17T07:57:54Z)
- Improving Chinese Named Entity Recognition by Search Engine Augmentation [2.971423962840551]
We propose a neural approach that performs semantic augmentation for Chinese NER using external knowledge from a search engine.
In particular, a multi-channel semantic fusion model is adopted to generate the augmented input representations, which aggregates external related texts retrieved from the search engine.
arXiv Detail & Related papers (2022-10-23T08:42:05Z)
- ANEC: An Amharic Named Entity Corpus and Transformer Based Recognizer [0.0]
We present an Amharic named entity recognition system based on bidirectional long short-term memory with a conditional random fields layer.
Our named entity recognition system achieves an F_1 score of 93%, which is the new state-of-the-art result for Amharic named entity recognition.
arXiv Detail & Related papers (2022-07-02T09:50:37Z)
- DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for
Multilingual Named Entity Recognition [94.1865071914727]
MultiCoNER aims at detecting semantically ambiguous named entities in short and low-context settings for multiple languages.
Our team DAMO-NLP proposes a knowledge-based system, where we build a multilingual knowledge base based on Wikipedia.
Given an input sentence, our system effectively retrieves related contexts from the knowledge base.
Our system wins 10 out of 13 tracks in the MultiCoNER shared task.
arXiv Detail & Related papers (2022-03-01T15:29:35Z)
- Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-stage
Span Labeling [0.2624902795082451]
We propose a neural model named SpanSegTag for joint Chinese word segmentation and part-of-speech tagging.
Our experiments show that our BERT-based model SpanSegTag achieved competitive performances on the CTB5, CTB6, and UD datasets.
arXiv Detail & Related papers (2021-12-17T12:59:02Z)
- Enhancing Sindhi Word Segmentation using Subword Representation Learning and Position-aware Self-attention [19.520840812910357]
Sindhi word segmentation is a challenging task due to space omission and insertion issues.
Existing Sindhi word segmentation methods rely on designing and combining hand-crafted features.
We propose a Subword-Guided Neural Word Segmenter (SGNWS) that addresses word segmentation as a sequence labeling task.
arXiv Detail & Related papers (2020-12-30T08:31:31Z)
- Incorporating Uncertain Segmentation Information into Chinese NER for
Social Media Text [18.455836845989523]
Segmentation error propagation is a challenge for Chinese named entity recognition systems.
We propose a model (UIcwsNN) that specializes in identifying entities from Chinese social media text.
arXiv Detail & Related papers (2020-04-14T09:39:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.