DAMO-NLP at NLPCC-2022 Task 2: Knowledge Enhanced Robust NER for Speech
Entity Linking
- URL: http://arxiv.org/abs/2209.13187v2
- Date: Thu, 29 Sep 2022 01:21:29 GMT
- Authors: Shen Huang, Yuchen Zhai, Xinwei Long, Yong Jiang, Xiaobin Wang, Yin
Zhang and Pengjun Xie
- Abstract summary: Speech Entity Linking aims to recognize and disambiguate named entities in spoken languages.
Conventional methods suffer from the unfettered speech styles and the noisy transcripts generated by ASR systems.
We propose Knowledge Enhanced Named Entity Recognition (KENER), which focuses on improving robustness through painlessly incorporating proper knowledge in the entity recognition stage.
Our system achieves 1st place in Track 1 and 2nd place in Track 2 of NLPCC-2022 Shared Task 2.
- Score: 32.915297772110364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech Entity Linking aims to recognize and disambiguate named entities in
spoken languages. Conventional methods suffer gravely from the unfettered
speech styles and the noisy transcripts generated by ASR systems. In this
paper, we propose a novel approach called Knowledge Enhanced Named Entity
Recognition (KENER), which focuses on improving robustness through painlessly
incorporating proper knowledge in the entity recognition stage and thus
improving the overall performance of entity linking. KENER first retrieves
candidate entities for a sentence without mentions, and then utilizes the
entity descriptions as extra information to help recognize mentions. The
candidate entities retrieved by a dense retrieval module are especially useful
when the input is short or noisy. Moreover, we investigate various data
sampling strategies and design effective loss functions, in order to improve
the quality of retrieved entities in both recognition and disambiguation
stages. Lastly, a linking with filtering module is applied as the final
safeguard, making it possible to filter out wrongly-recognized mentions. Our
system achieves 1st place in Track 1 and 2nd place in Track 2 of NLPCC-2022
Shared Task 2.
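The retrieve, augment, and filter flow described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the knowledge base `KB`, the function names (`embed`, `retrieve`, `build_ner_input`, `link_with_filtering`), the `[SEP]` separator, and the threshold value are all hypothetical, and a bag-of-words cosine similarity stands in for the trained dense retrieval module.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: entity name -> short description.
KB = {
    "Apple Inc.": "american technology company that makes the iphone",
    "Apple (fruit)": "edible fruit grown on trees",
    "Paris": "capital city of france",
}


def embed(text):
    """Stand-in encoder: bag-of-words counts (assumption; a real system
    would use a trained dense encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(sentence, kb, top_k=2):
    """Retrieve candidate entities for a whole sentence, with no mentions given."""
    query = embed(sentence)
    scored = [(cosine(query, embed(f"{name} {desc}")), name) for name, desc in kb.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k] if score > 0]


def build_ner_input(sentence, candidates, kb, sep=" [SEP] "):
    """Append the retrieved entity descriptions to the sentence as extra context."""
    context = sep.join(f"{name}: {kb[name]}" for name, _ in candidates)
    return sentence + sep + context if context else sentence


def link_with_filtering(mention, sentence, kb, threshold=0.25):
    """Link a recognized mention to its best candidate, or return None
    (filtering the mention out) when no candidate scores high enough."""
    candidates = retrieve(f"{mention} {sentence}", kb, top_k=1)
    if not candidates or candidates[0][1] < threshold:
        return None
    return candidates[0][0]


sentence = "apple releases a new iphone"
candidates = retrieve(sentence, KB)
ner_input = build_ner_input(sentence, candidates, KB)
linked = link_with_filtering("apple", sentence, KB)
dropped = link_with_filtering("um", "um so yeah", KB)
```

In this sketch, `ner_input` is what a recognition model would consume in place of the bare transcript, and `dropped` shows the filtering step rejecting a mention that has no plausible entity in the knowledge base.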
Related papers
- AKEM: Aligning Knowledge Base to Queries with Ensemble Model for Entity
Recognition and Linking [15.548722102706867]
This paper presents a novel approach to address the Entity Recognition and Linking Challenge at NLPCC 2015.
The task involves extracting named entity mentions from short search queries and linking them to entities within a reference Chinese knowledge base.
Our method is computationally efficient and achieves an F1 score of 0.535.
(2023-09-12T12:37:37Z)
- Improving Few-shot and Zero-shot Entity Linking with Coarse-to-Fine
Lexicon-based Retriever [30.096395104683193]
Few-shot and zero-shot entity linking focus on the tail and emerging entities.
We propose a coarse-to-fine lexicon-based retriever to retrieve entity candidates in an effective manner.
Our approach ranks 1st in NLPCC 2023 Shared Task 6 on Chinese Few-shot and Zero-shot Entity Linking.
(2023-08-07T07:39:43Z)
- PAI at SemEval-2023 Task 2: A Universal System for Named Entity
Recognition with External Entity Information [19.995198769980345]
The MultiCoNER II task aims to detect complex, ambiguous, and fine-grained named entities in low-context situations and noisy scenarios.
Our system retrieves entities with properties from the knowledge base (i.e., Wikipedia) for a given text, then concatenates the entity information with the input sentence and feeds it into Transformer-based models.
(2023-05-10T12:40:48Z)
- IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named
Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
(2023-04-20T20:30:34Z)
- Introducing Semantics into Speech Encoders [91.37001512418111]
We propose an unsupervised way of incorporating semantic information from large language models into self-supervised speech encoders without labeled audio transcriptions.
Our approach achieves performance similar to that of supervised methods trained on over 100 hours of labeled audio transcripts.
(2022-11-15T18:44:28Z)
- The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming state-of-the-art methods that require object detection and human pose by a clear margin.
(2022-03-10T23:35:00Z)
- Nested Named Entity Recognition as Latent Lexicalized Constituency
Parsing [29.705133932275892]
Recently, Fu et al. (2021) adapted a span-based constituency parser to tackle nested NER.
In this work, we resort to more expressive structures, lexicalized constituency trees in which constituents are annotated by headwords.
We leverage the Eisner-Satta algorithm to perform partial marginalization and inference efficiently.
(2022-03-09T12:02:59Z)
- DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for
Multilingual Named Entity Recognition [94.1865071914727]
MultiCoNER aims at detecting semantically ambiguous named entities in short and low-context settings for multiple languages.
Our team DAMO-NLP proposes a knowledge-based system, where we build a multilingual knowledge base based on Wikipedia.
Given an input sentence, our system effectively retrieves related contexts from the knowledge base.
Our system wins 10 out of 13 tracks in the MultiCoNER shared task.
(2022-03-01T15:29:35Z)
- Locate and Label: A Two-stage Identifier for Nested Named Entity
Recognition [9.809157050048375]
We propose a two-stage entity identifier for named entity recognition.
First, we generate span proposals by filtering and boundary regression on the seed spans to locate the entities, and then label the boundary-adjusted span proposals with the corresponding categories.
Our method effectively utilizes the boundary information of entities and partially matched spans during training.
(2021-05-14T12:52:34Z)
- PIN: A Novel Parallel Interactive Network for Spoken Language
Understanding [68.53121591998483]
In existing RNN-based approaches, the intent detection (ID) and slot filling (SF) tasks are often jointly modeled to utilize the correlation information between them.
The experiments on two benchmark datasets, i.e., SNIPS and ATIS, demonstrate the effectiveness of our approach.
More encouragingly, by using the feature embedding of the utterance generated by the pre-trained language model BERT, our method achieves state-of-the-art results among all compared approaches.
(2020-09-28T15:59:31Z)
- Probing Linguistic Features of Sentence-Level Representations in Neural
Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
(2020-04-17T09:17:40Z)