DEXTER: Deep Encoding of External Knowledge for Named Entity Recognition
in Virtual Assistants
- URL: http://arxiv.org/abs/2108.06633v1
- Date: Sun, 15 Aug 2021 00:14:47 GMT
- Authors: Deepak Muralidharan, Joel Ruben Antony Moniz, Weicheng Zhang, Stephen
Pulman, Lin Li, Megan Barnes, Jingjing Pan, Jason Williams, Alex Acero
- Abstract summary: In intelligent voice assistants, where NER is an important component, input to NER may be noisy because of user or speech recognition error.
We describe a NER system intended to address these problems.
We show that this technique improves related tasks, such as semantic parsing, with an improvement of up to 5% in error rate.
- Score: 10.500933545429202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Named entity recognition (NER) is usually developed and tested on text from
well-written sources. However, in intelligent voice assistants, where NER is an
important component, input to NER may be noisy because of user or speech
recognition error. In applications, entity labels may change frequently, and
non-textual properties like topicality or popularity may be needed to choose
among alternatives.
We describe a NER system intended to address these problems. We test and
train this system on a proprietary user-derived dataset. We compare with a
baseline text-only NER system; the baseline enhanced with external gazetteers;
and the baseline enhanced with the search and indirect labelling techniques we
describe below. The final configuration gives around 6% reduction in NER error
rate. We also show that this technique improves related tasks, such as semantic
parsing, with an improvement of up to 5% in error rate.
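The abstract does not say how the "baseline enhanced with external gazetteers" configuration is built. As a rough illustration only, one common way to use a gazetteer is to mark each token that falls inside a dictionary entry and pass those marks to the tagger as extra features. The sketch below assumes a toy gazetteer, a hypothetical `gazetteer_features` helper, and a BIO-style feature scheme; none of these come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer (not from the paper):
# lowercased token tuples mapped to an entity type.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("taylor", "swift"): "artist",
    ("shake", "it", "off"): "song",
    ("paris",): "city",
}


def gazetteer_features(
    tokens: List[str],
    gazetteer: Dict[Tuple[str, ...], str] = GAZETTEER,
) -> List[str]:
    """Return one BIO-style gazetteer feature per token.

    Uses greedy longest-match lookup from left to right. Tokens not
    covered by any gazetteer entry get the feature "O".
    """
    lowered = [t.lower() for t in tokens]
    max_len = max((len(key) for key in gazetteer), default=0)
    feats = ["O"] * len(tokens)
    i = 0
    while i < len(lowered):
        matched = False
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + n]))
            if label is not None:
                feats[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    feats[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return feats


if __name__ == "__main__":
    print(gazetteer_features("play shake it off by Taylor Swift".split()))
```

Exact-match lookup of this kind is brittle when the input is noisy, for example when a name is misrecognized by the speech recognizer, which is the setting the abstract highlights.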
Related papers
- WhisperNER: Unified Open Named Entity and Speech Recognition [15.535663273628147]
We introduce WhisperNER, a novel model that allows joint speech transcription and entity recognition.
WhisperNER supports open-type NER, enabling recognition of diverse and evolving entities at inference.
Our experiments demonstrate that WhisperNER outperforms natural baselines on both out-of-domain open type NER and supervised finetuning.
(2024-09-12T15:00:56Z)
- Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
(2024-07-26T07:30:41Z)
- Named Entity Recognition via Machine Reading Comprehension: A Multi-Task Learning Approach [50.12455129619845]
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types.
We propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER.
(2023-09-20T03:15:05Z)
- Optimizing Bi-Encoder for Named Entity Recognition via Contrastive Learning [80.36076044023581]
We present an efficient bi-encoder framework for named entity recognition (NER).
We frame NER as a metric learning problem that maximizes the similarity between the vector representations of an entity mention and its type.
A major challenge to this bi-encoder formulation for NER lies in separating non-entity spans from entity mentions.
(2022-08-30T23:19:04Z)
- Empirical Study of Named Entity Recognition Performance Using Distribution-aware Word Embedding [15.955385058787348]
We develop a distribution-aware word embedding and implement three different methods to make use of the distribution information in a NER framework.
The performance of NER will be improved if the word specificity is incorporated into existing NER methods.
(2021-09-03T17:28:04Z)
- Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching [60.8427677151492]
We propose CMatch, a Character-level distribution matching method to perform fine-grained adaptation between each character in two domains.
Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39% and 16.50% relative Word Error Rate (WER) reduction on both cross-device and cross-environment ASR.
(2021-04-15T14:36:54Z)
- Named Entity Recognition in the Legal Domain using a Pointer Generator Network [0.0]
We study the problem of legal NER with noisy text extracted from PDF files of filed court cases from US courts.
The exact location of the entities in the text is unknown and the entities may contain typos and/or OCR mistakes.
We formulate the NER task as a text-to-text sequence generation task and train a pointer generator network to generate the entities in the document rather than label them.
(2020-12-17T21:10:34Z)
- Noise Robust Named Entity Understanding for Voice Assistants [14.193603900541005]
We show that our proposed framework improves NER accuracy by up to 3.13% and entity linking (EL) accuracy by up to 3.6% in F1 score.
The features used also lead to better accuracies in other natural language understanding tasks, such as domain classification and semantic parsing.
(2020-05-29T06:14:53Z)
- End-to-end Named Entity Recognition from English Speech [51.22888702264816]
We introduce a first publicly available NER-annotated dataset for English speech and present an E2E approach, which jointly optimizes the ASR and NER tagger components.
We also discuss how NER from speech can be used to handle out-of-vocabulary (OOV) words in an ASR system.
(2020-05-22T13:39:14Z)
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve [49.878051587667244]
We examine the performance of several variants of LSTM-CRF architectures for named entity recognition.
We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves.
We enlist human annotators to evaluate the feasibility of inferring entity types from the context alone. We find that people also cannot infer the entity type for the majority of the errors made by the context-only system, although there is some room for improvement.
(2020-04-09T14:37:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.