End-to-end Named Entity Recognition from English Speech
- URL: http://arxiv.org/abs/2005.11184v1
- Date: Fri, 22 May 2020 13:39:14 GMT
- Title: End-to-end Named Entity Recognition from English Speech
- Authors: Hemant Yadav, Sreyan Ghosh, Yi Yu, Rajiv Ratn Shah
- Abstract summary: We introduce the first publicly available NER-annotated dataset for English speech and present an E2E approach that jointly optimizes the ASR and NER tagger components.
We also discuss how NER from speech can be used to handle out-of-vocabulary (OOV) words in an ASR system.
- Score: 51.22888702264816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Named entity recognition (NER) from text has been a widely studied problem
and usually extracts semantic information from text. Until now, NER from speech
is mostly studied in a two-step pipeline process that includes first applying
an automatic speech recognition (ASR) system on an audio sample and then
passing the predicted transcript to a NER tagger. In such cases, ASR errors
propagate uncorrected into the NER tagger, since the two tasks are not
optimized jointly in an end-to-end (E2E) fashion. Recent studies confirm that
integrated approaches (e.g., E2E ASR) outperform sequential ones (e.g.,
phoneme-based ASR). In this
paper, we introduce the first publicly available NER-annotated dataset for
English speech and present an E2E approach that jointly optimizes the ASR and
NER tagger components. Experimental results show that the proposed E2E approach
outperforms the classical two-step approach. We also discuss how NER from
speech can be used to handle out-of-vocabulary (OOV) words in an ASR system.
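A common way to realize this kind of joint optimization is to train a single E2E ASR model to emit transcripts in which named entities are wrapped in special delimiter tokens, so entities can be recovered from the decoded text by simple post-processing. The sketch below illustrates that post-processing step; the bracket delimiters and the function name are illustrative assumptions, not necessarily the paper's exact tag set.

```python
import re

def extract_entities(tagged_transcript):
    """Split an E2E ASR hypothesis with bracket-delimited entities,
    e.g. "i met [ john smith ] in [ paris ]", into the plain
    transcript and the list of recognized entity strings."""
    # Collect the text between each pair of entity delimiters.
    entities = re.findall(r"\[\s*(.*?)\s*\]", tagged_transcript)
    # Drop the delimiters to recover the ordinary transcript.
    plain = re.sub(r"\s*[\[\]]\s*", " ", tagged_transcript)
    plain = re.sub(r"\s+", " ", plain).strip()
    return plain, entities

plain, entities = extract_entities("i met [ john smith ] in [ paris ]")
# plain    -> "i met john smith in paris"
# entities -> ["john smith", "paris"]
```

Because the delimiter tags are ordinary output tokens, a single training loss covers both transcription and entity tagging, which is what allows the joint model to avoid the error accumulation of the two-step pipeline.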
Related papers
- WhisperNER: Unified Open Named Entity and Speech Recognition [15.535663273628147]
We introduce WhisperNER, a novel model that allows joint speech transcription and entity recognition.
WhisperNER supports open-type NER, enabling recognition of diverse and evolving entities at inference.
Our experiments demonstrate that WhisperNER outperforms natural baselines on both out-of-domain open-type NER and supervised fine-tuning.
arXiv Detail & Related papers (2024-09-12T15:00:56Z)
- One model to rule them all? Towards End-to-End Joint Speaker Diarization and Speech Recognition [50.055765860343286]
This paper presents a novel framework for joint speaker diarization and automatic speech recognition.
The framework, named SLIDAR, can process arbitrary length inputs and can handle any number of speakers.
Experiments performed on monaural recordings from the AMI corpus confirm the effectiveness of the method in both close-talk and far-field speech scenarios.
arXiv Detail & Related papers (2023-10-02T23:03:30Z)
- Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation [22.38340990398735]
We propose a novel data augmentation method by applying the text-based speech editing model.
The experimental results on code-switching and NER tasks show that our proposed method can significantly outperform the audio splicing and neural TTS based data augmentation systems.
arXiv Detail & Related papers (2023-06-14T15:50:13Z)
- Leveraging Large Text Corpora for End-to-End Speech Summarization [58.673480990374635]
End-to-end speech summarization (E2E SSum) is a technique to directly generate summary sentences from speech.
We present two novel methods that leverage a large amount of external text summarization data for E2E SSum training.
arXiv Detail & Related papers (2023-03-02T05:19:49Z)
- ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition [100.30565531246165]
Speech recognition systems require dataset-specific tuning.
This tuning requirement can lead to systems failing to generalise to other datasets and domains.
We introduce the End-to-end Speech Benchmark (ESB) for evaluating the performance of a single automatic speech recognition system across a broad range of datasets and domains.
arXiv Detail & Related papers (2022-10-24T15:58:48Z)
- Deliberation Model for On-Device Spoken Language Understanding [69.5587671262691]
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU).
We show that our approach can significantly reduce the degradation when moving from natural speech to synthetic speech training.
arXiv Detail & Related papers (2022-04-04T23:48:01Z)
- AISHELL-NER: Named Entity Recognition from Chinese Speech [54.434118596263126]
We introduce a new dataset, AISHELL-NER, for NER from Chinese speech.
The results demonstrate that performance can be improved by combining entity-aware ASR with a pretrained NER tagger.
arXiv Detail & Related papers (2022-02-17T09:18:48Z)
- Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation [11.79922306758482]
Machine Speech Chain integrates end-to-end automatic speech recognition (ASR) and text-to-speech (TTS) into one circle for joint training.
We explore the TTS->ASR pipeline in the speech chain for domain adaptation of both neural TTS and E2E ASR models.
arXiv Detail & Related papers (2021-04-08T14:52:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.