WhisperNER: Unified Open Named Entity and Speech Recognition
- URL: http://arxiv.org/abs/2409.08107v1
- Date: Thu, 12 Sep 2024 15:00:56 GMT
- Title: WhisperNER: Unified Open Named Entity and Speech Recognition
- Authors: Gil Ayache, Menachem Pirchi, Aviv Navon, Aviv Shamsian, Gill Hetz, Joseph Keshet
- Abstract summary: We introduce WhisperNER, a novel model that allows joint speech transcription and entity recognition.
WhisperNER supports open-type NER, enabling recognition of diverse and evolving entities at inference.
Our experiments demonstrate that WhisperNER outperforms natural baselines in both the out-of-domain open-type NER and the supervised fine-tuning settings.
- Score: 15.535663273628147
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Integrating named entity recognition (NER) with automatic speech recognition (ASR) can significantly enhance transcription accuracy and informativeness. In this paper, we introduce WhisperNER, a novel model that allows joint speech transcription and entity recognition. WhisperNER supports open-type NER, enabling recognition of diverse and evolving entities at inference. Building on recent advancements in open NER research, we augment a large synthetic dataset with synthetic speech samples. This allows us to train WhisperNER on a large number of examples with diverse NER tags. During training, the model is prompted with NER labels and optimized to output the transcribed utterance along with the corresponding tagged entities. To evaluate WhisperNER, we generate synthetic speech for commonly used NER benchmarks and annotate existing ASR datasets with open NER tags. Our experiments demonstrate that WhisperNER outperforms natural baselines in both the out-of-domain open-type NER and the supervised fine-tuning settings.
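The abstract describes a model that is prompted with entity types and emits the transcript with entities tagged inline. A minimal sketch of the post-processing side of such a scheme is below; the inline `<type>...</type>` serialization and the example prompt are assumptions for illustration, as the abstract does not specify the exact output format.

```python
import re


def parse_tagged_transcript(output: str):
    """Split a tagged model output into the plain transcript and a list of
    (entity_type, entity_text) pairs.

    Assumes an inline ``<type>...</type>`` tag format; this is a
    hypothetical serialization, not the one defined in the paper.
    """
    # Backreference \1 ensures the closing tag matches the opening one;
    # the non-greedy quantifier keeps each match to a single entity span.
    entities = re.findall(r"<([\w-]+)>(.*?)</\1>", output)
    # Strip all tags to recover the plain transcription.
    transcript = re.sub(r"</?[\w-]+>", "", output).strip()
    return transcript, entities


# Hypothetical model output for the entity-type prompt "person, location":
tagged = "my name is <person>John Smith</person> and I live in <location>Haifa</location>"
text, ents = parse_tagged_transcript(tagged)
# text -> "my name is John Smith and I live in Haifa"
# ents -> [("person", "John Smith"), ("location", "Haifa")]
```

Under this scheme, evaluation reduces to comparing the stripped transcript against the reference (standard ASR metrics) and the extracted (type, text) pairs against the gold entity annotations.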
Related papers
- Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z) - In-Context Learning for Few-Shot Nested Named Entity Recognition [53.55310639969833]
We introduce an effective and innovative ICL framework for the setting of few-shot nested NER.
We improve the ICL prompt by devising a novel example demonstration selection mechanism, EnDe retriever.
In EnDe retriever, we employ contrastive learning to perform three types of representation learning, in terms of semantic similarity, boundary similarity, and label similarity.
arXiv Detail & Related papers (2024-02-02T06:57:53Z) - Using Large Language Model for End-to-End Chinese ASR and NER [35.876792804001646]
We present an encoder-decoder architecture that incorporates speech features through cross-attention.
We compare these two approaches on Chinese automatic speech recognition (ASR) and named entity recognition (NER) tasks.
Our experiments reveal that the encoder-decoder architecture outperforms the decoder-only architecture when the context is short.
arXiv Detail & Related papers (2024-01-21T03:15:05Z) - Named Entity Recognition via Machine Reading Comprehension: A Multi-Task
Learning Approach [50.12455129619845]
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types.
We propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER.
arXiv Detail & Related papers (2023-09-20T03:15:05Z) - Optimizing Bi-Encoder for Named Entity Recognition via Contrastive
Learning [80.36076044023581]
We present an efficient bi-encoder framework for named entity recognition (NER).
We frame NER as a metric learning problem that maximizes the similarity between the vector representations of an entity mention and its type.
A major challenge to this bi-encoder formulation for NER lies in separating non-entity spans from entity mentions.
arXiv Detail & Related papers (2022-08-30T23:19:04Z) - NERDA-Con: Extending NER models for Continual Learning -- Integrating
Distinct Tasks and Updating Distribution Shifts [0.0]
We propose NERDA-Con, a pipeline for training NER models on Large Language Model (LLM) bases.
Since our work has implications for continual learning and NER pipelines, we open-source our code and provide a fine-tuning library under the same name, NERDA-Con.
arXiv Detail & Related papers (2022-06-28T03:22:55Z) - AISHELL-NER: Named Entity Recognition from Chinese Speech [54.434118596263126]
We introduce a new dataset, AISHELL-NER, for NER from Chinese speech.
The results demonstrate that performance can be improved by combining entity-aware ASR with a pretrained NER tagger.
arXiv Detail & Related papers (2022-02-17T09:18:48Z) - DEXTER: Deep Encoding of External Knowledge for Named Entity Recognition
in Virtual Assistants [10.500933545429202]
In intelligent voice assistants, where NER is an important component, the input to NER may be noisy because of user or speech-recognition errors.
We describe a NER system intended to address these problems.
We show that this technique improves related tasks, such as semantic parsing, with error-rate reductions of up to 5%.
arXiv Detail & Related papers (2021-08-15T00:14:47Z) - End-to-end Named Entity Recognition from English Speech [51.22888702264816]
We introduce the first publicly available NER-annotated dataset for English speech and present an E2E approach that jointly optimizes the ASR and NER tagger components.
We also discuss how NER from speech can be used to handle out-of-vocabulary (OOV) words in an ASR system.
arXiv Detail & Related papers (2020-05-22T13:39:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.