AISHELL-NER: Named Entity Recognition from Chinese Speech
- URL: http://arxiv.org/abs/2202.08533v1
- Date: Thu, 17 Feb 2022 09:18:48 GMT
- Title: AISHELL-NER: Named Entity Recognition from Chinese Speech
- Authors: Boli Chen, Guangwei Xu, Xiaobin Wang, Pengjun Xie, Meishan Zhang, Fei Huang
- Abstract summary: We introduce a new dataset AISHELL-NER for NER from Chinese speech.
The results demonstrate that the performance could be improved by combining entity-aware ASR and a pretrained NER tagger.
- Score: 54.434118596263126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Named Entity Recognition (NER) from speech is one of the Spoken
Language Understanding (SLU) tasks, aiming to extract semantic information from
the speech signal. NER from speech is usually performed through a two-step
pipeline that consists of (1) processing the audio with an Automatic Speech
Recognition (ASR) system and (2) applying an NER tagger to the ASR outputs.
Recent works have shown the capability of the End-to-End (E2E) approach for NER
from English and French speech, which is essentially entity-aware ASR. However,
due to the many homophones and polyphones in Chinese, NER from Chinese speech
is a considerably more challenging task. In this paper, we introduce
AISHELL-NER, a new dataset for NER from Chinese speech. Extensive experiments
are conducted to explore the performance of several state-of-the-art methods.
The results demonstrate that the performance can be improved by combining
entity-aware ASR with a pretrained NER tagger, an approach that can be easily
applied to modern SLU pipelines. The dataset is publicly available at
github.com/Alibaba-NLP/AISHELL-NER.
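To make the two-step pipeline described above concrete, here is a minimal sketch using the Hugging Face transformers pipeline API. The checkpoint names are hypothetical placeholders, not the systems evaluated in the paper.

```python
from transformers import pipeline

# Step 1: an ASR front-end transcribes the Chinese audio.
asr = pipeline("automatic-speech-recognition",
               model="your-org/chinese-asr-model")  # hypothetical checkpoint name

# Step 2: an NER tagger labels the (possibly erroneous) transcript.
ner = pipeline("token-classification",
               model="your-org/chinese-ner-model",  # hypothetical checkpoint name
               aggregation_strategy="simple")

def ner_from_speech(wav_path: str):
    transcript = asr(wav_path)["text"]
    # Homophone/polyphone ASR errors propagate into this step, which is
    # the failure mode that motivates entity-aware ASR.
    return transcript, ner(transcript)

transcript, entities = ner_from_speech("example.wav")
print(transcript, entities)
```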
Related papers
- WhisperNER: Unified Open Named Entity and Speech Recognition [15.535663273628147]
We introduce WhisperNER, a novel model that allows joint speech transcription and entity recognition.
WhisperNER supports open-type NER, enabling recognition of diverse and evolving entities at inference.
Our experiments demonstrate that WhisperNER outperforms natural baselines in both out-of-domain open-type NER and supervised fine-tuning settings.
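As a toy illustration of what joint transcription and entity recognition can look like, the sketch below shows an inline-tagged transcript and a parser for it. The tag format and the type-prompt token are assumptions made for illustration, not WhisperNER's actual specification.

```python
import re

# Entity types can be chosen freely at inference time (open-type NER).
ENTITY_TYPES = ["person", "location", "organization"]

def type_prompt(types):
    # A joint model can condition decoding on the requested types;
    # the prompt token below is made up for illustration.
    return "<|entity-types|> " + ", ".join(types)

# A jointly trained model emits entity tags inside the transcript itself:
tagged = "I met <person>Ada Lovelace</person> in <location>London</location>"

def parse_entities(text):
    """Recover (type, surface form) pairs from an inline-tagged transcript."""
    return re.findall(r"<(\w+)>(.*?)</\1>", text)

print(type_prompt(ENTITY_TYPES))
print(parse_entities(tagged))  # [('person', 'Ada Lovelace'), ('location', 'London')]
```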
arXiv Detail & Related papers (2024-09-12T15:00:56Z)
- Using Large Language Model for End-to-End Chinese ASR and NER [35.876792804001646]
We present an encoder-decoder architecture that incorporates speech features through cross-attention.
We compare these two approaches using Chinese automatic speech recognition (ASR) and named entity recognition (NER) tasks.
Our experiments reveal that the encoder-decoder architecture outperforms the decoder-only architecture with a short context.
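A minimal PyTorch sketch of incorporating speech features through cross-attention, as the summary describes. The dimensions and layer layout are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SpeechCrossAttentionBlock(nn.Module):
    """One decoder block whose queries attend over speech encoder outputs."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.ln3 = nn.LayerNorm(d_model)

    def forward(self, text_h, speech_h):
        # Text positions attend to each other (causal masking omitted for brevity)...
        q = self.ln1(text_h)
        x = text_h + self.self_attn(q, q, q)[0]
        # ...then query the speech encoder's frames: the cross-attention step.
        x = x + self.cross_attn(self.ln2(x), speech_h, speech_h)[0]
        return x + self.ffn(self.ln3(x))

block = SpeechCrossAttentionBlock()
text_h = torch.randn(2, 16, 512)     # (batch, text tokens, model dim)
speech_h = torch.randn(2, 200, 512)  # (batch, speech frames, model dim)
print(block(text_h, speech_h).shape)  # torch.Size([2, 16, 512])
```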
arXiv Detail & Related papers (2024-01-21T03:15:05Z)
- Learning Speech Representation From Contrastive Token-Acoustic Pretraining [57.08426714676043]
We propose "Contrastive Token-Acoustic Pretraining (CTAP)", which uses two encoders to bring phoneme and speech into a joint multimodal space.
The proposed CTAP model is trained on 210k speech and phoneme pairs, achieving minimally-supervised TTS, VC, and ASR.
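A joint multimodal space of this kind is typically learned with a symmetric contrastive (InfoNCE) objective. The sketch below shows that objective under assumed embedding shapes and temperature; it is not CTAP's exact training setup.

```python
import torch
import torch.nn.functional as F

def ctap_style_loss(phoneme_emb, speech_emb, temperature=0.07):
    # Normalize so dot products are cosine similarities.
    p = F.normalize(phoneme_emb, dim=-1)   # (batch, dim)
    s = F.normalize(speech_emb, dim=-1)    # (batch, dim)
    logits = p @ s.t() / temperature       # (batch, batch) similarity matrix
    targets = torch.arange(p.size(0))      # matched pairs sit on the diagonal
    # Symmetric InfoNCE: phoneme-to-speech plus speech-to-phoneme.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

print(ctap_style_loss(torch.randn(8, 256), torch.randn(8, 256)).item())
```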
arXiv Detail & Related papers (2023-09-01T12:35:43Z)
- ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding [86.47555696652618]
This paper presents recent progress on integrating speech separation and enhancement into the ESPnet toolkit.
A new interface has been designed to combine speech enhancement front-ends with other tasks, including automatic speech recognition (ASR), speech translation (ST), and spoken language understanding (SLU).
Results show that the integration of SE front-ends with back-end tasks is a promising research direction even for tasks besides ASR.
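The front-end/back-end combination can be pictured as simple function composition, sketched below with toy stand-ins. This is purely illustrative and is not ESPnet's actual interface.

```python
from typing import Callable
import numpy as np

def compose(enhance: Callable[[np.ndarray], np.ndarray],
            backend: Callable[[np.ndarray], str]) -> Callable[[np.ndarray], str]:
    """Plug a speech enhancement front-end into any back-end task."""
    return lambda noisy: backend(enhance(noisy))

# Toy stand-ins so the sketch runs; real SE and ASR/ST/SLU systems go here.
denoise = lambda wav: wav * 0.9                              # placeholder front-end
asr = lambda wav: f"<transcript of {wav.shape[0]} samples>"  # placeholder back-end

run = compose(denoise, asr)
print(run(np.random.randn(16000)))
```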
arXiv Detail & Related papers (2022-07-19T18:55:29Z)
- End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting [0.3867363075280543]
We present a study identifying the signal features and other linguistic properties used by an E2E model to perform the Spoken Language Understanding task.
The study is carried out in the application domain of a smart home that has to handle non-English (here French) voice commands.
arXiv Detail & Related papers (2022-07-17T13:51:56Z)
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
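Inducing a compact discrete representation is commonly done by clustering frame-level speech features and collapsing repeats. The sketch below illustrates that idea with k-means; the hyperparameters are illustrative, not Wav2Seq's recipe.

```python
import numpy as np
from itertools import groupby
from sklearn.cluster import KMeans

# Stand-in for frame-level features from a self-supervised speech encoder.
features = np.random.randn(1000, 39)

# Cluster frames into discrete units (k=100 is illustrative).
units = KMeans(n_clusters=100, n_init=10).fit_predict(features)

# Collapse consecutive repeats so the sequence is compact, like text;
# Wav2Seq additionally applies subword modeling on top of such units.
pseudo_tokens = [u for u, _ in groupby(units)]
print(len(units), "frames ->", len(pseudo_tokens), "pseudo tokens")
```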
arXiv Detail & Related papers (2022-05-02T17:59:02Z)
- Do We Still Need Automatic Speech Recognition for Spoken Language Understanding? [14.575551366682872]
We show that learned speech features are superior to ASR transcripts on three classification tasks.
We highlight the intrinsic robustness of wav2vec 2.0 representations to out-of-vocabulary words as key to better performance.
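A minimal sketch of classifying directly from learned speech features, skipping ASR, as the summary suggests. The checkpoint is the public wav2vec 2.0 base model; the linear head and mean-pooling are our additions, not the paper's classifier.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
head = torch.nn.Linear(encoder.config.hidden_size, 3)  # e.g., 3 intent classes

waveform = torch.randn(16000).numpy()  # stand-in for 1 s of 16 kHz audio
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    frames = encoder(**inputs).last_hidden_state  # (1, n_frames, hidden)
logits = head(frames.mean(dim=1))  # mean-pool over time, then classify
print(logits)
```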
arXiv Detail & Related papers (2021-11-29T15:13:36Z)
- End-to-end Named Entity Recognition from English Speech [51.22888702264816]
We introduce the first publicly available NER-annotated dataset for English speech and present an E2E approach that jointly optimizes the ASR and NER tagger components.
We also discuss how NER from speech can be used to handle out-of-vocabulary (OOV) words in an ASR system.
arXiv Detail & Related papers (2020-05-22T13:39:14Z)
- ESPnet-ST: All-in-One Speech Translation Toolkit [57.76342114226599]
ESPnet-ST is a new project within the end-to-end speech processing toolkit ESPnet.
It implements automatic speech recognition, machine translation, and text-to-speech functions for speech translation.
We provide all-in-one recipes including data pre-processing, feature extraction, training, and decoding pipelines.
arXiv Detail & Related papers (2020-04-21T18:38:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences.