On the Use of External Data for Spoken Named Entity Recognition
- URL: http://arxiv.org/abs/2112.07648v1
- Date: Tue, 14 Dec 2021 18:49:26 GMT
- Title: On the Use of External Data for Spoken Named Entity Recognition
- Authors: Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu J. Han
- Abstract summary: Recent advances in self-supervised speech representations have made it feasible to consider learning models with limited labeled data.
We draw on a variety of approaches, including self-training, knowledge distillation, and transfer learning, and consider their applicability to both end-to-end models and pipeline approaches.
- Score: 40.93448412171246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken language understanding (SLU) tasks involve mapping from speech audio
signals to semantic labels. Given the complexity of such tasks, good
performance might be expected to require large labeled datasets, which are
difficult to collect for each new task and domain. However, recent advances in
self-supervised speech representations have made it feasible to consider
learning SLU models with limited labeled data. In this work we focus on
low-resource spoken named entity recognition (NER) and address the question:
Beyond self-supervised pre-training, how can we use external speech and/or text
data that are not annotated for the task? We draw on a variety of approaches,
including self-training, knowledge distillation, and transfer learning, and
consider their applicability to both end-to-end models and pipeline (speech
recognition followed by text NER model) approaches. We find that several of
these approaches improve performance in resource-constrained settings beyond
the benefits from pre-trained representations alone. Compared to prior work, we
find improved F1 scores of up to 16%. While the best baseline model is a
pipeline approach, the best performance when using external data is ultimately
achieved by an end-to-end model. We provide detailed comparisons and analyses,
showing for example that end-to-end models are able to focus on the more
NER-specific words.
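One of the external-data approaches named in the abstract, self-training, can be sketched in a few lines: a model trained on the small labeled set pseudo-labels the unlabeled utterances, and only high-confidence pseudo-labels are added back for retraining. This is an illustrative sketch, not the paper's code; `model_fn`, the single confidence score per utterance, and the `Example` container are all hypothetical stand-ins for a real spoken NER model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    tokens: List[str]   # e.g. ASR transcript tokens
    tags: List[str]     # BIO entity tags, e.g. "B-PER", "O"


def self_train(model_fn: Callable, labeled: List[Example],
               unlabeled: List[List[str]],
               threshold: float = 0.9, rounds: int = 2):
    """Grow the training set with confident pseudo-labels, then retrain.

    model_fn(train_set) must return a model m with m(tokens) -> (tags, conf);
    both interfaces are hypothetical placeholders for a real NER system.
    """
    train_set = list(labeled)
    model = model_fn(train_set)          # teacher trained on gold data only
    for _ in range(rounds):
        kept = []
        for utt in unlabeled:
            tags, conf = model(utt)      # teacher predictions on raw audio/text
            if conf >= threshold:        # keep only confident pseudo-labels
                kept.append(Example(utt, tags))
        train_set = list(labeled) + kept  # gold labels always stay in the mix
        model = model_fn(train_set)       # retrain the student on the union
    return model, train_set
```

The same loop applies to either system variant: for the pipeline, pseudo-labels are text NER tags on ASR output; for the end-to-end model, they are decoded directly from audio.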
Related papers
- Synergizing Unsupervised and Supervised Learning: A Hybrid Approach for Accurate Natural Language Task Modeling [0.0]
This paper presents a novel hybrid approach that synergizes unsupervised and supervised learning to improve the accuracy of NLP task modeling.
Our methodology integrates an unsupervised module that learns representations from unlabeled corpora and a supervised module that leverages these representations to enhance task-specific models.
By synergizing techniques, our hybrid approach achieves SOTA results on benchmark datasets, paving the way for more data-efficient and robust NLP systems.
arXiv Detail & Related papers (2024-06-03T08:31:35Z)
- Vocabulary-Defined Semantics: Latent Space Clustering for Improving In-Context Learning [32.178931149612644]
In-context learning enables language models to adapt to downstream data or new tasks using a few samples as demonstrations within the prompt.
However, the performance of in-context learning can be unstable depending on the quality, format, or order of demonstrations.
We propose a novel approach, "vocabulary-defined semantics."
arXiv Detail & Related papers (2024-01-29T14:29:48Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, however, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- STOP: A dataset for Spoken Task Oriented Semantic Parsing [66.14615249745448]
End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model.
We release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex publicly available SLU dataset.
In addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance for low-resource domain adaptation of end-to-end SLU systems.
arXiv Detail & Related papers (2022-06-29T00:36:34Z)
- Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition [6.450618373898492]
We consider the task of identifying an optimal subset of data for efficient fine-tuning in self-supervised speech models for ASR.
We present the COWERAGE algorithm for representative subset selection in self-supervised ASR.
arXiv Detail & Related papers (2022-03-18T10:12:24Z)
- SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech [44.68649535280397]
We propose a suite of benchmark tasks for Spoken Language Understanding Evaluation (SLUE).
SLUE consists of limited-size labeled training sets and corresponding evaluation sets.
We present the first phase of the SLUE benchmark suite, consisting of named entity recognition, sentiment analysis, and ASR on the corresponding datasets.
We provide new transcriptions and annotations on subsets of the VoxCeleb and VoxPopuli datasets, evaluation metrics and results for baseline models, and an open-source toolkit to reproduce the baselines and evaluate new models.
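The NER task in such benchmarks is conventionally scored with an entity-level F1: a predicted entity counts as correct only when both its type and its text match a reference entity. A minimal micro-averaged version of that metric might look like the following (the actual SLUE toolkit may normalize labels and text differently, so treat this as a sketch):

```python
from collections import Counter


def entity_f1(ref_entities, hyp_entities):
    """Micro-averaged F1 over (type, text) entity pairs, per utterance."""
    tp = fp = fn = 0
    for ref, hyp in zip(ref_entities, hyp_entities):
        ref_c, hyp_c = Counter(ref), Counter(hyp)
        overlap = sum((ref_c & hyp_c).values())   # entities matched exactly
        tp += overlap
        fp += sum(hyp_c.values()) - overlap       # spurious predictions
        fn += sum(ref_c.values()) - overlap       # missed references
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```

For example, predicting `("PER", "alice")` and `("LOC", "london")` against references `("PER", "alice")` and `("LOC", "paris")` gives one true positive, one false positive, and one false negative, hence F1 = 0.5.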
arXiv Detail & Related papers (2021-11-19T18:59:23Z)
- An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition [98.70304981174748]
We focus on the general application of pretrained speech representations to advanced end-to-end automatic speech recognition (E2E-ASR) models.
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
arXiv Detail & Related papers (2021-10-09T15:06:09Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
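As we read the abstract above, the core idea of Dynamic Blocking is to discourage verbatim copying during decoding: when the decoder has just emitted a token that appears in the source, the token that follows it in the source is temporarily blocked, nudging the model to rephrase. The sketch below is an illustrative greedy decoder over toy scores, not the authors' implementation; `score_fn` is a hypothetical stand-in for a language model's next-token distribution.

```python
def blocked_next_tokens(source_tokens, last_token):
    """Tokens that would continue copying the source right after last_token."""
    return {source_tokens[i + 1]
            for i, t in enumerate(source_tokens[:-1]) if t == last_token}


def decode_with_blocking(source_tokens, score_fn, max_len=10, eos="</s>"):
    """Greedy decoding that skips source continuations (toy sketch)."""
    out = []
    for _ in range(max_len):
        blocked = blocked_next_tokens(source_tokens, out[-1]) if out else set()
        candidates = score_fn(out)                # token -> score (toy model)
        choice = max((t for t in candidates if t not in blocked),
                     key=candidates.get)          # best non-blocked token
        if choice == eos:
            break
        out.append(choice)
    return out
```

With a toy model that prefers copying the source "the cat sat" verbatim but also assigns some score to "feline", blocking the continuation "cat" after "the" yields the paraphrase "the feline sat" instead of an exact copy.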
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep training on the resulting enlarged dataset tractable, we apply a dataset distillation strategy that compresses it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- A Data Efficient End-To-End Spoken Language Understanding Architecture [22.823732899634518]
We introduce a data efficient system that is trained end-to-end, without any additional pre-trained external module.
The proposed model has a reasonable size and achieves results competitive with the state of the art while using a small training dataset.
arXiv Detail & Related papers (2020-02-14T10:24:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.