Multilingual Contextual Adapters To Improve Custom Word Recognition In
Low-resource Languages
- URL: http://arxiv.org/abs/2307.00759v1
- Date: Mon, 3 Jul 2023 05:29:38 GMT
- Title: Multilingual Contextual Adapters To Improve Custom Word Recognition In
Low-resource Languages
- Authors: Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Sravan Bodapati
- Abstract summary: We study Contextual Adapters, wherein an attention-based biasing model for CTC is used to improve the recognition of custom entities.
In this work, we propose a supervision loss for smoother training of the Contextual Adapters.
Our method achieves 48% F1 improvement in retrieving unseen custom entities for a low-resource language.
- Score: 3.7870350845913165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Connectionist Temporal Classification (CTC) models are popular for their
balance between speed and performance for Automatic Speech Recognition (ASR).
However, these CTC models still struggle in other areas, such as
personalization towards custom words. A recent approach explores Contextual
Adapters, wherein an attention-based biasing model for CTC is used to improve
the recognition of custom entities. While this approach works well with enough
data, we showcase that it isn't an effective strategy for low-resource
languages. In this work, we propose a supervision loss for smoother training of
the Contextual Adapters. Further, we explore a multilingual strategy to improve
performance with limited training data. Our method achieves 48% F1 improvement
in retrieving unseen custom entities for a low-resource language.
Interestingly, as a by-product of training the Contextual Adapters, we see a
5-11% Word Error Rate (WER) reduction in the performance of the base CTC model
as well.
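The abstract describes the biasing mechanism only at a high level, so the following is a minimal, hypothetical sketch of an attention-based contextual adapter over a CTC encoder's outputs. Module names, dimensions, the residual connection, and the attention supervision hook are all our assumptions, not the authors' released code.

```python
# Hypothetical sketch of a contextual adapter: frame-level CTC encoder
# states attend over embeddings of a custom-word catalog, and the attended
# context nudges the encoder states toward the custom entities.
import torch
import torch.nn as nn

class ContextualAdapter(nn.Module):
    def __init__(self, enc_dim: int, embed_dim: int, num_entities: int):
        super().__init__()
        # One embedding per custom entity, plus a "no-bias" slot so the
        # model can ignore the catalog when no entity is being spoken.
        self.entity_embed = nn.Embedding(num_entities + 1, embed_dim)
        self.query_proj = nn.Linear(enc_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.out_proj = nn.Linear(embed_dim, enc_dim)

    def forward(self, enc_out: torch.Tensor):
        # enc_out: (batch, time, enc_dim) CTC encoder states.
        batch = enc_out.size(0)
        catalog = self.entity_embed.weight.unsqueeze(0).expand(batch, -1, -1)
        queries = self.query_proj(enc_out)
        context, attn_w = self.attn(queries, catalog, catalog)
        # Residual: the base CTC model is recovered when the adapter's
        # contribution is small. The paper's proposed supervision loss
        # plausibly targets attn_w (which entity each frame attends to);
        # that detail is an assumption here.
        return enc_out + self.out_proj(context), attn_w
```

With embed_dim divisible by the number of heads (e.g. 256), such an adapter can be bolted onto a frozen CTC encoder and trained on little data, consistent with the low-resource setting the paper targets.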
Related papers
- InstructionCP: A fast approach to transfer Large Language Models into target language [55.2480439325792]
InsCP integrates instruction tags into the continual pre-training (CP) process to prevent loss of conversational proficiency while acquiring new languages.
Our experiments demonstrate that InsCP retains conversational and Reinforcement Learning from Human Feedback abilities.
This approach requires only 0.1 billion tokens of high-quality instruction-following data, thereby reducing resource consumption.
arXiv Detail & Related papers (2024-05-30T15:45:13Z)
- Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling [28.758396218435635]
Acoustic word embeddings are created by training a pooling function using pairs of word-like units.
Mean-pooled representations from a self-supervised English model were suggested as a promising alternative, but their performance on target languages was not fully competitive.
We show that both approaches outperform a recent baseline approach on word discrimination.
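As a reading aid, here is a minimal sketch of the mean-pooling baseline mentioned above, assuming frame-level features from a self-supervised speech model; the learned-pooling variant would replace the average with a trained function. The function names are illustrative.

```python
# Mean-pooled acoustic word embedding: average self-supervised frame
# features over a word-like segment to get a fixed-size vector.
import torch
import torch.nn.functional as F

def mean_pooled_awe(frame_feats: torch.Tensor, start: int, end: int) -> torch.Tensor:
    """frame_feats: (T, D) features; [start, end) marks a word segment."""
    return frame_feats[start:end].mean(dim=0)  # (D,)

def word_discrimination_score(a: torch.Tensor, b: torch.Tensor) -> float:
    # Same/different word decisions are typically thresholded cosine similarity.
    return F.cosine_similarity(a, b, dim=0).item()
```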
arXiv Detail & Related papers (2023-06-03T16:44:21Z)
- Efficient Spoken Language Recognition via Multilabel Classification [53.662747523872305]
We show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods.
Our multilabel strategy is more robust to unseen non-target languages compared to multiclass classification.
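To make the multilabel-versus-multiclass distinction concrete, here is an illustrative sketch (layer sizes and the 0.5 threshold are assumptions): a multiclass softmax must assign every utterance to some language, while independent per-language sigmoids can reject an unseen non-target language by leaving all scores below threshold.

```python
import torch
import torch.nn as nn

num_langs, feat_dim = 10, 256
head = nn.Linear(feat_dim, num_langs)          # per-language scores
logits = head(torch.randn(1, feat_dim))        # dummy utterance-level features

multiclass = logits.softmax(dim=-1).argmax(dim=-1)   # always picks a language
multilabel = (logits.sigmoid() > 0.5).nonzero()      # may pick none: rejection
```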
arXiv Detail & Related papers (2023-06-02T23:04:19Z)
- Strategies for improving low resource speech to text translation relying on pre-trained ASR models [59.90106959717875]
This paper presents techniques and findings for improving the performance of low-resource speech-to-text translation (ST).
We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French respectively.
arXiv Detail & Related papers (2023-05-31T21:58:07Z)
- Boosting Visual-Language Models by Exploiting Hard Samples [126.35125029639168]
HELIP is a cost-effective strategy tailored to enhance the performance of existing CLIP models.
Our method allows for effortless integration with existing models' training pipelines.
On comprehensive benchmarks, HELIP consistently boosts existing models to achieve leading performance.
arXiv Detail & Related papers (2023-05-09T07:00:17Z)
- Improving Massively Multilingual ASR With Auxiliary CTC Objectives [40.10307386370194]
We present our work on improving performance on FLEURS, a 102-language open ASR benchmark.
We investigate techniques inspired by recent Connectionist Temporal Classification (CTC) studies to help the model handle the large number of languages.
Our state-of-the-art systems, using self-supervised models with the Conformer architecture, improve over the results of prior work on FLEURS by 28.4% relative CER.
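The summary does not spell out the objectives, so the following is only a hedged sketch of the general auxiliary-CTC idea: combine the main transcription CTC loss with a secondary CTC loss, for example over language-ID tokens from an intermediate layer. The weighting and the choice of auxiliary targets are assumptions.

```python
import torch.nn.functional as F

def multitask_ctc_loss(log_probs, targets, in_lens, tgt_lens,
                       aux_log_probs, aux_targets, aux_tgt_lens,
                       aux_weight: float = 0.3):
    # log_probs / aux_log_probs: (T, batch, vocab) log-softmax outputs;
    # both heads share the same input lengths here (an assumption).
    main = F.ctc_loss(log_probs, targets, in_lens, tgt_lens)
    aux = F.ctc_loss(aux_log_probs, aux_targets, in_lens, aux_tgt_lens)
    return (1.0 - aux_weight) * main + aux_weight * aux
```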
arXiv Detail & Related papers (2023-02-24T18:59:51Z)
- On the Usability of Transformers-based models for a French Question-Answering task [2.44288434255221]
This paper focuses on the usability of Transformer-based language models in small-scale learning problems.
We introduce FrALBERT, a new compact model for French, which proves to be competitive in low-resource settings.
arXiv Detail & Related papers (2022-07-19T09:46:15Z)
- Adaptive Activation Network For Low Resource Multilingual Speech Recognition [30.460501537763736]
We introduce an adaptive activation network into the upper layers of the ASR model.
We also propose two approaches to training the model: (1) cross-lingual learning, which replaces the source-language activation function with the target-language one, and (2) multilingual learning.
Our experiments on the IARPA Babel datasets demonstrate that our approaches outperform both from-scratch training and traditional bottleneck-feature-based methods.
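A rough sketch of how we read the adaptive-activation idea: shared weights with a language-specific, learnable activation that can be swapped for cross-lingual transfer. The PReLU parameterization and module layout are our guesses, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AdaptiveActivationLayer(nn.Module):
    def __init__(self, dim: int, languages: list):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # shared across languages
        # One small learnable activation per language (PReLU as a stand-in).
        self.acts = nn.ModuleDict({lang: nn.PReLU(dim) for lang in languages})

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        # x: (batch, dim). Cross-lingual transfer = keep self.linear,
        # train a fresh entry in self.acts for the target language.
        return self.acts[lang](self.linear(x))
```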
arXiv Detail & Related papers (2022-05-28T04:02:59Z)
- Continual Learning for Monolingual End-to-End Automatic Speech Recognition [16.651146574124567]
Adapting Automatic Speech Recognition (ASR) models to new domains leads to a deterioration of performance on the original domain(s).
Even monolingual ASR models cannot be extended to new accents, dialects, topics, etc. without suffering from Catastrophic Forgetting (CF).
arXiv Detail & Related papers (2021-12-17T10:47:17Z)
- Exploiting Adapters for Cross-lingual Low-resource Speech Recognition [52.40623653290499]
Cross-lingual speech adaptation aims to solve the problem of leveraging multiple rich-resource languages to build models for a low-resource target language.
We investigate the performance of multiple adapter variants for parameter-efficient cross-lingual speech adaptation (a standard bottleneck adapter is sketched below).
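For reference, the sketch below shows a standard residual bottleneck adapter of the kind such work inserts into a frozen multilingual encoder, training only the small adapter per target language; the sizes are illustrative and the paper's exact variants may differ.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int = 512, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.down = nn.Linear(dim, bottleneck)  # few parameters to train
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck: near-identity behavior keeps the frozen
        # encoder's function intact while adapting cheaply per language.
        return x + self.up(torch.relu(self.down(self.norm(x))))
```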
arXiv Detail & Related papers (2021-05-18T08:30:37Z)
- Building Low-Resource NER Models Using Non-Speaker Annotation [58.78968578460793]
Cross-lingual methods have had notable success in addressing the scarcity of annotated data for low-resource Named Entity Recognition (NER).
We propose a complementary approach to building low-resource NER models using "non-speaker" (NS) annotations, i.e. annotations produced by people who do not speak the target language.
We show that use of NS annotators produces results that are consistently on par or better than cross-lingual methods built on modern contextual representations.
arXiv Detail & Related papers (2020-06-17T03:24:38Z)