Multilingual Contextual Adapters To Improve Custom Word Recognition In
Low-resource Languages
- URL: http://arxiv.org/abs/2307.00759v1
- Date: Mon, 3 Jul 2023 05:29:38 GMT
- Title: Multilingual Contextual Adapters To Improve Custom Word Recognition In
Low-resource Languages
- Authors: Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Sravan Bodapati
- Abstract summary: We study Contextual Adapters, wherein an attention-based biasing model for CTC is used to improve the recognition of custom entities.
In this work, we propose a supervision loss for smoother training of the Contextual Adapters.
Our method achieves 48% F1 improvement in retrieving unseen custom entities for a low-resource language.
- Score: 3.7870350845913165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Connectionist Temporal Classification (CTC) models are popular for their
balance between speed and performance for Automatic Speech Recognition (ASR).
However, these CTC models still struggle in other areas, such as
personalization towards custom words. A recent approach explores Contextual
Adapters, wherein an attention-based biasing model for CTC is used to improve
the recognition of custom entities. While this approach works well with enough
data, we showcase that it isn't an effective strategy for low-resource
languages. In this work, we propose a supervision loss for smoother training of
the Contextual Adapters. Further, we explore a multilingual strategy to improve
performance with limited training data. Our method achieves 48% F1 improvement
in retrieving unseen custom entities for a low-resource language.
Interestingly, as a by-product of training the Contextual Adapters, we see a
5-11% Word Error Rate (WER) reduction in the performance of the base CTC model
as well.
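The abstract describes the biasing mechanism only at a high level, so the following is a minimal, hypothetical sketch of an attention-based contextual adapter over a CTC encoder's outputs. Module names, dimensions, the residual connection, and the attention supervision hook are all our assumptions, not the authors' released code.

```python
# Hypothetical sketch of a contextual adapter: frame-level CTC encoder
# states attend over embeddings of a custom-word catalog, and the attended
# context nudges the encoder states toward the custom entities.
import torch
import torch.nn as nn

class ContextualAdapter(nn.Module):
    def __init__(self, enc_dim: int, embed_dim: int, num_entities: int):
        super().__init__()
        # One embedding per custom entity, plus a "no-bias" slot so the
        # model can ignore the catalog when no entity is being spoken.
        self.entity_embed = nn.Embedding(num_entities + 1, embed_dim)
        self.query_proj = nn.Linear(enc_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.out_proj = nn.Linear(embed_dim, enc_dim)

    def forward(self, enc_out: torch.Tensor):
        # enc_out: (batch, time, enc_dim) CTC encoder states.
        batch = enc_out.size(0)
        catalog = self.entity_embed.weight.unsqueeze(0).expand(batch, -1, -1)
        queries = self.query_proj(enc_out)
        context, attn_w = self.attn(queries, catalog, catalog)
        # Residual: the base CTC model is recovered when the adapter's
        # contribution is small. The paper's proposed supervision loss
        # plausibly targets attn_w (which entity each frame attends to);
        # that detail is an assumption here.
        return enc_out + self.out_proj(context), attn_w
```

With embed_dim divisible by the number of heads (e.g. 256), such an adapter can be bolted onto a frozen CTC encoder and trained on little data, consistent with the low-resource setting the paper targets.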
Related papers
- InstructionCP: A fast approach to transfer Large Language Models into target language [55.2480439325792]
InsCP integrates instruction tags into the continual pre-training (CP) process to prevent loss of conversational proficiency while acquiring new languages.
Our experiments demonstrate that InsCP retains conversational and Reinforcement Learning from Human Feedback abilities.
This approach requires only 0.1 billion tokens of high-quality instruction-following data, thereby reducing resource consumption.
arXiv Detail & Related papers (2024-05-30T15:45:13Z)
- Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling [28.758396218435635]
Acoustic word embeddings are created by training a pooling function using pairs of word-like units.
Mean-pooled representations from a self-supervised English model were suggested as a promising alternative, but their performance on target languages was not fully competitive.
We show that both approaches outperform a recent baseline approach on word discrimination.
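As a reading aid, here is a minimal sketch of the mean-pooling baseline mentioned above, assuming frame-level features from a self-supervised speech model; the learned-pooling variant would replace the average with a trained function. The function names are illustrative.

```python
# Mean-pooled acoustic word embedding: average self-supervised frame
# features over a word-like segment to get a fixed-size vector.
import torch
import torch.nn.functional as F

def mean_pooled_awe(frame_feats: torch.Tensor, start: int, end: int) -> torch.Tensor:
    """frame_feats: (T, D) features; [start, end) marks a word segment."""
    return frame_feats[start:end].mean(dim=0)  # (D,)

def word_discrimination_score(a: torch.Tensor, b: torch.Tensor) -> float:
    # Same/different word decisions are typically thresholded cosine similarity.
    return F.cosine_similarity(a, b, dim=0).item()
```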
arXiv Detail & Related papers (2023-06-03T16:44:21Z)
- Efficient Spoken Language Recognition via Multilabel Classification [53.662747523872305]
We show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods.
Our multilabel strategy is more robust to unseen non-target languages compared to multiclass classification.
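To make the multilabel-versus-multiclass distinction concrete, here is an illustrative sketch (layer sizes and the 0.5 threshold are assumptions): a multiclass softmax must assign every utterance to some language, while independent per-language sigmoids can reject an unseen non-target language by leaving all scores below threshold.

```python
import torch
import torch.nn as nn

num_langs, feat_dim = 10, 256
head = nn.Linear(feat_dim, num_langs)          # per-language scores
logits = head(torch.randn(1, feat_dim))        # dummy utterance-level features

multiclass = logits.softmax(dim=-1).argmax(dim=-1)   # always picks a language
multilabel = (logits.sigmoid() > 0.5).nonzero()      # may pick none: rejection
```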
arXiv Detail & Related papers (2023-06-02T23:04:19Z)
- Strategies for improving low resource speech to text translation relying on pre-trained ASR models [59.90106959717875]
This paper presents techniques and findings for improving the performance of low-resource speech-to-text translation (ST).
We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French respectively.
arXiv Detail & Related papers (2023-05-31T21:58:07Z)
- Boosting Visual-Language Models by Exploiting Hard Samples [126.35125029639168]
HELIP is a cost-effective strategy tailored to enhance the performance of existing CLIP models.
Our method allows for effortless integration with existing models' training pipelines.
On comprehensive benchmarks, HELIP consistently boosts existing models to achieve leading performance.
arXiv Detail & Related papers (2023-05-09T07:00:17Z)
- Improving Massively Multilingual ASR With Auxiliary CTC Objectives [40.10307386370194]
We present our work on improving performance on FLEURS, a 102-language open ASR benchmark.
We investigate techniques inspired by recent Connectionist Temporal Classification (CTC) studies to help the model handle the large number of languages.
Our state-of-the-art systems, using self-supervised models with the Conformer architecture, improve over the results of prior work on FLEURS by 28.4% relative CER.
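The summary does not spell out the objectives, so the following is only a hedged sketch of the general auxiliary-CTC idea: combine the main transcription CTC loss with a secondary CTC loss, for example over language-ID tokens from an intermediate layer. The weighting and the choice of auxiliary targets are assumptions.

```python
import torch.nn.functional as F

def multitask_ctc_loss(log_probs, targets, in_lens, tgt_lens,
                       aux_log_probs, aux_targets, aux_tgt_lens,
                       aux_weight: float = 0.3):
    # log_probs / aux_log_probs: (T, batch, vocab) log-softmax outputs;
    # both heads share the same input lengths here (an assumption).
    main = F.ctc_loss(log_probs, targets, in_lens, tgt_lens)
    aux = F.ctc_loss(aux_log_probs, aux_targets, in_lens, aux_tgt_lens)
    return (1.0 - aux_weight) * main + aux_weight * aux
```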
arXiv Detail & Related papers (2023-02-24T18:59:51Z)
- On the Usability of Transformers-based models for a French Question-Answering task [2.44288434255221]
This paper focuses on the usability of Transformer-based language models in small-scale learning problems.
We introduce FrALBERT, a new compact model for French, which proves to be competitive in low-resource settings.
arXiv Detail & Related papers (2022-07-19T09:46:15Z)
- Adaptive Activation Network For Low Resource Multilingual Speech Recognition [30.460501537763736]
We introduce an adaptive activation network into the upper layers of the ASR model.
We also propose two approaches to training the model: (1) cross-lingual learning, which replaces the source-language activation function with the target-language one, and (2) multilingual learning.
Our experiments on the IARPA Babel datasets demonstrate that our approaches outperform both from-scratch training and traditional bottleneck-feature-based methods.
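A rough sketch of how we read the adaptive-activation idea: shared weights with a language-specific, learnable activation that can be swapped for cross-lingual transfer. The PReLU parameterization and module layout are our guesses, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AdaptiveActivationLayer(nn.Module):
    def __init__(self, dim: int, languages: list):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # shared across languages
        # One small learnable activation per language (PReLU as a stand-in).
        self.acts = nn.ModuleDict({lang: nn.PReLU(dim) for lang in languages})

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        # x: (batch, dim). Cross-lingual transfer = keep self.linear,
        # train a fresh entry in self.acts for the target language.
        return self.acts[lang](self.linear(x))
```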
arXiv Detail & Related papers (2022-05-28T04:02:59Z)
- Continual Learning for Monolingual End-to-End Automatic Speech Recognition [16.651146574124567]
Adapting Automatic Speech Recognition (ASR) models to new domains leads to a deterioration of performance on the original domain(s).
Even monolingual ASR models cannot be extended to new accents, dialects, topics, etc. without suffering from Catastrophic Forgetting (CF).
arXiv Detail & Related papers (2021-12-17T10:47:17Z)
- Exploiting Adapters for Cross-lingual Low-resource Speech Recognition [52.40623653290499]
Cross-lingual speech adaptation aims to solve the problem of leveraging multiple rich-resource languages to build models for a low-resource target language.
We investigate the performance of multiple adapter variants for parameter-efficient cross-lingual speech adaptation (a standard bottleneck adapter is sketched below).
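For reference, the sketch below shows a standard residual bottleneck adapter of the kind such work inserts into a frozen multilingual encoder, training only the small adapter per target language; the sizes are illustrative and the paper's exact variants may differ.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int = 512, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.down = nn.Linear(dim, bottleneck)  # few parameters to train
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck: near-identity behavior keeps the frozen
        # encoder's function intact while adapting cheaply per language.
        return x + self.up(torch.relu(self.down(self.norm(x))))
```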
arXiv Detail & Related papers (2021-05-18T08:30:37Z)
- Building Low-Resource NER Models Using Non-Speaker Annotation [58.78968578460793]
Cross-lingual methods have had notable success in addressing the scarcity of annotated data for low-resource Named Entity Recognition (NER).
We propose a complementary approach to building low-resource NER models using "non-speaker" (NS) annotations, i.e. annotations produced by people who do not speak the target language.
We show that use of NS annotators produces results that are consistently on par or better than cross-lingual methods built on modern contextual representations.
arXiv Detail & Related papers (2020-06-17T03:24:38Z)