Language Bias in Self-Supervised Learning For Automatic Speech Recognition
- URL: http://arxiv.org/abs/2501.19321v1
- Date: Fri, 31 Jan 2025 17:16:45 GMT
- Title: Language Bias in Self-Supervised Learning For Automatic Speech Recognition
- Authors: Edward Storey, Naomi Harte, Peter Bell
- Abstract summary: Self-supervised learning (SSL) is used in deep learning to train on large datasets without the need for expensive labelling of the data. In this paper, we identify language-specific subnetworks within XLS-R and test the performance of these subnetworks on a variety of different languages.
- Score: 15.976590369684464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) is used in deep learning to train on large datasets without the need for expensive labelling of the data. Recently, large Automatic Speech Recognition (ASR) models such as XLS-R have utilised SSL to train on over one hundred different languages simultaneously. However, deeper investigation shows that the bulk of the training data for XLS-R comes from a small number of languages. Biases learned through SSL have been shown to exist in multiple domains, but language bias in multilingual SSL ASR has not been thoroughly examined. In this paper, we utilise the Lottery Ticket Hypothesis (LTH) to identify language-specific subnetworks within XLS-R and test the performance of these subnetworks on a variety of different languages. We are able to show that when fine-tuning, XLS-R bypasses traditional linguistic knowledge and builds only on weights learned from the languages with the largest data contribution to the pretraining data.
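To make the subnetwork-identification step concrete, below is a minimal PyTorch sketch of the standard Lottery Ticket Hypothesis recipe: keep the largest-magnitude weights after fine-tuning, then rewind the survivors to their pretrained values. This illustrates the general technique only, not the paper's exact procedure; the names `magnitude_mask`, `apply_subnetwork`, and `pretrained_state` are hypothetical.

```python
import torch
import torch.nn as nn

def magnitude_mask(model: nn.Module, keep_ratio: float = 0.1) -> dict:
    """Keep the top `keep_ratio` fraction of weights by magnitude.
    In LTH-style analyses, the surviving weights define a candidate
    subnetwork (here, hypothetically, a language-specific one)."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:  # skip biases and norm parameters
            continue
        flat = param.detach().abs().flatten()
        k = max(1, int(keep_ratio * flat.numel()))
        threshold = torch.topk(flat, k).values.min()
        masks[name] = (param.detach().abs() >= threshold).float()
    return masks

def apply_subnetwork(model: nn.Module, masks: dict,
                     pretrained_state: dict) -> None:
    """Rewind surviving weights to their pretrained values and zero the
    rest, isolating the subnetwork for further training or evaluation."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.copy_(pretrained_state[name] * masks[name])
```

After fine-tuning on one language, such a mask picks out a candidate language-specific subnetwork, which can then be rewound and evaluated on other languages to probe cross-lingual transfer.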
Related papers
- SSLR: A Semi-Supervised Learning Method for Isolated Sign Language Recognition [2.409285779772107]
Sign language recognition systems aim to recognize sign gestures and translate them into spoken language.
One of the main challenges in SLR is the scarcity of annotated datasets.
We propose a semi-supervised learning approach for SLR, employing a pseudo-label method to annotate unlabeled samples.
arXiv Detail & Related papers (2025-04-23T11:59:52Z)
- How Do Multilingual Language Models Remember Facts? [50.13632788453612]
We show that previously identified recall mechanisms in English largely apply to multilingual contexts.
We localize the role of language during recall, finding that subject enrichment is language-independent.
In decoder-only LLMs, FVs compose these two pieces of information in two separate stages.
arXiv Detail & Related papers (2024-10-18T11:39:34Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning [69.77973092264338]
We show that more powerful techniques can lead to more efficient pre-training, opening SSL to more research groups.
We propose WavLabLM, which extends WavLM's joint prediction and denoising to 40k hours of data across 136 languages.
We show that further efficiency can be achieved with a vanilla HuBERT Base model, which can maintain 94% of XLS-R's performance with only 3% of the data.
arXiv Detail & Related papers (2023-09-26T23:55:57Z)
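As a rough illustration of the joint prediction-and-denoising idea in the WavLM family that WavLabLM extends: the input is corrupted with noise or overlapped speech, but the model is trained to predict discrete pseudo-labels derived from the clean audio at masked frames. The sketch below is schematic; the encoder, the noising policy, and the pseudo-label targets (`clean_targets`) are stand-ins for the papers' actual components.

```python
import torch.nn as nn
import torch.nn.functional as F

class JointPredictDenoise(nn.Module):
    """Schematic SSL objective: corrupt the input (denoising pressure) but
    predict pseudo-labels of the *clean* audio at masked frames (masked
    prediction), so the model must denoise and predict simultaneously."""

    def __init__(self, encoder: nn.Module, dim: int, n_units: int):
        super().__init__()
        self.encoder = encoder               # e.g. a Transformer over frames
        self.head = nn.Linear(dim, n_units)  # predicts discrete pseudo-labels

    def forward(self, clean, noise, clean_targets, mask):
        noisy = clean + noise                # simulated noisy/overlapped input
        hidden = self.encoder(noisy)         # (B, T, dim)
        logits = self.head(hidden)           # (B, T, n_units)
        # loss only on masked frames, against clean-speech pseudo-labels
        return F.cross_entropy(logits[mask], clean_targets[mask])
```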
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages [86.90220551111096]
Training datasets for large language models (LLMs) are often not fully disclosed.
We present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages.
arXiv Detail & Related papers (2023-09-17T23:49:10Z)
- Efficient Spoken Language Recognition via Multilabel Classification [53.662747523872305]
We show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods.
Our multilabel strategy is more robust to unseen non-target languages compared to multiclass classification.
arXiv Detail & Related papers (2023-06-02T23:04:19Z)
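A minimal sketch of the multilabel idea above: one sigmoid score per target language instead of a single softmax, so an utterance from an unseen non-target language can fall below threshold on every head and be rejected rather than forced into the nearest class. The architecture and threshold here are illustrative assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class MultilabelLID(nn.Module):
    """Spoken language recognition with one sigmoid per language
    (multilabel) instead of one softmax over all languages (multiclass)."""

    def __init__(self, embed_dim: int, n_languages: int):
        super().__init__()
        self.heads = nn.Linear(embed_dim, n_languages)

    def forward(self, utterance_embedding: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.heads(utterance_embedding))

def decide(scores: torch.Tensor, threshold: float = 0.5):
    """Illustrative decision rule: accept the best-scoring language only if
    its score clears the threshold; otherwise reject as non-target."""
    best = scores.argmax(dim=-1)
    accepted = scores.max(dim=-1).values >= threshold
    return [int(b) if a else None for b, a in zip(best, accepted)]
```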
- The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR [0.2676349883103404]
Building a multilingual Automated Speech Recognition system in a linguistically diverse country like India can be a challenging task.
This problem can be solved by exploiting the fact that many of these languages are phonetically similar.
New approaches are explored and compared to improve the performance of CLS-based multilingual ASR models.
arXiv Detail & Related papers (2023-05-31T06:09:11Z)
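To illustrate the common-label-set idea behind the Tag-Team entry above (assuming CLS denotes a Common Label Set, as is usual in Indian-language multilingual ASR): phonetically similar units from different scripts are mapped to one shared label, and a language tag token lets the model still condition on language. The mapping below is a hypothetical toy, not the paper's actual label set.

```python
# Hypothetical Common Label Set (CLS) mapping: phonetically similar units
# from different Indic scripts collapse to a single shared label, so one
# multilingual acoustic model can be trained over the union of languages.
CLS_MAP = {
    ("hi", "क"): "ka",  # Hindi /ka/
    ("ta", "க"): "ka",  # Tamil, phonetically similar
    ("te", "క"): "ka",  # Telugu
}

def to_cls(language: str, tokens: list[str], tag_language: bool = True) -> list[str]:
    """Map language-specific tokens to CLS labels; optionally prepend a
    language tag token so the model can still condition on language."""
    labels = [CLS_MAP.get((language, token), token) for token in tokens]
    return [f"<{language}>"] + labels if tag_language else labels

print(to_cls("ta", ["க"]))  # ['<ta>', 'ka']
```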
- MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition [56.408324994409405]
The multiplexed routing network (MRN) trains a recognizer for each language seen so far.
MRN reduces reliance on older data and mitigates catastrophic forgetting.
It outperforms existing general-purpose incremental learning (IL) methods by large margins.
arXiv Detail & Related papers (2023-05-24T06:03:34Z)
- How do languages influence each other? Studying cross-lingual data sharing during LM fine-tuning [14.02101305717738]
Multilingual large language models (MLLMs) are jointly trained on data from many different languages.
It remains unclear to what extent, and under which conditions, languages rely on each other's data.
We find that MLLMs rely on data from multiple languages from the early stages of fine-tuning and that this reliance gradually increases as fine-tuning progresses.
arXiv Detail & Related papers (2023-05-22T17:47:41Z)
- LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers [71.76680102779765]
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure.
We propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.
arXiv Detail & Related papers (2022-11-05T04:03:55Z)
- Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation [27.857955394020475]
Self-Supervised Learning (SSL) models have been successfully applied in various deep learning-based speech tasks.
The quality of SSL representations depends highly on the relatedness between the SSL training domain(s) and the target data domain.
We propose a learnable and interpretable framework to combine spectral features (SF) and SSL representations.
arXiv Detail & Related papers (2022-04-05T20:09:15Z)
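One common way to realise a "learnable and interpretable" combination like the one summarised above is to mix the two feature streams with softmax-normalised scalar weights, which can later be read off to see how much the task relies on spectral versus SSL features. This is a plausible sketch under that assumption, not the paper's exact framework.

```python
import torch
import torch.nn as nn

class FeatureCombiner(nn.Module):
    """Learnable, interpretable fusion: project both streams to a common
    size and mix them with softmax-normalised scalar weights. The learned
    weights indicate how much the task relies on spectral (SF) versus SSL
    features, which is what makes the combination interpretable."""

    def __init__(self, sf_dim: int, ssl_dim: int, out_dim: int):
        super().__init__()
        self.proj_sf = nn.Linear(sf_dim, out_dim)
        self.proj_ssl = nn.Linear(ssl_dim, out_dim)
        self.logits = nn.Parameter(torch.zeros(2))  # one weight per stream

    def forward(self, sf: torch.Tensor, ssl: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.logits, dim=0)
        return w[0] * self.proj_sf(sf) + w[1] * self.proj_ssl(ssl)
```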
- LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech [63.84741259993937]
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing.
Recent works also investigated SSL from speech.
We propose LeBenchmark: a reproducible framework for assessing SSL from speech.
arXiv Detail & Related papers (2021-04-23T08:27:09Z)