Related papers: Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples

Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples

URL: http://arxiv.org/abs/2303.17517v1
Date: Thu, 30 Mar 2023 16:34:10 GMT
Title: Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples
Authors: Hyeonggon Ryu, Arda Senocak, In So Kweon, Joon Son Chung
Abstract summary: The objective of this work is to explore the learning of visually grounded speech models (VGS) from multilingual perspective. Our key contribution in this work is to leverage the power of a high-resource language in a bilingual visually grounded speech model to improve the performance of a low-resource language.
Score: 89.16814518860357
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The objective of this work is to explore the learning of visually grounded speech models (VGS) from multilingual perspective. Bilingual VGS models are generally trained with an equal number of spoken captions from both languages. However, in reality, there can be an imbalance among the languages for the available spoken captions. Our key contribution in this work is to leverage the power of a high-resource language in a bilingual visually grounded speech model to improve the performance of a low-resource language. We introduce two methods to distill the knowledge of high-resource language into low-resource languages: (1) incorporating a strong pre-trained high-resource language encoder and (2) using semantically similar spoken captions. Our experiments show that combining these two approaches effectively enables the low-resource language to surpass the performances of monolingual and bilingual counterparts for cross-modal retrieval tasks.

Related papers

Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models [8.556170587475146]
Self-supervised learning (SSL) has made significant advances in speech representation learning.<n> multilingual SSL models tend to underperform their monolingual counterparts on each individual language.<n>We introduce limited visual grounding into bilingual speech SSL models to reduce this performance gap.
arXiv Detail & Related papers (2025-09-22T08:48:04Z)
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models [13.855545744177586]
This paper evaluates audio language models on Thai, a low-resource language, and finds that they lack emergent cross-lingual abilities despite their multilingual foundations.<n>Our experiments provide insights into improving instruction-following in low-resource languages by balancing language-specific and multilingual training data.<n>The proposed model, Typhoon-Audio, significantly outperforms existing open-source models and achieves performance comparable to state-of-the-art Gemini-1.5-Pro in both English and Thai.
arXiv Detail & Related papers (2024-09-17T09:04:03Z)
Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon [78.12363425794214]
We focus on zero-shot sentiment analysis tasks across 34 languages, including 6 high/medium-resource languages, 25 low-resource languages, and 3 code-switching datasets. We demonstrate that pretraining using multilingual lexicons, without using any sentence-level sentiment data, achieves superior zero-shot performance compared to models fine-tuned on English sentiment datasets.
arXiv Detail & Related papers (2024-02-03T10:41:05Z)
Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years. We use the recently-proposed Raw Audio-Visual Speechs (RAVEn) framework to pre-train an audio-visual model with unlabelled data. Our experiments show that: (1) multi-lingual models with more data outperform monolingual ones, but, when keeping the amount of data fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
Adapting Multilingual Speech Representation Model for a New, Underresourced Language through Multilingual Fine-tuning and Continued Pretraining [2.3513645401551333]
We investigate the possibility for adapting an existing multilingual wav2vec 2.0 model for a new language. Our results show that continued pretraining is the most effective method to adapt a wav2vec 2.0 model for a new language. We find that if a model pretrained on a related speech variety or an unrelated language with similar phonological characteristics is available, multilingual fine-tuning using additional data from that language can have positive impact on speech recognition performance.
arXiv Detail & Related papers (2023-01-18T03:57:53Z)
Distilling a Pretrained Language Model to a Multilingual ASR Model [3.4012007729454816]
We distill the rich knowledge embedded inside a well-trained teacher text model to the student speech model. We show the superiority of our method on 20 low-resource languages of the CommonVoice dataset with less than 100 hours of speech data.
arXiv Detail & Related papers (2022-06-25T12:36:11Z)
Adaptive Activation Network For Low Resource Multilingual Speech Recognition [30.460501537763736]
We introduce an adaptive activation network to the upper layers of ASR model. We also proposed two approaches to train the model: (1) cross-lingual learning, replacing the activation function from source language to target language, and (2) multilingual learning. Our experiments on IARPA Babel datasets demonstrated that our approaches outperform the from-scratch training and traditional bottleneck feature based methods.
arXiv Detail & Related papers (2022-05-28T04:02:59Z)
Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages. We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z)
Specializing Multilingual Language Models: An Empirical Study [50.7526245872855]
Contextualized word representations from pretrained multilingual language models have become the de facto standard for addressing natural language tasks. For languages rarely or never seen by these models, directly using such models often results in suboptimal representation or use of data.
arXiv Detail & Related papers (2021-06-16T18:13:55Z)
AM2iCo: Evaluating Word Meaning in Context across Low-ResourceLanguages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context. It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts. Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.