ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors
- URL: http://arxiv.org/abs/2502.14627v1
- Date: Thu, 20 Feb 2025 15:06:15 GMT
- Title: ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors
- Authors: Yuguo Yin, Yuxin Xie, Wenyuan Yang, Dongchao Yang, Jinghan Ru, Xianwei Zhuang, Liming Liang, Yuexian Zou
- Abstract summary: Existing multilingual audio-text retrieval (ML-ATR) schemes suffer from inconsistencies in instance similarity matching across languages.
We propose a consistent ML-ATR scheme using 1-to-k contrastive learning and audio-English co-anchor contrastive learning.
Our scheme achieves state-of-the-art performance on recall and consistency metrics for eight mainstream languages, including English.
- Score: 42.469131776062724
- License:
- Abstract: Multilingual audio-text retrieval (ML-ATR) is a challenging task that aims to retrieve audio clips or multilingual texts from databases. However, existing ML-ATR schemes suffer from inconsistencies for instance similarity matching across languages. We theoretically analyze the inconsistency in terms of both multilingual modal alignment direction error and weight error, and propose the theoretical weight error upper bound for quantifying the inconsistency. Based on the analysis of the weight error upper bound, we find that the inconsistency problem stems from the data distribution error caused by random sampling of languages. We propose a consistent ML-ATR scheme using 1-to-k contrastive learning and audio-English co-anchor contrastive learning, aiming to mitigate the negative impact of data distribution error on recall and consistency in ML-ATR. Experimental results on the translated AudioCaps and Clotho datasets show that our scheme achieves state-of-the-art performance on recall and consistency metrics for eight mainstream languages, including English. Our code will be available at https://github.com/ATRI-ACL/ATRI-ACL.
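The 1-to-k idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that "1-to-k" means each audio clip is contrasted against its captions in all k languages in the same step (instead of one randomly sampled language), and it uses a plain symmetric InfoNCE loss averaged over languages. The function names (`info_nce`, `one_to_k_loss`), the temperature value, and the toy embeddings are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy where queries[i] is expected to match keys[i]."""
    q = [normalize(x) for x in queries]
    k = [normalize(x) for x in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)

def one_to_k_loss(audio, texts_by_lang, temperature=0.07):
    """Assumed 1-to-k objective: average the symmetric audio-text
    InfoNCE loss over every language for the same batch of audio."""
    losses = []
    for text in texts_by_lang.values():
        a2t = info_nce(audio, text, temperature)
        t2a = info_nce(text, audio, temperature)
        losses.append(0.5 * (a2t + t2a))
    return sum(losses) / len(losses)

# Toy 2-D embeddings: two audio clips, captions in two languages.
audio = [[1.0, 0.0], [0.0, 1.0]]
aligned = {"en": [[1.0, 0.1], [0.1, 1.0]], "fr": [[0.9, 0.0], [0.0, 0.8]]}
shuffled = {"en": [[0.1, 1.0], [1.0, 0.1]], "fr": [[0.0, 0.8], [0.9, 0.0]]}

loss_aligned = one_to_k_loss(audio, aligned)
loss_shuffled = one_to_k_loss(audio, shuffled)
```

With correctly paired captions the loss is close to zero, while swapping the captions makes it much larger. Because every language contributes to each step, no language is left out of a given update, which is the property the abstract ties to reducing the data distribution error from random language sampling.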
Related papers
- Mitigating the Language Mismatch and Repetition Issues in LLM-based Machine Translation via Model Editing [39.375342978538654]
We focus on utilizing Large Language Models (LLMs) to perform machine translation.
We observe that two patterns of errors frequently occur and drastically affect the translation quality: language mismatch and repetition.
We explore the potential for mitigating these two issues by leveraging model editing methods.
arXiv Detail & Related papers (2024-10-09T16:51:21Z)
- Full-text Error Correction for Chinese Speech Recognition with Large Language Model [11.287933170894311]
Large Language Models (LLMs) have demonstrated substantial potential for error correction in Automatic Speech Recognition (ASR).
This paper investigates the effectiveness of LLMs for error correction in full-text generated by ASR systems from longer speech recordings.
arXiv Detail & Related papers (2024-09-12T06:50:45Z)
- Robustness of LLMs to Perturbations in Text [2.0670689746336]
Large language models (LLMs) have shown impressive performance, but can they handle the inevitable noise in real-world data?
This work tackles this critical question by investigating LLMs' resilience against morphological variations in text.
Our findings show that, contrary to popular belief, generative LLMs are quite robust to noisy perturbations in text.
arXiv Detail & Related papers (2024-07-12T04:50:17Z)
- Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation [34.57825234659946]
We tackle the problem of code-mixed (Hinglish and Bengalish) to English machine translation.
We propose RCMT, a robust perturbation based joint-training model that learns to handle noise in the real-world code-mixed text.
Our evaluation and comprehensive analyses demonstrate the superiority of RCMT over state-of-the-art code-mixed and robust translation methods.
arXiv Detail & Related papers (2024-03-25T13:50:11Z)
- Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets [92.38654521870444]
We introduce ACES, a contrastive challenge set spanning 146 language pairs.
This dataset aims to discover whether metrics can identify 68 translation accuracy errors.
We conduct a large-scale study by benchmarking ACES on 50 metrics submitted to the WMT 2022 and 2023 metrics shared tasks.
arXiv Detail & Related papers (2024-01-29T17:17:42Z)
- MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition [56.408324994409405]
Multiplexed routing network (MRN) trains a recognizer for each language that is currently seen.
MRN effectively reduces the reliance on older data and better fights against catastrophic forgetting.
It outperforms existing general-purpose IL methods by large margins.
arXiv Detail & Related papers (2023-05-24T06:03:34Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
- Improving Distinction between ASR Errors and Speech Disfluencies with Feature Space Interpolation [0.0]
Fine-tuning pretrained language models (LMs) is a popular approach to automatic speech recognition (ASR) error detection during post-processing.
This paper proposes a scheme to improve existing LM-based ASR error detection systems.
arXiv Detail & Related papers (2021-08-04T02:11:37Z)
- Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation [98.11249019844281]
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
arXiv Detail & Related papers (2021-06-02T02:41:40Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-source languages.
We propose a novel augmentation approach named Language Branch Machine Reading (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.