Quantifying the Gaps Between Translation and Native Perception in Training for Multimodal, Multilingual Retrieval
- URL: http://arxiv.org/abs/2410.02027v2
- Date: Tue, 8 Oct 2024 15:22:53 GMT
- Title: Quantifying the Gaps Between Translation and Native Perception in Training for Multimodal, Multilingual Retrieval
- Authors: Kyle Buettner, Adriana Kovashka
- Abstract summary: We empirically show performance gaps between training on captions that come from native German perception and captions that have been either machine-translated or human-translated from English into German.
While we achieve mean recall improvements (+1.3), gaps still remain, indicating an open area of future work for the community.
- Score: 28.589035749529955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a scarcity of multilingual vision-language models that properly account for the perceptual differences that are reflected in image captions across languages and cultures. In this work, through a multimodal, multilingual retrieval case study, we quantify the existing lack of model flexibility. We empirically show performance gaps between training on captions that come from native German perception and captions that have been either machine-translated or human-translated from English into German. To address these gaps, we further propose and evaluate caption augmentation strategies. While we achieve mean recall improvements (+1.3), gaps still remain, indicating an open area of future work for the community.
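The "+1.3" refers to mean recall, which in image-text retrieval is conventionally the average of Recall@1/5/10 over both the image-to-text and text-to-image directions. Below is a minimal sketch of that standard evaluation, assuming precomputed, L2-normalized image and caption embeddings; the function and array names are illustrative, not taken from the paper.

```python
import numpy as np

def recall_at_k(sim, k):
    """Fraction of queries (rows) whose matching item (same index) appears in the top-k columns."""
    ranks = (-sim).argsort(axis=1)                    # columns sorted by descending similarity
    hits = (ranks[:, :k] == np.arange(sim.shape[0])[:, None]).any(axis=1)
    return hits.mean()

def mean_recall(img_emb, txt_emb, ks=(1, 5, 10)):
    """Mean of Recall@k over image->text and text->image retrieval.

    img_emb, txt_emb: L2-normalized arrays of shape (N, d); row i of each
    corresponds to the same image-caption pair.
    """
    sim = img_emb @ txt_emb.T                         # cosine similarities (embeddings pre-normalized)
    scores = [recall_at_k(sim, k) for k in ks]        # image -> text
    scores += [recall_at_k(sim.T, k) for k in ks]     # text -> image
    return 100.0 * float(np.mean(scores))

# Usage with random stand-in embeddings:
rng = np.random.default_rng(0)
img = rng.normal(size=(100, 512)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(100, 512)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
print(mean_recall(img, txt))
```

Under this convention, the reported +1.3 presumably corresponds to a 1.3-point difference in this averaged percentage across training conditions.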
Related papers
- A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling [25.43735315887918]
Machine translation of captions has pushed multilingual capabilities in vision-language models (VLMs).
Data comes mainly from English speakers, indicating a perceptual bias and lack of model flexibility.
We propose an LLM-based, multimodal recaptioning strategy that alters the object descriptions of English captions before translation.
arXiv Detail & Related papers (2025-04-19T17:23:12Z) - Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization [9.349707150988893]
We propose a continuous multilingual integration strategy that injects text-only multilingual data during visual instruction tuning.
Our approach significantly improves linguistic fidelity across languages without degradation in visual performance.
arXiv Detail & Related papers (2025-03-28T16:26:52Z) - Mitigating the Linguistic Gap with Phonemic Representations for Robust Cross-lingual Transfer [26.014079273740485]
Approaches to improving multilingual language understanding often struggle with significant performance gaps between high-resource and low-resource languages.
We present experiments on three representative cross-lingual tasks on 12 languages in total.
Phonemic representations exhibit higher similarities between languages compared to orthographic representations.
arXiv Detail & Related papers (2024-02-22T04:41:52Z) - Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment [42.624862172666624]
We propose a simple yet effective cross-lingual alignment framework exploiting pairs of translation sentences.
It aligns the internal sentence representations across different languages via multilingual contrastive learning (a minimal sketch of such an objective follows the related-papers list below).
Experimental results show that even with less than 0.1‰ of the pre-training tokens, our alignment framework significantly boosts the cross-lingual abilities of generative language models.
arXiv Detail & Related papers (2023-11-14T11:24:08Z) - Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity [64.18762301574954]
Previous work has shown that the representations output by contextual language models are more anisotropic than static type embeddings.
This seems to be true for both monolingual and multilingual models, although much less work has been done in the multilingual setting.
We investigate outlier dimensions and their relationship to anisotropy in multiple pre-trained multilingual language models.
arXiv Detail & Related papers (2023-06-01T09:01:48Z) - Language Model Tokenizers Introduce Unfairness Between Languages [98.92630681729518]
We show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked.
Even character-level and byte-level models exhibit encoding-length differences of over 4x for some language pairs.
We make the case that we should train future language models using multilingually fair subword tokenizers.
arXiv Detail & Related papers (2023-05-17T14:17:57Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z) - UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training [52.852163987208826]
UC2 is the first machine translation-augmented framework for cross-lingual cross-modal representation learning.
We propose two novel pre-training tasks, namely Masked Region-to-Token Modeling (MRTM) and Visual Translation Language Modeling (VTLM).
Our proposed framework achieves new state-of-the-art on diverse non-English benchmarks while maintaining comparable performance to monolingual pre-trained models on English tasks.
arXiv Detail & Related papers (2021-04-01T08:30:53Z) - Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study multilingual language models to understand their capability and adaptability in the mixed-language (code-switching) setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z) - First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT [2.2931318723689276]
Cross-lingual transfer emerges from fine-tuning on a task of interest in one language and evaluating on a distinct language not seen during fine-tuning.
We show that multilingual BERT can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific language-agnostic predictor.
While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little impact on transfer and can be reinitialized during fine-tuning.
arXiv Detail & Related papers (2021-01-26T22:12:38Z) - Cross-lingual Visual Pre-training for Multimodal Machine Translation [36.4592103797139]
We combine cross-lingual and visual pre-training methods to learn cross-lingual representations.
We show that when fine-tuned for multimodal machine translation, these models obtain state-of-the-art performance.
arXiv Detail & Related papers (2021-01-25T12:46:41Z)
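Several of the entries above, notably the cross-lingual alignment framework, rely on contrastive learning over translation sentence pairs to align representations across languages. The following is a minimal, generic sketch of such a symmetric InfoNCE-style objective, under our own naming and assumptions rather than the exact loss of any listed paper.

```python
import torch
import torch.nn.functional as F

def translation_pair_contrastive_loss(src_repr, tgt_repr, temperature=0.05):
    """Symmetric InfoNCE over a batch of translation pairs.

    src_repr, tgt_repr: (B, d) sentence representations where row i of src_repr
    and row i of tgt_repr are translations of each other. The loss pulls each
    sentence toward its translation and pushes it away from the other B-1
    sentences in the batch.
    """
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.T / temperature               # (B, B) cosine similarities
    labels = torch.arange(src.size(0), device=src.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))
```

In practice, src_repr and tgt_repr would come from the multilingual encoder being fine-tuned, and the temperature is a tunable hyperparameter.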