Related papers: A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations

URL: http://arxiv.org/abs/2109.04727v1
Date: Fri, 10 Sep 2021 08:15:37 GMT
Title: A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations
Authors: Ziyi Yang, Yinfei Yang, Daniel Cer and Eric Darve
Abstract summary: Language-agnostic and semantic-language information isolation is an emerging research direction for multilingual representation models. "Language Information Removal (LIR)" factors out language identity information from semantically related components in multilingual representations pre-trained on multi-monolingual data. LIR reveals that for weak-alignment multilingual systems, the principal components of semantic spaces primarily encode language identity information.
Score: 7.571549274473274
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Language-agnostic and semantic-language information isolation is an emerging research direction for multilingual representation models. We explore this problem from a novel angle of geometric algebra and semantic space. A simple but highly effective method, "Language Information Removal (LIR)", factors out language identity information from semantically related components in multilingual representations pre-trained on multi-monolingual data. A post-training and model-agnostic method, LIR uses only simple linear operations, e.g. matrix factorization and orthogonal projection. LIR reveals that for weak-alignment multilingual systems, the principal components of semantic spaces primarily encode language identity information. We first evaluate LIR on a cross-lingual question answer retrieval task (LAReQA), which requires strong alignment of the multilingual embedding space. Experiments show that LIR is highly effective on this task, yielding almost 100% relative improvement in MAP for weak-alignment models. We then evaluate LIR on the Amazon Reviews and XEVAL datasets and observe that removing language information improves cross-lingual transfer performance.
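The abstract describes LIR only at a high level (matrix factorization followed by orthogonal projection), so the following is a minimal sketch of that general idea rather than the authors' implementation. It assumes that the dominant directions of one language's embedding matrix can be estimated with a plain SVD and then projected out of each embedding; the function names, the `rank` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np


def language_directions(embeddings: np.ndarray, rank: int) -> np.ndarray:
    """Estimate the dominant directions of one language's embeddings.

    embeddings: (n_sentences, dim) matrix for a single language.
    Returns a (dim, rank) matrix with orthonormal columns.
    Assumption: an uncentered SVD is used as the matrix factorization.
    """
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    return vt[:rank].T


def remove_language_information(vectors: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Project vectors onto the orthogonal complement of `directions`."""
    return vectors - (vectors @ directions) @ directions.T


# Synthetic illustration: every embedding of a hypothetical language shares
# a strong common offset, plus a weaker sentence-specific signal.
rng = np.random.default_rng(0)
dim = 16
shared_offset = 10.0 * rng.normal(size=dim)
sentence_signal = rng.normal(size=(200, dim))
embeddings = sentence_signal + shared_offset

directions = language_directions(embeddings, rank=1)
cleaned = remove_language_information(embeddings, directions)

# After projection, nothing remains along the removed direction.
print(np.abs(cleaned @ directions).max())
```

In this sketch the directions are estimated separately per language and removed only from that language's embeddings; how many components to remove (`rank`) is a tuning choice that the abstract does not specify.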

Related papers

Less Data Less Tokens: Multilingual Unification Learning for Efficient Test-Time Reasoning in LLMs [13.618284161265123]
This paper explores the challenges of test-time scaling of large language models (LLMs). We highlight the diversity of multi-lingual reasoning based on our pilot studies. We introduce a novel approach, (L2) multi-lingual unification learning.
arXiv Detail & Related papers (2025-06-23T06:47:28Z)
Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models [52.22235443948351]
High-quality multilingual training data is essential for effectively pretraining large language models (LLMs). Here, we introduce JQL, a systematic approach that efficiently curates diverse and high-quality multilingual data at scale. JQL distills LLMs' annotation capabilities into lightweight annotators based on pretrained multilingual embeddings.
arXiv Detail & Related papers (2025-05-28T11:06:54Z)
Zero-shot Cross-lingual Transfer Learning with Multiple Source and Target Languages for Information Extraction: Language Selection and Adversarial Training [38.19963761398705]
This paper provides a detailed analysis on Cross-Lingual Multi-Transferability (many-to-many transfer learning) for the recent IE corpora. We first determine the correlation between single-language performance and a wide range of linguistic-based distances. Next, we investigate the more general zero-shot multi-lingual transfer settings where multiple languages are involved in the training and evaluation processes.
arXiv Detail & Related papers (2024-11-13T17:13:25Z)
Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs). It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs. It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
Language Representations Can be What Recommenders Need: Findings and Potentials [57.90679739598295]
We show that item representations, when linearly mapped from advanced LM representations, yield superior recommendation performance. This outcome suggests the possible homomorphism between the advanced language representation space and an effective item representation space for recommendation. Our findings highlight the connection between language modeling and behavior modeling, which can inspire both natural language processing and recommender system communities.
arXiv Detail & Related papers (2024-07-07T17:05:24Z)
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, effectively being crosslingual? This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations [38.56175462620892]
Large pretrained multilingual language models (ML-LMs) have shown remarkable capabilities of zero-shot cross-lingual transfer. We present a novel view of projecting away language-specific factors from a multilingual embedding space. We show that applying our method consistently leads to improvements over commonly used ML-LMs.
arXiv Detail & Related papers (2024-01-11T09:54:11Z)
GradSim: Gradient-Based Language Grouping for Effective Multilingual Training [13.730907708289331]
We propose GradSim, a language grouping method based on gradient similarity. Our experiments on three diverse multilingual benchmark datasets show that it leads to the largest performance gains. Besides linguistic features, the topics of the datasets play an important role for language grouping.
arXiv Detail & Related papers (2023-10-23T18:13:37Z)
Multilingual Entity and Relation Extraction from Unified to Language-specific Training [29.778332361215636]
Existing approaches for entity and relation extraction tasks mainly focus on the English corpora and ignore other languages. We propose a two-stage multilingual training method and a joint model called Multilingual Entity and Relation Extraction framework (mERE) to mitigate language interference. Our method outperforms both the monolingual and multilingual baseline methods.
arXiv Detail & Related papers (2023-01-11T12:26:53Z)
Language Agnostic Multilingual Information Retrieval with Contrastive Learning [59.26316111760971]
We present an effective method to train multilingual information retrieval systems. We leverage parallel and non-parallel corpora to improve the pretrained multilingual language models. Our model can work well even with a small number of parallel sentences.
arXiv Detail & Related papers (2022-10-12T23:53:50Z)
FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning. During inference, the model makes predictions based on the text input in the target language and its translation in the source language. To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages. We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.