Constrained Density Matching and Modeling for Cross-lingual Alignment of
Contextualized Representations
- URL: http://arxiv.org/abs/2201.13429v1
- Date: Mon, 31 Jan 2022 18:41:28 GMT
- Title: Constrained Density Matching and Modeling for Cross-lingual Alignment of
Contextualized Representations
- Authors: Wei Zhao, Steffen Eger
- Abstract summary: We introduce supervised and unsupervised density-based approaches named Real-NVP and GAN-Real-NVP, driven by Normalizing Flow, to perform alignment.
Our experiments encompass 16 alignments, including our approaches, evaluated across 6 language pairs, synthetic data and 4 NLP tasks.
- Score: 27.74320705109685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual representations pre-trained with monolingual data exhibit
considerably unequal task performances across languages. Previous studies
address this challenge with resource-intensive contextualized alignment, which
assumes the availability of large parallel data, thereby leaving
under-represented language communities behind. In this work, we attribute the
data hungriness of previous alignment techniques to two limitations: (i) the
inability to sufficiently leverage data and (ii) the lack of proper training. To
address these issues, we introduce supervised and
unsupervised density-based approaches named Real-NVP and GAN-Real-NVP, driven
by Normalizing Flow, to perform alignment, both dissecting the alignment of
multilingual subspaces into density matching and density modeling. We
complement these approaches with our validation criteria in order to guide the
training process. Our experiments encompass 16 alignments, including our
approaches, evaluated across 6 language pairs, synthetic data and 4 NLP tasks.
We demonstrate the effectiveness of our approaches in the scenarios of limited
and no parallel data. First, our supervised approach trained on 20k parallel
data mostly surpasses Joint-Align and InfoXLM trained on much larger parallel
data. Second, parallel data can be removed without sacrificing performance when
integrating our unsupervised approach in our bootstrapping procedure, which is
theoretically motivated to enforce equality of multilingual subspaces.
Moreover, we demonstrate the advantages of validation criteria over validation
data for guiding supervised training. Our code is available at
\url{https://github.com/AIPHES/Real-NVP}.
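To make the density-based idea concrete, below is a minimal, hypothetical sketch (in PyTorch; not the authors' released code at the repository above) of a single Real-NVP affine coupling layer together with a density-modeling objective: embeddings are pushed through the invertible transform and fit to a standard Gaussian by maximum likelihood. All dimensions, names and hyperparameters are illustrative.

```python
# Hypothetical sketch of one Real-NVP affine coupling layer and a
# maximum-likelihood (density modeling) objective; not the authors' code.
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        # a small MLP predicts per-dimension scale and shift for the second half
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)            # bound scales for numerical stability
        z2 = x2 * torch.exp(log_s) + t       # invertible affine transform of x2
        z = torch.cat([x1, z2], dim=-1)
        log_det = log_s.sum(dim=-1)          # log |det Jacobian| of the transform
        return z, log_det

def negative_log_likelihood(z, log_det):
    # log-density under a standard Gaussian base distribution plus the Jacobian term
    log_pz = -0.5 * (z ** 2).sum(dim=-1) - 0.5 * z.shape[-1] * math.log(2 * math.pi)
    return -(log_pz + log_det).mean()

# usage: fit contextualized embeddings (random stand-ins below) by minimizing the NLL
flow = AffineCoupling(dim=768)
optimizer = torch.optim.Adam(flow.parameters(), lr=1e-4)
embeddings = torch.randn(32, 768)            # placeholder for multilingual encoder output
z, log_det = flow(embeddings)
loss = negative_log_likelihood(z, log_det)
loss.backward()
optimizer.step()
```

In the supervised (density matching) setting described in the abstract, such a flow would instead be trained to map source-language embeddings onto the distribution of their aligned target translations, and the unsupervised GAN-Real-NVP variant would additionally train a discriminator to distinguish transformed source embeddings from target embeddings.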
Related papers
- Enhancing Translation Accuracy of Large Language Models through Continual Pre-Training on Parallel Data [13.587157318352869]
We propose a two-phase training approach where pre-trained large language models are continually pre-trained on parallel data.
We evaluate these methods on thirteen test sets for Japanese-to-English and English-to-Japanese translation.
arXiv Detail & Related papers (2024-07-03T14:23:36Z)
- Mitigating Data Imbalance and Representation Degeneration in
Multilingual Machine Translation [103.90963418039473]
Bi-ACL is a framework that uses only target-side monolingual data and a bilingual dictionary to improve the performance of a multilingual neural machine translation (MNMT) model.
We show that Bi-ACL is more effective both in long-tail languages and in high-resource languages.
arXiv Detail & Related papers (2023-05-22T07:31:08Z)
- RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training [84.23022072347821]
We propose a regularized cross-lingual visio-textual contrastive learning objective that constrains the representation proximity of weakly-aligned visio-textual inputs.
Experiments on 5 downstream multi-modal tasks across 6 languages demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-05-13T14:41:05Z)
- On the Role of Parallel Data in Cross-lingual Transfer Learning [30.737717433111776]
We examine the usage of unsupervised machine translation to generate synthetic parallel data.
We find that even model generated parallel data can be useful for downstream tasks.
Our findings suggest that existing multilingual models do not exploit the full potential of monolingual data.
arXiv Detail & Related papers (2022-12-20T11:23:04Z)
- Bridging the Gap between Language Models and Cross-Lingual Sequence
Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling (xSL) tasks.
Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel sequences.
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
- Bridging the Data Gap between Training and Inference for Unsupervised
Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo parallel data with a translated source, yet it translates natural source sentences at inference.
The source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses the pseudo parallel data {natural source, translated target} to mimic the inference scenario.
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
- Unsupervised Vision-and-Language Pre-training via Retrieval-based
Multi-Granular Alignment [66.77841319057299]
We propose a novel unsupervised Vision-and-Language pre-training curriculum for non-parallel texts and images.
We first construct a weakly aligned image-text corpus via a retrieval-based approach, then apply a set of multi-granular alignment pre-training tasks.
A comprehensive ablation study shows each granularity is helpful to learn a stronger pre-trained model.
arXiv Detail & Related papers (2022-03-01T05:34:01Z)
- Word Alignment by Fine-tuning Embeddings on Parallel Corpora [96.28608163701055]
Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs.
Recently, other work has demonstrated that pre-trained contextualized word embeddings derived from multilingually trained language models (LMs) prove an attractive alternative, achieving competitive results on the word alignment task even in the absence of explicit training on parallel data.
In this paper, we examine methods to marry the two approaches: leveraging pre-trained LMs but fine-tuning them on parallel text with objectives designed to improve alignment quality, and proposing methods to effectively extract alignments from these fine-tuned models.
arXiv Detail & Related papers (2021-01-20T17:54:47Z)
- SimAlign: High Quality Word Alignments without Parallel Training Data
using Static and Contextualized Embeddings [3.8424737607413153]
We propose word alignment methods that require no parallel data.
The key idea is to leverage multilingual word embeddings, both static and contextualized, for word alignment.
We find that alignments created from embeddings are superior for two language pairs compared to those produced by traditional statistical methods.
arXiv Detail & Related papers (2020-04-18T23:10:36Z)
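As a rough illustration of this embedding-based idea (not the SimAlign implementation itself), the sketch below aligns a sentence pair by taking mutual argmaxes over a cosine-similarity matrix of token embeddings; the random vectors stand in for the output of any multilingual encoder, and all names are illustrative.

```python
# Generic sketch of embedding-based word alignment via mutual argmax over
# a cosine-similarity matrix; not the SimAlign codebase.
import numpy as np

def cosine_sim_matrix(src_vecs, tgt_vecs):
    # rows: source tokens, columns: target tokens
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    return src @ tgt.T

def mutual_argmax_alignments(sim):
    # keep (i, j) only if j is the best target for i AND i is the best source for j
    best_tgt = sim.argmax(axis=1)
    best_src = sim.argmax(axis=0)
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]

# usage with random vectors standing in for contextualized token embeddings
src_vecs = np.random.randn(5, 768)   # e.g. 5 source tokens
tgt_vecs = np.random.randn(6, 768)   # e.g. 6 target tokens
sim = cosine_sim_matrix(src_vecs, tgt_vecs)
print(mutual_argmax_alignments(sim))
```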