On the Role of Parallel Data in Cross-lingual Transfer Learning
- URL: http://arxiv.org/abs/2212.10173v1
- Date: Tue, 20 Dec 2022 11:23:04 GMT
- Title: On the Role of Parallel Data in Cross-lingual Transfer Learning
- Authors: Machel Reid and Mikel Artetxe
- Abstract summary: We examine the usage of unsupervised machine translation to generate synthetic parallel data.
We find that even model-generated parallel data can be useful for downstream tasks.
Our findings suggest that existing multilingual models do not exploit the full potential of monolingual data.
- Score: 30.737717433111776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While prior work has established that the use of parallel data is conducive
to cross-lingual learning, it is unclear if the improvements come from the
data itself, or if it is the modeling of parallel interactions that matters.
Exploring this, we examine the usage of unsupervised machine translation to
generate synthetic parallel data, and compare it to supervised machine
translation and gold parallel data. We find that even model-generated parallel
data can be useful for downstream tasks, both in a general setting (continued
pretraining) and in a task-specific setting (translate-train), although
our best results are still obtained using real parallel data. Our findings
suggest that existing multilingual models do not exploit the full potential of
monolingual data, and prompt the community to reconsider the traditional
categorization of cross-lingual learning approaches.
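To ground the two settings mentioned in the abstract, here is a minimal sketch of translate-train: labelled English task data is machine-translated into the target language and a multilingual encoder is fine-tuned on the translated copies. The MT and classifier checkpoints, the toy dataset, and the translate helper are illustrative assumptions, not the paper's actual experimental setup.

```python
# Minimal sketch of "translate-train": translate English task data into the
# target language with an MT model, then fine-tune a multilingual classifier
# on the translated copies. Checkpoints and data here are illustrative.
import torch
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          AutoModelForSequenceClassification)

# 1) Translate the labelled English training set into the target language (Spanish here).
mt_name = "Helsinki-NLP/opus-mt-en-es"        # assumption: any en->xx MT checkpoint
mt_tok = AutoTokenizer.from_pretrained(mt_name)
mt_model = AutoModelForSeq2SeqLM.from_pretrained(mt_name).eval()

english_train = [("this movie was great", 1), ("a complete waste of time", 0)]  # toy data

def translate(texts):
    batch = mt_tok(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = mt_model.generate(**batch, max_new_tokens=64)
    return mt_tok.batch_decode(out, skip_special_tokens=True)

translated_texts = translate([t for t, _ in english_train])
labels = [y for _, y in english_train]

# 2) Fine-tune a multilingual encoder on the translated copies (one toy step shown).
clf_tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
clf = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=2)
optim = torch.optim.AdamW(clf.parameters(), lr=2e-5)

clf.train()
enc = clf_tok(translated_texts, padding=True, truncation=True, return_tensors="pt")
loss = clf(**enc, labels=torch.tensor(labels)).loss
loss.backward()
optim.step()
optim.zero_grad()
```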
Related papers
- Enhancing Translation Accuracy of Large Language Models through Continual Pre-Training on Parallel Data [13.587157318352869]
We propose a two-phase training approach where pre-trained large language models are continually pre-trained on parallel data.
We evaluate these methods on thirteen test sets for Japanese-to-English and English-to-Japanese translation.
arXiv Detail & Related papers (2024-07-03T14:23:36Z)
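The entry above continues pre-training an LLM on parallel data. A common way to expose a causal LM to sentence pairs is to pack source and target into one training sequence; the sketch below shows such packing. The language tags, the GPT-2 tokenizer stand-in, and the toy pairs are assumptions and not necessarily what the cited paper does.

```python
# Sketch: packing Japanese-English parallel pairs into single sequences for
# continued (causal-LM) pre-training. Tag format and tokenizer are illustrative.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # stand-in for the pretrained LLM's tokenizer
tok.pad_token = tok.eos_token                 # GPT-2 has no pad token by default

parallel = [
    ("I like green tea.", "私は緑茶が好きです。"),
    ("The meeting starts at nine.", "会議は9時に始まります。"),
]

def pack(pairs, src_tag="<en>", tgt_tag="<ja>"):
    # One example per pair: "<en> source <eos> <ja> target"
    texts = [f"{src_tag} {s} {tok.eos_token} {tgt_tag} {t}" for s, t in pairs]
    return tok(texts, padding=True, truncation=True, return_tensors="pt")

batch = pack(parallel)
# For continued pre-training, labels are simply the input ids (next-token
# prediction over both sides), fed to any standard causal-LM training loop.
labels = batch["input_ids"].clone()
```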
- A Morphologically-Aware Dictionary-based Data Augmentation Technique for Machine Translation of Under-Represented Languages [31.18983138590214]
We propose strategies to synthesize parallel data relying on morpho-syntactic information and using bilingual lexicons.
Our methodology adheres to a realistic scenario in which only a small amount of parallel seed data is available.
It is linguistically informed, as it aims to create augmented data that is more likely to be grammatically correct.
arXiv Detail & Related papers (2024-02-02T22:25:44Z)
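As a loose illustration of dictionary-based synthesis of parallel data, the sketch below substitutes words through a tiny bilingual lexicon to turn monolingual sentences into noisy sentence pairs, copying plural marking as a stand-in for morphological awareness. The lexicon, the plural rule, and the sentences are hypothetical; the cited paper uses richer morpho-syntactic information.

```python
# Sketch: synthesizing noisy parallel data from monolingual text with a
# bilingual lexicon. Lexicon entries and the toy "morphology" rule are made up.
import re

# Hypothetical English -> Spanish lexicon (content words only).
lexicon = {"the": "el", "cat": "gato", "dog": "perro", "house": "casa", "white": "blanco"}

def synthesize_pair(sentence):
    """Turn one monolingual English sentence into a noisy (en, es) pair."""
    source_tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    target_tokens = []
    for tok in source_tokens:
        # Toy morphological awareness: strip an English plural "s", translate the
        # lemma, then copy the plural marking onto the target word.
        plural = tok.endswith("s") and tok[:-1] in lexicon
        lemma = tok[:-1] if plural else tok
        if lemma in lexicon:
            target_tokens.append(lexicon[lemma] + ("s" if plural else ""))
        else:
            target_tokens.append(tok)   # untranslated tokens stay, adding noise
    return sentence, " ".join(target_tokens)

monolingual = ["The white cat.", "The dogs see the house."]
synthetic_parallel = [synthesize_pair(s) for s in monolingual]
print(synthetic_parallel)
```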
- Parallel Data Helps Neural Entity Coreference Resolution [1.0914300987810126]
We propose a model to exploit coreference knowledge from parallel data.
In addition to the conventional modules learning coreference from annotations, we introduce an unsupervised module to capture cross-lingual coreference knowledge.
Our proposed cross-lingual model achieves consistent improvements, up to 1.74 percentage points, on the OntoNotes 5.0 English dataset.
arXiv Detail & Related papers (2023-05-28T12:30:23Z)
- Language Agnostic Multilingual Information Retrieval with Contrastive Learning [59.26316111760971]
We present an effective method to train multilingual information retrieval systems.
We leverage parallel and non-parallel corpora to improve the pretrained multilingual language models.
Our model can work well even with a small number of parallel sentences.
arXiv Detail & Related papers (2022-10-12T23:53:50Z)
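The entry above leverages parallel corpora for contrastive training of a multilingual retriever. Below is a minimal sketch of an in-batch InfoNCE-style objective over parallel sentence pairs; the encoder, pooling, temperature, and example pairs are assumptions, and the cited paper's exact objective may differ.

```python
# Sketch: in-batch contrastive (InfoNCE-style) training signal from parallel
# sentence pairs. Encoder choice, pooling, and temperature are illustrative.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")

def embed(sentences):
    enc = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**enc).last_hidden_state             # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1)             # mean-pool over real tokens
    return (hidden * mask).sum(1) / mask.sum(1)

# Parallel pairs: the i-th source sentence is the positive for the i-th target.
src = ["How do I reset my password?", "The library opens at ten."]
tgt = ["¿Cómo restablezco mi contraseña?", "La biblioteca abre a las diez."]

src_vec = F.normalize(embed(src), dim=-1)
tgt_vec = F.normalize(embed(tgt), dim=-1)

temperature = 0.05
logits = src_vec @ tgt_vec.T / temperature                 # similarity of all pairs
targets = torch.arange(len(src))                           # diagonal = true pairs
loss = F.cross_entropy(logits, targets)                    # other pairs act as negatives
loss.backward()                                            # gradients flow into the encoder
```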
- Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo-parallel data whose source side is machine-translated text, yet it is fed natural source sentences at inference time.
This source-side discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses pseudo-parallel data {natural source, translated target} to mimic the inference scenario.
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
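To illustrate the online self-training idea, the sketch below forward-translates natural source sentences with the current model and immediately trains on the resulting {natural source, translated target} pairs, so the training-time source distribution matches inference. An off-the-shelf MT checkpoint stands in for a UNMT system; this is a simplified sketch, not the paper's implementation.

```python
# Sketch: one round of online self-training. Natural source sentences are
# translated by the current model, and the model is then trained on the
# resulting {natural source, translated target} pseudo-parallel pairs.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "Helsinki-NLP/opus-mt-en-de"          # stand-in for an unsupervised NMT model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)
optim = torch.optim.AdamW(model.parameters(), lr=1e-5)

natural_source = ["The weather is nice today.", "She closed the window."]

# 1) Forward-translate natural source with the current model parameters.
model.eval()
with torch.no_grad():
    inputs = tok(natural_source, return_tensors="pt", padding=True)
    generated = model.generate(**inputs, max_new_tokens=64)
translated_target = tok.batch_decode(generated, skip_special_tokens=True)

# 2) Train on the {natural source, translated target} pseudo-parallel pairs,
#    so the training-time source distribution matches inference.
model.train()
batch = tok(natural_source, text_target=translated_target,
            return_tensors="pt", padding=True)
batch["labels"][batch["labels"] == tok.pad_token_id] = -100   # ignore padding in the loss
loss = model(**batch).loss
loss.backward()
optim.step()
optim.zero_grad()
```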
- Cross-lingual Intermediate Fine-tuning improves Dialogue State Tracking [84.50302759362698]
We enhance the transfer learning process by intermediate fine-tuning of pretrained multilingual models.
We use parallel and conversational movie subtitles datasets to design cross-lingual intermediate tasks.
We achieve impressive improvements (> 20% on goal accuracy) on the parallel MultiWoZ dataset and Multilingual WoZ dataset.
arXiv Detail & Related papers (2021-09-28T11:22:38Z)
- Cross-language Sentence Selection via Data Augmentation and Rationale Training [22.106577427237635]
The proposed approach uses data augmentation and negative sampling techniques on noisy parallel sentence data to learn a cross-lingual, embedding-based query relevance model.
Results show that this approach performs as well as or better than multiple state-of-the-art machine translation + monolingual retrieval systems trained on the same parallel data.
arXiv Detail & Related papers (2021-06-04T07:08:47Z)
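As a rough sketch of learning a cross-lingual query relevance model from noisy parallel sentences with negative sampling, the code below treats aligned pairs as relevant and randomly re-paired sentences as irrelevant. The encoder, relevance head, and example data are assumptions for illustration.

```python
# Sketch: training a cross-lingual relevance scorer from noisy parallel data.
# Aligned pairs are positives; randomly re-paired sentences act as negatives.
import random
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")
scorer = torch.nn.Linear(encoder.config.hidden_size, 1)   # relevance head on first token
optim = torch.optim.AdamW(list(encoder.parameters()) + list(scorer.parameters()), lr=2e-5)

parallel = [
    ("water purification methods", "métodos de purificación del agua"),
    ("train schedule to Madrid", "horario de trenes a Madrid"),
]

def build_examples(pairs, num_negatives=1):
    examples = []
    for i, (query, sent) in enumerate(pairs):
        examples.append((query, sent, 1.0))                       # aligned -> relevant
        for _ in range(num_negatives):                            # negative sampling:
            j = random.choice([k for k in range(len(pairs)) if k != i])
            examples.append((query, pairs[j][1], 0.0))            # mismatched -> irrelevant
    return examples

queries, sents, labels = zip(*build_examples(parallel))
enc = tok(list(queries), list(sents), padding=True, truncation=True, return_tensors="pt")
cls = encoder(**enc).last_hidden_state[:, 0]                      # first-token representation
logits = scorer(cls).squeeze(-1)
loss = F.binary_cross_entropy_with_logits(logits, torch.tensor(labels))
loss.backward()
optim.step()
```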
- Meta Back-translation [111.87397401837286]
We propose a novel method to generate pseudo-parallel data from a pre-trained back-translation model.
Our method is a meta-learning algorithm which adapts a pre-trained back-translation model so that the pseudo-parallel data it generates would train a forward-translation model to do well on a validation set.
arXiv Detail & Related papers (2021-02-15T20:58:32Z)
- Word Alignment by Fine-tuning Embeddings on Parallel Corpora [96.28608163701055]
Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs.
Recently, other work has demonstrated that pre-trained contextualized word embeddings derived from multilingually trained language models (LMs) prove an attractive alternative, achieving competitive results on the word alignment task even in the absence of explicit training on parallel data.
In this paper, we examine methods to marry the two approaches: leveraging pre-trained LMs but fine-tuning them on parallel text with objectives designed to improve alignment quality, and proposing methods to effectively extract alignments from these fine-tuned models.
arXiv Detail & Related papers (2021-01-20T17:54:47Z)
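A minimal sketch of the similarity-based alignment extraction that this line of work builds on: contextual embeddings are taken from an off-the-shelf multilingual LM and subword tokens are aligned by mutual argmax of cosine similarity. Only the zero-shot baseline is shown; the fine-tuning objectives and extraction methods proposed in the paper are not reproduced here.

```python
# Sketch: extracting word alignments from multilingual LM embeddings by
# mutual-argmax cosine similarity (subword level, no fine-tuning).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased").eval()

def token_embeddings(sentence):
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0, 1:-1]   # drop [CLS] and [SEP]
    tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])[1:-1]
    return tokens, F.normalize(hidden, dim=-1)

src_tokens, src_vec = token_embeddings("The cat drinks milk")
tgt_tokens, tgt_vec = token_embeddings("Die Katze trinkt Milch")

sim = src_vec @ tgt_vec.T                                   # cosine similarity matrix
fwd = sim.argmax(dim=1)                                     # best target for each source
bwd = sim.argmax(dim=0)                                     # best source for each target

# Keep only mutual best matches (intersection heuristic).
alignments = [(s, f.item()) for s, f in enumerate(fwd) if bwd[f].item() == s]
for s, t in alignments:
    print(src_tokens[s], "->", tgt_tokens[t])
```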
- Parallel Training of Deep Networks with Local Updates [84.30918922367442]
Local parallelism is a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation.
We show results in both vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime.
arXiv Detail & Related papers (2020-12-07T16:38:45Z)
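The entry above replaces global backpropagation with truncated layer-wise updates. The toy sketch below mimics the core mechanic: each block carries its own auxiliary classifier and loss, and activations are detached before entering the next block, so gradients never cross block boundaries. The layer sizes and synthetic data are arbitrary; this is not the paper's implementation.

```python
# Sketch: local (layer-wise) training. Each block is updated by its own
# auxiliary loss; detach() stops gradients from crossing block boundaries.
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes, dim = 10, 32

class LocalBlock(nn.Module):
    def __init__(self, dim, num_classes):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.aux_head = nn.Linear(dim, num_classes)   # local classifier for this block

    def forward(self, x):
        h = self.layer(x)
        return h, self.aux_head(h)

blocks = nn.ModuleList([LocalBlock(dim, num_classes) for _ in range(3)])
optims = [torch.optim.SGD(b.parameters(), lr=0.1) for b in blocks]
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, dim)                     # toy batch
y = torch.randint(0, num_classes, (8,))

for block, opt in zip(blocks, optims):      # in the high-compute setting these
    h, logits = block(x)                    # block updates can run in parallel
    loss = criterion(logits, y)             # on different devices
    loss.backward()
    opt.step()
    opt.zero_grad()
    x = h.detach()                          # truncate backprop: no gradient to earlier blocks
```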
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.