Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican
- URL: http://arxiv.org/abs/2209.06295v1
- Date: Tue, 13 Sep 2022 20:58:46 GMT
- Title: Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican
- Authors: Nathaniel R. Robinson, Cameron J. Hogan, Nancy Fulda and David R. Mortensen
- Abstract summary: We show that transfer effectiveness is correlated with the amount of training data and the relationships between languages.
We contribute a rule-based French-Haitian orthographic and syntactic engine and a novel method for phonological embedding.
In very low-resource Jamaican MT, code-switching with a transfer language for orthographic resemblance yields a 6.63 BLEU point advantage.
- Score: 4.4096464238164295
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Multilingual transfer techniques often improve low-resource machine
translation (MT). Many of these techniques are applied without considering data
characteristics. We show in the context of Haitian-to-English translation that
transfer effectiveness is correlated with amount of training data and
relationships between knowledge-sharing languages. Our experiments suggest that
for some languages beyond a threshold of authentic data, back-translation
augmentation methods are counterproductive, while cross-lingual transfer from a
sufficiently related language is preferred. We complement this finding by
contributing a rule-based French-Haitian orthographic and syntactic engine and
a novel method for phonological embedding. When used with multilingual
techniques, orthographic transformation makes statistically significant
improvements over conventional methods. And in very low-resource Jamaican MT,
code-switching with a transfer language for orthographic resemblance yields a
6.63 BLEU point advantage.
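To make the code-switching result concrete, here is a minimal sketch of code-switched data augmentation. The lexicon entries and the swap rate below are invented for illustration and are not taken from the paper; the idea is simply to replace a fraction of source tokens with transfer-language counterparts so the augmented training text orthographically resembles the higher-resource transfer language.

```python
# Minimal sketch of code-switched augmentation for low-resource MT.
# The lexicon and swap rate are hypothetical, for illustration only.
import random

# Hypothetical Jamaican-Patois-to-English lexicon fragment (illustrative).
LEXICON = {
    "di": "the",
    "dem": "them",
    "pikni": "child",
    "nyam": "eat",
}

def code_switch(tokens, lexicon, swap_rate=0.3, seed=0):
    """Replace a fraction of source tokens with transfer-language
    counterparts, so training text orthographically resembles the
    higher-resource transfer language."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if tok in lexicon and rng.random() < swap_rate:
            out.append(lexicon[tok])
        else:
            out.append(tok)
    return out

if __name__ == "__main__":
    sentence = "di pikni dem nyam".split()
    # Exact output depends on the seed and swap rate.
    print(" ".join(code_switch(sentence, LEXICON)))
```

In a real pipeline the lexicon would come from a bilingual dictionary or induced word alignments, and the swap rate would be tuned against held-out BLEU; the paper additionally selects the transfer language specifically for orthographic resemblance.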
Related papers
- Sharing, Teaching and Aligning: Knowledgeable Transfer Learning for Cross-Lingual Machine Reading Comprehension [32.37236167127796]
X-STA is a new approach for cross-lingual machine reading comprehension.
We leverage an attentive teacher to subtly transfer the answer spans of the source language to the answer output space of the target.
A Gradient-Disentangled Knowledge Sharing technique is proposed as an improved cross-attention block.
arXiv Detail & Related papers (2023-11-12T07:20:37Z)
- Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques [3.9673530817103333]
Cross-lingual transfer of debiasing techniques is not only feasible but also yields promising results.
Using translations of the CrowS-Pairs dataset, our analysis identifies SentenceDebias as the best technique across different languages.
arXiv Detail & Related papers (2023-10-16T11:43:30Z)
- Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer [92.80671770992572]
Cross-lingual transfer is a central task in multilingual NLP.
Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data.
We propose a simple yet effective method, SALT, to improve the zero-shot cross-lingual transfer.
arXiv Detail & Related papers (2023-09-19T19:30:56Z)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- Revisiting Machine Translation for Cross-lingual Classification [91.43729067874503]
Most research in the area focuses on multilingual models rather than the Machine Translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed.
arXiv Detail & Related papers (2023-05-23T16:56:10Z)
- Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens [15.283483438956264]
We introduce Representational Transfer Potential (RTP), which measures representational similarities between languages; a toy version of such a similarity measurement is sketched after this list.
We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly correlated with changes in translation quality.
We develop a novel training scheme, which uses an auxiliary similarity loss that encourages representations to be more invariant across languages.
arXiv Detail & Related papers (2023-05-19T09:36:48Z)
- A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning [6.329304732560936]
Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries.
We propose Embedding-Push, Attention-Pull, and Robust targets to transfer English embeddings to virtual multilingual embeddings without semantic loss.
arXiv Detail & Related papers (2022-10-18T15:36:53Z)
- High-resource Language-specific Training for Multilingual Neural Machine Translation [109.31892935605192]
We propose the multilingual translation model with the high-resource language-specific training (HLT-MT) to alleviate the negative interference.
Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder.
HLT-MT is further trained on all available corpora to transfer knowledge from high-resource languages to low-resource languages.
arXiv Detail & Related papers (2022-07-11T14:33:13Z)
- A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
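The representational-similarity idea from the RTP entry above (forward-referenced there) can be illustrated with a toy script. This is a sketch of one generic way to quantify cross-lingual representational similarity, mean cosine similarity over aligned sentence representations, not the exact RTP definition from the cited paper; the encode() function is a hypothetical stand-in that returns deterministic random vectors so the script runs without downloading a model.

```python
# Toy sketch of cross-lingual representational similarity, in the spirit
# of (but not identical to) the RTP metric from the cited paper.
import numpy as np

def encode(sentences, lang, dim=512):
    """Stand-in encoder: one vector per sentence. In a real setup this
    would be mean-pooled hidden states from a multilingual encoder;
    here it is deterministic random noise keyed on the language code."""
    rng = np.random.default_rng(sum(ord(c) for c in lang))
    return rng.standard_normal((len(sentences), dim))

def mean_cosine_similarity(a, b):
    """Average cosine similarity between row-aligned representations."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a_n * b_n, axis=1)))

parallel = ["hello world", "how are you", "good morning"]
sim = mean_cosine_similarity(encode(parallel, "fra"), encode(parallel, "hat"))
print(f"toy representational similarity (fra vs hat): {sim:.3f}")
```

Swapping the stand-in encoder for mean-pooled hidden states from an actual multilingual model turns this into a rough proxy for asking which transfer languages share representations with a given target language.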
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.