Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability
- URL: http://arxiv.org/abs/2305.10266v1
- Date: Wed, 17 May 2023 14:58:06 GMT
- Title: Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability
- Authors: Eleftheria Briakou, Colin Cherry, George Foster
- Abstract summary: We investigate the role of incidental bilingualism in large language models.
We show that PaLM is exposed to over 30 million translation pairs across at least 44 languages.
We show that the presence of this incidental bilingual content has a substantial impact on translation capabilities, although the impact diminishes with model scale.
- Score: 16.01088313166145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large, multilingual language models exhibit surprisingly good zero- or
few-shot machine translation capabilities, despite having never seen the
intentionally-included translation examples provided to typical neural
translation systems. We investigate the role of incidental bilingualism -- the
unintentional consumption of bilingual signals, including translation examples
-- in explaining the translation capabilities of large language models, taking
the Pathways Language Model (PaLM) as a case study. We introduce a mixed-method
approach to measure and understand incidental bilingualism at scale. We show
that PaLM is exposed to over 30 million translation pairs across at least 44
languages. Furthermore, the amount of incidental bilingual content is highly
correlated with the amount of monolingual in-language content for non-English
languages. We relate incidental bilingual content to zero-shot prompts and show
that it can be used to mine new prompts to improve PaLM's out-of-English
zero-shot translation quality. Finally, in a series of small-scale ablations,
we show that its presence has a substantial impact on translation capabilities,
although this impact diminishes with model scale.
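As a rough illustration of what detecting incidental translation pairs at scale might involve, here is a minimal sketch that flags sentence pairs within a single document that are written in different languages yet have highly similar cross-lingual embeddings. The tooling (the langdetect package and the sentence-transformers LaBSE model), the similarity threshold, and the function names are assumptions for illustration; this is not the mixed-method pipeline used in the paper.

```python
# Minimal sketch (assumed tooling, not the paper's pipeline): flag candidate
# incidental translation pairs inside one training document.
from itertools import combinations

from langdetect import detect                                  # pip install langdetect
from sentence_transformers import SentenceTransformer, util    # pip install sentence-transformers

# Cross-lingual sentence encoder; LaBSE maps translations close together.
model = SentenceTransformer("sentence-transformers/LaBSE")

def candidate_translation_pairs(sentences, threshold=0.85):
    """Return (src, tgt, score) triples for same-document sentence pairs that are
    in different languages but have cosine similarity above the threshold."""
    langs = [detect(s) for s in sentences]
    embeddings = model.encode(sentences, convert_to_tensor=True, normalize_embeddings=True)
    pairs = []
    for i, j in combinations(range(len(sentences)), 2):
        if langs[i] != langs[j]:
            score = float(util.cos_sim(embeddings[i], embeddings[j]))
            if score >= threshold:
                pairs.append((sentences[i], sentences[j], score))
    return pairs

document = [
    "The cat sat on the mat.",
    "Le chat s'est assis sur le tapis.",
    "Unrelated filler text about something else entirely.",
]
for src, tgt, score in candidate_translation_pairs(document):
    print(f"{score:.2f}\t{src}\t{tgt}")
```

At PaLM's data scale, such pair-level checks would need cheap pre-filtering (for example, only scanning documents that language identification already marks as bilingual), which is in the spirit of the paper's "needles in a haystack" framing.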
Related papers
- Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Multilingual neural machine translation systems are hypothesized to learn to map sentences from different languages into a common representation space.
In this work, we test this hypothesis by translating zero-shot from languages unseen during training.
We demonstrate that the proposed decoupled vocabulary learning setup enables zero-shot translation from entirely unseen languages.
arXiv Detail & Related papers (2024-08-05T07:58:58Z)
- Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem [4.830018386227]
This work investigates the in-context learning abilities of pretrained large language models (LLMs) when instructed to translate text from a low-resource language into a high-resource language as part of an automated machine translation pipeline.
We conduct a set of experiments translating Southern Quechua to Spanish and examine the informativity of various types of context retrieved from a constrained database of digitized pedagogical materials and parallel corpora.
arXiv Detail & Related papers (2024-06-21T20:02:22Z)
- Could We Have Had Better Multilingual LLMs If English Was Not the Central Language? [4.655168524016426]
Large Language Models (LLMs) demonstrate strong machine translation capabilities on languages they are trained on.
Our study delves into Llama2's translation capabilities.
Our experiments show that the 7B Llama2 model yields above 10 BLEU when translating into all languages it has seen.
arXiv Detail & Related papers (2024-02-21T16:32:38Z)
- SLABERT Talk Pretty One Day: Modeling Second Language Acquisition with BERT [0.0]
Cross-linguistic transfer is the influence of the linguistic structure of a speaker's native language on the successful acquisition of a foreign language.
We find that NLP literature has not given enough attention to the phenomenon of negative transfer.
Our findings call for further research using our novel Transformer-based SLA models.
arXiv Detail & Related papers (2023-05-31T06:22:07Z)
- Chain-of-Dictionary Prompting Elicits Translation in Large Language Models [100.47154959254937]
Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT).
We present a novel method, CoD, which augments LLMs with prior knowledge in the form of chains of multilingual dictionaries for a subset of input words to elicit translation abilities (a prompt-construction sketch in this spirit appears after this list).
arXiv Detail & Related papers (2023-05-11T05:19:47Z)
- Hallucinations in Large Multilingual Translation Models [70.10455226752015]
Large-scale multilingual machine translation systems have demonstrated remarkable ability to translate directly between numerous languages.
When deployed in the wild, these models may generate hallucinated translations which have the potential to severely undermine user trust and raise safety concerns.
Existing research on hallucinations has primarily focused on small bilingual models trained on high-resource languages.
arXiv Detail & Related papers (2023-03-28T16:17:59Z)
- Back-translation for Large-Scale Multilingual Machine Translation [2.8747398859585376]
This paper aims to build a single multilingual translation system under the hypothesis that a universal cross-language representation leads to better multilingual translation performance.
We extend the exploration of different back-translation methods from bilingual translation to multilingual translation.
Surprisingly, smaller vocabularies perform better, and the extensive monolingual English data offers a modest improvement.
arXiv Detail & Related papers (2021-09-17T18:33:15Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
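To make the dictionary-chaining idea from the Chain-of-Dictionary entry above more concrete, the sketch below assembles a translation prompt in which a few source words are annotated with their translations across several languages before the sentence to be translated. The prompt wording, the example dictionary entries, and the function name are illustrative assumptions; they do not reproduce CoD's exact format.

```python
# Illustrative sketch of a dictionary-chain-augmented translation prompt.
# The wording and entries are assumptions, not CoD's exact prompt format.
def chain_of_dictionary_prompt(sentence, src_lang, tgt_lang, chains):
    """chains maps selected source words to their translations in several languages."""
    hint_lines = []
    for word, translations in chains.items():
        chain = ", ".join(f'"{t}" in {lang}' for lang, t in translations.items())
        hint_lines.append(f'"{word}" means {chain}.')
    hints = "\n".join(hint_lines)
    return (
        f"{hints}\n"
        f"Translate the following {src_lang} sentence into {tgt_lang}:\n"
        f"{sentence}"
    )

prompt = chain_of_dictionary_prompt(
    "Le conseil municipal a approuvé le budget.",
    src_lang="French",
    tgt_lang="English",
    chains={
        "conseil": {"English": "council", "German": "Rat", "Spanish": "consejo"},
        "budget": {"English": "budget", "German": "Haushalt", "Spanish": "presupuesto"},
    },
)
print(prompt)  # The resulting string would be sent to the LLM as a single prompt.
```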
This list is automatically generated from the titles and abstracts of the papers on this site.