In-context Learning as Maintaining Coherency: A Study of On-the-fly
Machine Translation Using Large Language Models
- URL: http://arxiv.org/abs/2305.03573v1
- Date: Fri, 5 May 2023 14:30:20 GMT
- Title: In-context Learning as Maintaining Coherency: A Study of On-the-fly
Machine Translation Using Large Language Models
- Authors: Suzanna Sia, Kevin Duh
- Abstract summary: We present a perspective of in-context learning as the desired generation task maintaining coherency with its context.
We first investigate randomly sampled prompts across 4 domains, and find that translation performance improves when shown in-domain prompts.
In doing so, we demonstrate the efficacy of In-context Machine Translation for on-the-fly adaptation.
- Score: 15.309754694595322
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The phenomenon of in-context learning has typically been thought of as
"learning from examples". In this work, which focuses on Machine Translation, we
present a perspective of in-context learning as the desired generation task
maintaining coherency with its context, i.e., the prompt examples. We first
investigate randomly sampled prompts across 4 domains, and find that
translation performance improves when shown in-domain prompts. Next, we
investigate coherency for the in-domain setting, which uses prompt examples
from a moving window. We study this with respect to other factors that have
previously been identified in the literature such as length, surface similarity
and sentence embedding similarity. Our results across three models (GPTNeo2.7B,
Bloom3B, XGLM2.9B), and three translation directions
(\texttt{en}$\rightarrow$\{\texttt{pt, de, fr}\}) suggest that the long-term
coherency of the prompts and the test sentence is a good indicator of
downstream translation performance. In doing so, we demonstrate the efficacy of
In-context Machine Translation for on-the-fly adaptation.
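A minimal sketch of how such a moving-window, in-domain prompt might be assembled for on-the-fly translation (the prompt format, window size, and the `generate` call are illustrative assumptions, not the authors' exact setup):

```python
from collections import deque

def build_prompt(window, src_sentence, src_lang="English", tgt_lang="French"):
    """Assemble a few-shot translation prompt from the most recent
    in-domain examples (a moving window), then append the test source."""
    lines = [f"{src_lang}: {s}\n{tgt_lang}: {t}" for s, t in window]
    lines.append(f"{src_lang}: {src_sentence}\n{tgt_lang}:")
    return "\n\n".join(lines)

def translate_document(generate, doc_pairs, k=5):
    """Translate a document sentence by sentence; after each sentence the
    (source, reference) pair joins a window of the k most recent examples,
    so the prompt stays in-domain and adapts on the fly.
    `generate` is a placeholder for a call to the language model."""
    window = deque(maxlen=k)
    hypotheses = []
    for src, ref in doc_pairs:
        prompt = build_prompt(window, src)
        hypotheses.append(generate(prompt))
        window.append((src, ref))
    return hypotheses
```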
Related papers
- Predicting Word Similarity in Context with Referential Translation Machines [0.0]
We identify the similarity between two words in English by casting the task as machine translation performance prediction (MTPP).
We use referential translation machines (RTMs), which allow a common representation of training and test sets.
RTMs can achieve the top results in the Graded Word Similarity in Context (GWSC) task.
arXiv Detail & Related papers (2024-07-07T09:36:41Z) - Where does In-context Translation Happen in Large Language Models [18.379840329713407]
We characterize the region where large language models transition from in-context learners to translation models.
We demonstrate evidence of a "task recognition" point where the translation task is encoded into the input representations and attention to context is no longer necessary.
arXiv Detail & Related papers (2024-03-07T14:12:41Z) - Vocabulary-Defined Semantics: Latent Space Clustering for Improving In-Context Learning [32.178931149612644]
In-context learning enables language models to adapt to downstream data or incorporate new tasks using only a few samples as demonstrations within the prompt.
However, the performance of in-context learning can be unstable depending on the quality, format, or order of demonstrations.
We propose a novel approach termed "vocabulary-defined semantics".
arXiv Detail & Related papers (2024-01-29T14:29:48Z) - In-Context Probing: Toward Building Robust Classifiers via Probing Large
Language Models [5.5089506884366735]
In this paper, we propose an alternative approach, which we term In-Context Probing (ICP).
Similar to in-context learning, we contextualize the representation of the input with an instruction, but instead of decoding the output prediction, we probe the contextualized representation to predict the label.
We show that ICP performs competitively with, or superior to, finetuning and can be particularly helpful for building classifiers on top of smaller models.
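A minimal sketch of the probing idea, assuming a generic HuggingFace model and a simple linear probe (the model choice, pooling strategy, and probe are illustrative assumptions, not the paper's exact setup):

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

# Illustrative choices: any LM backbone and any linear probe would do.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModel.from_pretrained("gpt2")

def contextual_rep(instruction, text):
    """Return one vector summarizing the instruction-contextualized input
    (here: the hidden state of the final token)."""
    enc = tok(instruction + " " + text, return_tensors="pt")
    with torch.no_grad():
        out = lm(**enc)
    return out.last_hidden_state[0, -1].numpy()

def fit_probe(instruction, texts, labels):
    """Train a lightweight classifier on the contextualized representations
    instead of decoding label tokens from the LM."""
    feats = [contextual_rep(instruction, t) for t in texts]
    return LogisticRegression(max_iter=1000).fit(feats, labels)
```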
arXiv Detail & Related papers (2023-05-23T15:43:04Z) - Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel greedy search strategy to identify a near-optimal prompt for improving the performance of in-context learning.
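A minimal sketch of such a greedy demonstration search; the paper's fairness metric is not reproduced here, so `bias_score` below is a hypothetical stand-in for its predictive-bias measure:

```python
def greedy_prompt_search(candidates, bias_score, k=4):
    """Greedily build a k-shot prompt: at each step, add the candidate
    demonstration whose inclusion yields the lowest predictive-bias score.
    `bias_score(demos)` is a stand-in for the paper's bias metric."""
    chosen = []
    pool = list(candidates)
    while pool and len(chosen) < k:
        best = min(pool, key=lambda d: bias_score(chosen + [d]))
        chosen.append(best)
        pool.remove(best)
    return chosen
```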
arXiv Detail & Related papers (2023-03-23T12:28:25Z) - Prompting Large Language Model for Machine Translation: A Case Study [87.88120385000666]
We offer a systematic study on prompting strategies for machine translation.
We examine factors for prompt template and demonstration example selection.
We explore the use of monolingual data and the feasibility of cross-lingual, cross-domain, and sentence-to-document transfer learning.
arXiv Detail & Related papers (2023-01-17T18:32:06Z) - In-context Examples Selection for Machine Translation [101.50473468507697]
Large-scale generative models show an impressive ability to perform a wide range of Natural Language Processing (NLP) tasks using in-context learning.
For Machine Translation (MT), these examples are typically randomly sampled from the development dataset with a similar distribution as the evaluation set.
We show that the translation quality and the domain of the in-context examples matter, and that even a single noisy, unrelated in-context example can have a catastrophic impact on output quality.
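One simple way to instantiate non-random example selection (a crude unigram-overlap proxy, not the paper's actual retrieval method) might look like:

```python
def overlap(a, b):
    """Unigram overlap between two sentences, as a rough similarity proxy."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta), 1)

def select_examples(test_src, pool, k=4):
    """Pick the k (source, target) pairs whose source side best overlaps the
    test sentence, instead of sampling in-context examples at random."""
    return sorted(pool, key=lambda ex: overlap(test_src, ex[0]), reverse=True)[:k]
```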
arXiv Detail & Related papers (2022-12-05T17:25:15Z) - An Explanation of In-context Learning as Implicit Bayesian Inference [117.19809377740188]
We study the role of the pretraining distribution on the emergence of in-context learning.
We prove that in-context learning occurs implicitly via Bayesian inference of the latent concept.
We empirically find that scaling model size improves in-context accuracy even when the pretraining loss is the same.
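Loosely, that claim can be written as marginalizing over a latent concept; the posterior over the concept sharpens as the prompt provides more evidence (the notation below is a generic restatement, not copied from the paper):

```latex
% In-context prediction viewed as implicit Bayesian inference over a latent concept \theta:
\[
p(\text{output} \mid \text{prompt})
  = \int_{\theta} p(\text{output} \mid \theta, \text{prompt})\,
    p(\theta \mid \text{prompt})\, \mathrm{d}\theta
\]
```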
arXiv Detail & Related papers (2021-11-03T09:12:33Z) - Measuring and Increasing Context Usage in Context-Aware Machine
Translation [64.5726087590283]
We introduce a new metric, conditional cross-mutual information, to quantify the usage of context by machine translation models.
We then introduce a new, simple training method, context-aware word dropout, to increase the usage of context by context-aware models.
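A rough way to write the metric (notation mine, not necessarily the paper's exact definition): it compares the model's cross-entropy on the target with and without the additional context C:

```latex
% Conditional cross-mutual information: the reduction in cross-entropy on the
% target Y (given the source X) obtained by additionally conditioning on context C.
\[
\mathrm{CXMI}(C \rightarrow Y \mid X)
  = H_{q}(Y \mid X) \;-\; H_{q}(Y \mid X, C)
\]
```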
arXiv Detail & Related papers (2021-05-07T19:55:35Z) - CASE: Context-Aware Semantic Expansion [68.30244980290742]
This paper defines and studies a new task called Context-Aware Semantic Expansion (CASE).
Given a seed term in a sentential context, we aim to suggest other terms that fit the context as well as the seed does.
We show that annotations for this task can be harvested at scale from existing corpora, in a fully automatic manner.
arXiv Detail & Related papers (2019-12-31T06:38:57Z)