Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning
- URL: http://arxiv.org/abs/2310.13448v1
- Date: Fri, 20 Oct 2023 12:29:51 GMT
- Title: Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning
- Authors: Duarte M. Alves, Nuno M. Guerreiro, João Alves, José Pombal, Ricardo Rei, José G. C. de Souza, Pierre Colombo and André F. T. Martins
- Abstract summary: Large language models (LLMs) are a promising avenue for machine translation (MT)
Their effectiveness depends heavily on the choice of few-shot examples, and they often require extra post-processing due to overgeneration.
We show that adapter-based finetuning with LoRA matches the performance of traditional finetuning while reducing the number of training parameters by a factor of 50.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are a promising avenue for machine translation
(MT). However, current LLM-based MT systems are brittle: their effectiveness
depends heavily on the choice of few-shot examples, and they often require extra
post-processing due to overgeneration. Alternatives such as finetuning on
translation instructions are computationally expensive and may weaken
in-context learning capabilities, due to overspecialization. In this paper, we
provide a closer look at this problem. We start by showing that adapter-based
finetuning with LoRA matches the performance of traditional finetuning while
reducing the number of training parameters by a factor of 50. This method also
outperforms few-shot prompting and eliminates the need for post-processing or
in-context examples. However, we show that finetuning generally degrades
few-shot performance, hindering adaptation capabilities. Finally, to obtain the
best of both worlds, we propose a simple approach that incorporates few-shot
examples during finetuning. Experiments on 10 language pairs show that our
proposed approach recovers the original few-shot capabilities while keeping the
added benefits of finetuning.
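The 50x reduction in trainable parameters follows directly from LoRA's low-rank decomposition: instead of updating a full d×k weight matrix, only two small factors of rank r are trained while the pretrained weight stays frozen. A minimal numpy sketch of the idea (the layer sizes and rank here are hypothetical, not the paper's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 1024, 1024, 8   # hypothetical layer dimensions and LoRA rank
alpha = 16                # LoRA scaling hyperparameter

W = rng.normal(size=(d, k))           # frozen pretrained weight (not trained)
A = rng.normal(size=(d, r)) * 0.01    # trainable low-rank factor
B = np.zeros((r, k))                  # initialized to zero, so the update starts at 0

def lora_forward(x):
    """Forward pass: frozen weight plus scaled low-rank update x @ (A @ B)."""
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(2, d))
# With B = 0, the adapted layer is exactly the base model.
assert np.allclose(lora_forward(x), x @ W)

full_params = d * k          # parameters updated by traditional finetuning
lora_params = r * (d + k)    # parameters updated by LoRA
print(f"trainable parameter reduction: {full_params / lora_params:.0f}x")
# 1024*1024 / (8 * 2048) = 64x for this single layer
```

Since only A and B receive gradients, optimizer state and checkpoint sizes shrink by the same factor; the overall reduction for a full model depends on which weight matrices are adapted and the chosen rank.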
Related papers
- Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods [69.36397993451742]
This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks.
We modify specific context tokens, considering the unique structure of input and output formats.
Inspired by adversarial attacks, we adjust the input based on the labels present in the context, focusing on minimizing, rather than maximizing, the loss.
arXiv Detail & Related papers (2024-10-22T17:45:47Z)
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z)
- Guardrail Baselines for Unlearning in LLMs [33.86316928349476]
Finetuning is a promising approach to 'unlearn' concepts from large language models.
We show that guardrail-based approaches such as prompting and filtering can achieve unlearning results comparable to finetuning.
arXiv Detail & Related papers (2024-03-05T21:19:06Z)
- Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing [12.843274390224853]
Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks.
We show that they have yet to attain state-of-the-art performance in Neural Machine Translation.
We propose adapting LLMs as Automatic Post-Editors (APE) rather than direct translators.
arXiv Detail & Related papers (2023-10-23T12:22:15Z)
- Towards Effective Disambiguation for Machine Translation with Large Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences".
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z)
- TIM: Teaching Large Language Models to Translate with Comparison [78.66926087162672]
We propose a novel framework using examples in comparison to teach LLMs to learn translation.
Our approach involves presenting the model with examples of correct and incorrect translations and using a preference loss to guide the model's learning.
Our findings offer a new perspective on fine-tuning LLMs for translation tasks and provide a promising solution for generating high-quality translations.
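The summary above does not specify TIM's exact training objective, but a standard pairwise preference loss over a correct/incorrect translation pair, which the description suggests, can be sketched as follows (illustrative only, not TIM's actual formulation):

```python
import math

def preference_loss(score_preferred: float, score_rejected: float,
                    margin: float = 0.0) -> float:
    """Pairwise preference loss: -log sigmoid(preferred - rejected - margin).

    The loss is small when the model scores the correct translation
    above the incorrect one by at least the margin, and grows as the
    ranking is violated.
    """
    diff = score_preferred - score_rejected - margin
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# Ranking the correct translation higher yields a lower loss.
loss_good = preference_loss(2.0, -1.0)   # model prefers the correct translation
loss_bad = preference_loss(-1.0, 2.0)    # model prefers the incorrect one
```

In practice the scores would be the model's (length-normalized) log-likelihoods of each candidate translation, and this loss would be combined with the usual language-modeling objective.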
arXiv Detail & Related papers (2023-07-10T08:15:40Z)
- Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models [48.0311578882384]
Finetuning language models (LMs) with training examples and task descriptions has been seen as critical to recent successes in few-shot learning.
We show that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering.
arXiv Detail & Related papers (2021-06-24T23:38:10Z)
- Selecting Informative Contexts Improves Language Model Finetuning [66.26521454263343]
We present a general fine-tuning method that we call information gain filtration.
During fine-tuning, a secondary learner selects informative examples and skips uninformative ones.
We show that our method has consistent improvement across datasets, fine-tuning tasks, and language model architectures.
arXiv Detail & Related papers (2020-05-01T02:01:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.