Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model
- URL: http://arxiv.org/abs/2404.16766v1
- Date: Thu, 25 Apr 2024 17:19:36 GMT
- Title: Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model
- Authors: Runzhe Zhan, Xinyi Yang, Derek F. Wong, Lidia S. Chao, Yue Zhang,
- Abstract summary: supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of foundation large language model (LLM) to specific preferences.
We critically examine this hypothesis within the scope of cross-lingual generation tasks.
We introduce a novel training-free alignment method named PreTTY, which employs minimal task-related prior tokens.
- Score: 50.339632513018934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of foundation large language model (LLM) to specific preferences, concerns have been raised about the depth of this alignment, with some critiques suggesting it is merely "superficial". We critically examine this hypothesis within the scope of cross-lingual generation tasks, proposing that the effectiveness of SFT may be constrained by its reliance on prior tokens to guide cross-lingual generation. Based on this crucial insight, and in response to the challenges posed by the costly and limited availability of non-English data for SFT, we introduce a novel training-free alignment method named PreTTY, which employs minimal task-related prior tokens to bridge the foundation LLM and the SFT LLM, achieving comparable performance without training. Experiments on machine translation and part-of-speech tagging across eight languages demonstrate the efficacy of PreTTY in cross-lingual settings. Remarkably, by initiating the decoding process with only one or two prior tokens, foundation LLMs can achieve performance comparable to their SFT counterparts. This method presents a cost-effective alternative to SFT and advances the democratization of multilingual LLMs.
Related papers
- Refining Translations with LLMs: A Constraint-Aware Iterative Prompting Approach [7.5069214839655345]
Large language models (LLMs) have demonstrated remarkable proficiency in machine translation (MT)
We propose a multi-step prompt chain that enhances translation faithfulness by prioritizing key terms crucial for semantic accuracy.
Experiments using Llama and Qwen as base models on the FLORES-200 and WMT datasets demonstrate significant improvements over baselines.
arXiv Detail & Related papers (2024-11-13T05:40:24Z) - A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models [39.35525969831397]
This work proposes a simple training-free prompt-free approach to leverage large language models (LLMs) for the Chinese spelling correction (CSC) task.
Experiments on five public datasets demonstrate that our approach significantly improves LLM performance.
arXiv Detail & Related papers (2024-10-05T04:06:56Z) - TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z) - A Preference-driven Paradigm for Enhanced Translation with Large Language Models [33.51585908894444]
Large language models (LLMs) can achieve remarkable translation performance using only a small amount of parallel data.
SFT simply instructs the model to imitate the reference translations at the token level, making it vulnerable to the noise present in the references.
We propose a preference-based approach built upon the Plackett-Luce model to overcome this plateau.
arXiv Detail & Related papers (2024-04-17T11:52:47Z) - Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z) - A Novel Paradigm Boosting Translation Capabilities of Large Language Models [11.537249547487045]
The paper proposes a novel paradigm consisting of three stages: Secondary Pre-training using Extensive Monolingual Data, Continual Pre-training with Interlinear Text Format Documents, and Leveraging Source-Language Consistent Instruction for Supervised Fine-Tuning.
Experimental results conducted using the Llama2 model, particularly on Chinese-Llama2, demonstrate the improved translation capabilities of LLMs.
arXiv Detail & Related papers (2024-03-18T02:53:49Z) - Analyzing and Adapting Large Language Models for Few-Shot Multilingual
NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z) - Bridging the Gap between Language Models and Cross-Lingual Sequence
Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel
arXiv Detail & Related papers (2022-04-11T15:55:20Z) - SML: a new Semantic Embedding Alignment Transformer for efficient
cross-lingual Natural Language Inference [71.57324258813674]
The ability of Transformers to perform with precision a variety of tasks such as question answering, Natural Language Inference (NLI) or summarising, have enable them to be ranked as one of the best paradigms to address this kind of tasks at present.
NLI is one of the best scenarios to test these architectures, due to the knowledge required to understand complex sentences and established a relation between a hypothesis and a premise.
In this paper, we propose a new architecture, siamese multilingual transformer, to efficiently align multilingual embeddings for Natural Language Inference.
arXiv Detail & Related papers (2021-03-17T13:23:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.