Related papers: Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model

URL: http://arxiv.org/abs/2404.16766v1
Date: Thu, 25 Apr 2024 17:19:36 GMT
Title: Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model
Authors: Runzhe Zhan, Xinyi Yang, Derek F. Wong, Lidia S. Chao, Yue Zhang
Abstract summary: Supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of foundation large language models (LLMs) to specific preferences. We critically examine the hypothesis that this alignment is merely "superficial" within the scope of cross-lingual generation tasks. We introduce a novel training-free alignment method named PreTTY, which employs minimal task-related prior tokens.
Score: 50.339632513018934
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of foundation large language model (LLM) to specific preferences, concerns have been raised about the depth of this alignment, with some critiques suggesting it is merely "superficial". We critically examine this hypothesis within the scope of cross-lingual generation tasks, proposing that the effectiveness of SFT may be constrained by its reliance on prior tokens to guide cross-lingual generation. Based on this crucial insight, and in response to the challenges posed by the costly and limited availability of non-English data for SFT, we introduce a novel training-free alignment method named PreTTY, which employs minimal task-related prior tokens to bridge the foundation LLM and the SFT LLM, achieving comparable performance without training. Experiments on machine translation and part-of-speech tagging across eight languages demonstrate the efficacy of PreTTY in cross-lingual settings. Remarkably, by initiating the decoding process with only one or two prior tokens, foundation LLMs can achieve performance comparable to their SFT counterparts. This method presents a cost-effective alternative to SFT and advances the democratization of multilingual LLMs.
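
The idea of initiating decoding with one or two prior tokens can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy next-token function, and the way the prior tokens are obtained are all assumptions made for illustration, and the abstract does not specify how PreTTY selects its prior tokens.

```python
from typing import Callable, List


def build_decoder_input(prompt_tokens: List[str],
                        prior_tokens: List[str],
                        num_prior: int = 2) -> List[str]:
    """Append the first `num_prior` prior tokens to the prompt (hypothetical helper)."""
    if num_prior < 0:
        raise ValueError("num_prior must be non-negative")
    # Concatenation creates a new list, so the caller's prompt is not mutated.
    return prompt_tokens + prior_tokens[:num_prior]


def generate_with_prior(prompt_tokens: List[str],
                        prior_tokens: List[str],
                        next_token_fn: Callable[[List[str]], str],
                        num_prior: int = 2,
                        max_new_tokens: int = 16,
                        eos: str = "</s>") -> List[str]:
    """Greedy decoding that starts after the prior tokens.

    Returns the prior tokens followed by the generated continuation.
    """
    context = build_decoder_input(prompt_tokens, prior_tokens, num_prior)
    output = list(prior_tokens[:num_prior])
    for _ in range(max_new_tokens):
        token = next_token_fn(context)
        if token == eos:
            break
        context.append(token)
        output.append(token)
    return output


def toy_next_token(context: List[str]) -> str:
    """Toy stand-in for a foundation LLM (an assumption, not a real model).

    It continues a fixed German sentence once the last token belongs to it,
    and otherwise stops immediately.
    """
    german = ["Das", "ist", "ein", "Test", "</s>"]
    last = context[-1]
    if last in german[:-1]:
        return german[german.index(last) + 1]
    return "</s>"
```

With a real model, `next_token_fn` would wrap the model's greedy next-token prediction; the toy function only mimics the reported effect that a target-language prefix steers the continuation.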

Related papers

Enhancing Large Language Models' Machine Translation via Dynamic Focus Anchoring [22.297388572921477]
Large language models have demonstrated exceptional performance across multiple crosslingual NLP tasks, including machine translation (MT). Persistent challenges remain in addressing context-sensitive units (CSUs), such as polysemous words. We propose a simple but effective method to enhance LLMs' MT capabilities by acquiring CSUs and applying semantic focus.
arXiv Detail & Related papers (2025-05-29T06:29:57Z)
DeFTX: Denoised Sparse Fine-Tuning for Zero-Shot Cross-Lingual Transfer [26.0360791797671]
We introduce DeFT-X, a novel composable SFT approach that denoises the weight matrices of a pretrained model before magnitude pruning. We evaluate DeFT-X on a diverse set of extremely low-resource languages for sentiment classification (NusaX) and natural language inference (AmericasNLI).
arXiv Detail & Related papers (2025-05-21T04:20:30Z)
HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization [50.27950279695363]
Many pre-trained language models (PLMs) exhibit suboptimal performance on mid- and low-resource languages. A common strategy to address this is to introduce new tokens specific to the target languages, initialize their embeddings, and apply continual pre-training on target-language data. We propose HYPEROFA, a hypernetwork-based approach for more adaptive token embedding.
arXiv Detail & Related papers (2025-04-21T19:40:32Z)
Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models [12.500777267361102]
We introduce a novel preference-oriented supervised fine-tuning approach, namely PoFT. The intuition is to boost SFT by imposing a particular preference: favoring the target model over aligned LLMs on the same SFT data. PoFT achieves stable and consistent improvements over the SFT baselines across different training datasets and base models.
arXiv Detail & Related papers (2024-12-17T12:49:14Z)
Token Prepending: A Training-Free Approach for Eliciting Better Sentence Embeddings from LLMs [10.213016513358598]
The Token Prepending (TP) technique prepends each layer's decoded sentence embedding to the beginning of the sentence in the next layer's input. TP is plug-and-play and training-free, which means it can be seamlessly integrated with prompt-based sentence embedding methods.
arXiv Detail & Related papers (2024-12-16T08:42:00Z)
Refining Translations with LLMs: A Constraint-Aware Iterative Prompting Approach [7.5069214839655345]
Large language models (LLMs) have demonstrated remarkable proficiency in machine translation (MT). We propose a multi-step prompt chain that enhances translation faithfulness by prioritizing key terms crucial for semantic accuracy. Experiments using Llama and Qwen as base models on the FLORES-200 and WMT datasets demonstrate significant improvements over baselines.
arXiv Detail & Related papers (2024-11-13T05:40:24Z)
A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models [39.35525969831397]
This work proposes a simple training-free prompt-free approach to leverage large language models (LLMs) for the Chinese spelling correction (CSC) task. Experiments on five public datasets demonstrate that our approach significantly improves LLM performance.
arXiv Detail & Related papers (2024-10-05T04:06:56Z)
TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks. We propose the TasTe framework, which stands for translating through self-reflection. The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z)
A Preference-driven Paradigm for Enhanced Translation with Large Language Models [33.51585908894444]
Large language models (LLMs) can achieve remarkable translation performance using only a small amount of parallel data. SFT simply instructs the model to imitate the reference translations at the token level, making it vulnerable to the noise present in the references. We propose a preference-based approach built upon the Plackett-Luce model to overcome this plateau.
arXiv Detail & Related papers (2024-04-17T11:52:47Z)
Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages. Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs. In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z)
A Novel Paradigm Boosting Translation Capabilities of Large Language Models [11.537249547487045]
The paper proposes a novel paradigm consisting of three stages: Secondary Pre-training using Extensive Monolingual Data, Continual Pre-training with Interlinear Text Format Documents, and Leveraging Source-Language Consistent Instruction for Supervised Fine-Tuning. Experimental results conducted using the Llama2 model, particularly on Chinese-Llama2, demonstrate the improved translation capabilities of LLMs.
arXiv Detail & Related papers (2024-03-18T02:53:49Z)
Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning. We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups. Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z)
Instruction Tuning for Large Language Models: A Survey [52.86322823501338]
We make a systematic review of the literature, including the general methodology of supervised fine-tuning (SFT). We also review the potential pitfalls of SFT and the criticism against it, along with efforts pointing out current deficiencies of existing strategies.
arXiv Detail & Related papers (2023-08-21T15:35:16Z)
Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks. Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages. In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap. Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
SML: a new Semantic Embedding Alignment Transformer for efficient cross-lingual Natural Language Inference [71.57324258813674]
The ability of Transformers to perform a variety of tasks with precision, such as question answering, Natural Language Inference (NLI), or summarising, has made them one of the best paradigms for addressing these kinds of tasks at present. NLI is one of the best scenarios to test these architectures, due to the knowledge required to understand complex sentences and establish a relation between a hypothesis and a premise. In this paper, we propose a new architecture, siamese multilingual transformer, to efficiently align multilingual embeddings for Natural Language Inference.
arXiv Detail & Related papers (2021-03-17T13:23:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.