Better Alignment with Instruction Back-and-Forth Translation
- URL: http://arxiv.org/abs/2408.04614v2
- Date: Tue, 13 Aug 2024 18:00:57 GMT
- Title: Better Alignment with Instruction Back-and-Forth Translation
- Authors: Thao Nguyen, Jeffrey Li, Sewoong Oh, Ludwig Schmidt, Jason Weston, Luke Zettlemoyer, Xian Li,
- Abstract summary: We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge.
Given documents from a web corpus, we generate and curate synthetic instructions using the backtranslation approach proposed by Li et al.
We find that instruction back-and-forth translation combines the best of both worlds -- making use of the information diversity and quantity found on the web, while ensuring the quality of the responses which is necessary for effective alignment.
- Score: 120.19298407990267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs). Given documents from a web corpus, we generate and curate synthetic instructions using the backtranslation approach proposed by Li et al.(2023a), and rewrite the responses to improve their quality further based on the initial documents. Fine-tuning with the resulting (backtranslated instruction, rewritten response) pairs yields higher win rates on AlpacaEval than using other common instruction datasets such as Humpback, ShareGPT, Open Orca, Alpaca-GPT4 and Self-instruct. We also demonstrate that rewriting the responses with an LLM outperforms direct distillation, and the two generated text distributions exhibit significant distinction in embedding space. Further analysis shows that our backtranslated instructions are of higher quality than other sources of synthetic instructions, while our responses are more diverse and complex than those obtained from distillation. Overall we find that instruction back-and-forth translation combines the best of both worlds -- making use of the information diversity and quantity found on the web, while ensuring the quality of the responses which is necessary for effective alignment.
Related papers
- Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction [83.0216122783429]
Web Reconstruction (WebR) is a fully automated framework for synthesizing high-quality instruction-tuning (IT) data directly from raw web documents.
We show that datasets generated by WebR outperform state-of-the-art baselines by up to 16.65% across four instruction-following benchmarks.
arXiv Detail & Related papers (2025-04-22T04:07:13Z) - Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis [55.65459867300319]
LLMs demonstrate remarkable capabilities in following natural language instructions, largely due to instruction-tuning on high-quality datasets.
Recent approaches incorporate feedback to improve data quality, but typically operate at the sample level, generating and applying feedback for each response individually.
We propose Reference-Level Feedback, a novel methodology that instead collects feedback based on high-quality reference samples from carefully curated seed data.
arXiv Detail & Related papers (2025-02-06T21:29:00Z) - Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization [4.9201947803787744]
Large language models (LLMs) can generate fluent summaries across domains using prompting techniques.
We show that adding keyphrases in prompts can improve ROUGE F1 and recall.
We introduce Keyphrase Signal Extractor (SigExt), a lightweight model that can be finetuned to extract salient keyphrases.
arXiv Detail & Related papers (2024-10-03T17:54:56Z) - TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings [61.9257731511557]
We propose Text Guided LLaVA (TG-LLaVA) to optimize vision-language models (VLMs)
We use learnable latent embeddings as a bridge to analyze textual instruction and add the analysis results to the vision encoder as guidance.
With the guidance of text, the vision encoder can extract text-related features, similar to how humans focus on the most relevant parts of an image when considering a question.
arXiv Detail & Related papers (2024-09-15T00:38:34Z) - REInstruct: Building Instruction Data from Unlabeled Corpus [49.82314244648043]
We propose REInstruct, a method to automatically build instruction data from an unlabeled corpus.
By training Llama-7b on a combination of 3k seed data and 32k synthetic data from REInstruct, fine-tuned model achieves a 65.41% win rate on AlpacaEval leaderboard against text-davinci-003.
arXiv Detail & Related papers (2024-08-20T09:05:03Z) - Advancing Translation Preference Modeling with RLHF: A Step Towards
Cost-Effective Solution [57.42593422091653]
We explore leveraging reinforcement learning with human feedback to improve translation quality.
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality.
arXiv Detail & Related papers (2024-02-18T09:51:49Z) - Tuna: Instruction Tuning using Feedback from Large Language Models [74.04950416204551]
We propose finetuning an instruction-tuned large language model using our novel textitprobabilistic ranking and textitcontextual ranking approaches.
Probabilistic ranking enables the instruction-tuned model to inherit the relative rankings of high-quality and low-quality responses from the teacher LLM.
On the other hand, learning with contextual ranking allows the model to refine its own response distribution using the contextual understanding ability of stronger LLMs.
arXiv Detail & Related papers (2023-10-20T09:55:06Z) - Self-Alignment with Instruction Backtranslation [162.02529653768096]
We present a method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions.
Our approach, named instruction backtranslation, starts with a language model finetuned on a small amount of seed data, and a given web corpus.
arXiv Detail & Related papers (2023-08-11T17:47:54Z) - WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive
Summarization [41.578594261746055]
We introduce WikiLingua, a large-scale, multilingual dataset for the evaluation of crosslingual abstractive summarization systems.
We extract article and summary pairs in 18 languages from WikiHow, a high quality, collaborative resource of how-to guides on a diverse set of topics written by human authors.
We create gold-standard article-summary alignments across languages by aligning the images that are used to describe each how-to step in an article.
arXiv Detail & Related papers (2020-10-07T00:28:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.