Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
- URL: http://arxiv.org/abs/2402.04833v2
- Date: Tue, 4 Jun 2024 17:20:01 GMT
- Title: Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
- Authors: Hao Zhao, Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
- Abstract summary: We show that fine-tuning on the longest responses should be the default baseline for any work on instruction fine-tuning.
We demonstrate this for several LLMs (Llama-2-7B, Llama-2-13B, Mistral-7B-v0.1) and datasets (Alpaca-52k, Evol-Instruct-70k).
- Score: 38.29072578390376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a consensus that instruction fine-tuning of LLMs requires high-quality data, but what are they? LIMA (NeurIPS 2023) and AlpaGasus (ICLR 2024) are state-of-the-art methods for selecting such high-quality examples, either via manual curation or using GPT-3.5-Turbo as a quality scorer. We show that the extremely simple baseline of selecting the 1,000 instructions with longest responses -- that intuitively contain more learnable information and are harder to overfit -- from standard datasets can consistently outperform these sophisticated methods according to GPT-4 and PaLM-2 as judges, while remaining competitive on the Open LLM benchmarks that test factual knowledge. We demonstrate this for several LLMs (Llama-2-7B, Llama-2-13B, Mistral-7B-v0.1) and datasets (Alpaca-52k, Evol-Instruct-70k). In addition, a lightweight refinement of such long instructions can further improve the abilities of the fine-tuned LLMs, and allows us to obtain competitive results on MT-Bench and the 2nd highest-ranked Llama-2-7B-based model on AlpacaEval 2.0, while training on only 1,000 examples and no extra preference data. We also conduct a thorough analysis of our models to ensure that their enhanced performance is not simply due to GPT-4's preference for longer responses. Overall, our findings suggest that fine-tuning on the longest responses should be the default baseline for any work on instruction fine-tuning. We provide our code at https://github.com/tml-epfl/long-is-more-for-alignment.
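The core baseline described in the abstract is simple enough to sketch in a few lines. The snippet below is an illustration, not the authors' released code: the Alpaca-style field names ("instruction", "input", "output") and the character-based length measure are assumptions, and the paper may count length in tokens instead.

```python
import json

# Minimal sketch of the length-based selection baseline from the abstract:
# keep the 1,000 training examples with the longest responses.
# Assumptions: Alpaca-style JSON fields and character-level length; the
# paper's exact length measure may differ (e.g., token count).

def select_longest(examples, k=1000):
    """Return the k examples whose 'output' (response) is longest."""
    return sorted(examples, key=lambda ex: len(ex["output"]), reverse=True)[:k]

if __name__ == "__main__":
    # Hypothetical local copy of Alpaca-52k in its standard JSON format.
    with open("alpaca_data.json") as f:
        data = json.load(f)
    subset = select_longest(data, k=1000)
    with open("alpaca_longest_1k.json", "w") as f:
        json.dump(subset, f, indent=2)
```

The resulting 1k subset is then used for ordinary supervised instruction fine-tuning of the base model; per the abstract, no extra preference data is involved.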
Related papers
- IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization [74.34707794886751]
This paper introduces TRACE, a benchmark for improving and evaluating complex instruction-following ability.
We also propose IOPO, which takes both input and output preference pairs into consideration.
Experiments on both in-domain and out-of-domain datasets confirm the effectiveness of IOPO.
arXiv Detail & Related papers (2024-11-09T15:12:43Z)
- RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs [60.38044044203333]
Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG).
We propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG.
For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-source model with state-of-the-art performance on RAG benchmarks.
arXiv Detail & Related papers (2024-07-02T17:59:17Z)
- Aligning to Thousands of Preferences via System Message Generalization [27.88755105590786]
Current large language model (LLM) alignment methods assume that aligning LLMs with the general public's preferences is optimal.
We propose a new paradigm where users specify what they value most within the system message.
We train a 7B LLM called Janus and test it on 921 prompts by adding various unseen system messages that reflect user preferences.
Janus achieves tie+win rates of 75.2%, 72.4%, and 66.4% against Mistral 7B Instruct v0.2, GPT-3.5 Turbo, and GPT-4, respectively.
arXiv Detail & Related papers (2024-05-28T09:06:18Z)
- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs [23.38507910115345]
In-context learning (ICL) techniques can train strong conversational agents with only a small amount of human supervision.
Here we explore the application of such techniques to language models that are much smaller (around 10B–40B parameters) and have permissive licenses.
We find the Self-Instruct approach to be less effective at these sizes and propose new ICL methods that draw on two main ideas.
arXiv Detail & Related papers (2023-10-21T10:21:17Z)
- Tuna: Instruction Tuning using Feedback from Large Language Models [74.04950416204551]
We propose fine-tuning an instruction-tuned large language model using our novel probabilistic ranking and contextual ranking approaches.
Probabilistic ranking enables the instruction-tuned model to inherit the relative rankings of high-quality and low-quality responses from the teacher LLM.
On the other hand, learning with contextual ranking allows the model to refine its own response distribution using the contextual understanding ability of stronger LLMs.
arXiv Detail & Related papers (2023-10-20T09:55:06Z)
- AlpaGasus: Training A Better Alpaca with Fewer Data [93.6949102689243]
We propose a simple and effective data selection strategy that automatically identifies and filters out low-quality data.
We introduce AlpaGasus, which is fine-tuned on only 9k high-quality examples filtered from the 52k Alpaca data.
AlpaGasus significantly outperforms the original Alpaca on multiple test sets and the controlled human evaluation.
arXiv Detail & Related papers (2023-07-17T17:59:40Z)
- Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting [65.00288634420812]
Pairwise Ranking Prompting (PRP) is a technique that significantly reduces the burden on Large Language Models (LLMs).
Our results are the first in the literature to achieve state-of-the-art ranking performance on standard benchmarks using moderate-sized open-source LLMs (a minimal sketch of the pairwise-comparison idea follows this list).
arXiv Detail & Related papers (2023-06-30T11:32:25Z)
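As an illustration of the pairwise ranking idea summarized in the last entry above, the sketch below orders passages by repeated pairwise LLM comparisons. It is a sketch under assumptions: the prompt template and the `llm` callable are hypothetical stand-ins, not the paper's exact prompt or API.

```python
from functools import cmp_to_key
from typing import Callable, List

# Illustrative sketch of Pairwise Ranking Prompting (PRP): rank passages by
# asking an LLM, pair by pair, which one better answers the query.
# The prompt wording and the `llm` callable are hypothetical stand-ins.

def prefer(llm: Callable[[str], str], query: str, doc_a: str, doc_b: str) -> int:
    """Return -1 if the LLM prefers passage A, else 1."""
    prompt = (
        f"Query: {query}\n"
        f"Passage A: {doc_a}\n"
        f"Passage B: {doc_b}\n"
        "Which passage answers the query better? Reply with 'A' or 'B'."
    )
    return -1 if llm(prompt).strip().upper().startswith("A") else 1

def rank(llm: Callable[[str], str], query: str, docs: List[str]) -> List[str]:
    """Order passages via pairwise comparisons (a sorting-based aggregation)."""
    return sorted(docs, key=cmp_to_key(lambda a, b: prefer(llm, query, a, b)))
```

Sorting is used here only to keep the sketch short; the paper itself studies several ways of aggregating pairwise judgments.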