DoG-Instruct: Towards Premium Instruction-Tuning Data via Text-Grounded Instruction Wrapping
- URL: http://arxiv.org/abs/2309.05447v2
- Date: Sat, 25 May 2024 10:05:52 GMT
- Title: DoG-Instruct: Towards Premium Instruction-Tuning Data via Text-Grounded Instruction Wrapping
- Authors: Yongrui Chen, Haiyun Jiang, Xinting Huang, Shuming Shi, Guilin Qi,
- Abstract summary: This paper proposes a scalable solution to the problem of finding high-quality instruction-response pairs.
It involves training LLMs to generate pairs based on human-written documents, rather than relying solely on self-generation without context.
Our proposed method not only exploits the advantages of human-written documents in reducing hallucinations but also utilizes an LLM to wrap the expression of documents.
- Score: 41.89443082174044
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The improvement of LLMs' instruction-following capabilities relies heavily on the availability of high-quality instruction-response pairs. Unfortunately, the current methods used to collect the pairs suffer from either unaffordable labor costs or severe hallucinations in the self-generation of LLM. To tackle these challenges, this paper proposes a scalable solution. It involves training LLMs to generate instruction-response pairs based on human-written documents, rather than relying solely on self-generation without context. Our proposed method not only exploits the advantages of human-written documents in reducing hallucinations but also utilizes an LLM to wrap the expression of documents, which enables us to bridge the gap between various document styles and the standard AI response. Experiments demonstrate that our method outperforms existing typical methods on multiple benchmarks. In particular, compared to the best-performing baseline, the LLM trained using our generated dataset exhibits a 10\% relative improvement in performance on AlpacaEval, despite utilizing only 1/5 of its training data. Furthermore, a comprehensive manual evaluation validates the quality of the data we generated. Our trained wrapper is publicly available at https://github.com/Bahuia/Dog-Instruct.
Related papers
- Instruction Tuning With Loss Over Instructions [42.9106826952674]
Instruction Modelling (IM) trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part.
We show that, in many scenarios, IM can effectively improve the LM performance on both NLP tasks and open-ended generation benchmarks.
Remarkably, in the most advantageous case, IM boosts model performance on AlpacaEval 1.0 by over 100%.
arXiv Detail & Related papers (2024-05-23T10:12:03Z) - InstUPR : Instruction-based Unsupervised Passage Reranking with Large Language Models [35.067998820937284]
InstUPR is an unsupervised passage reranking method based on large language models (LLMs)
We introduce a soft score aggregation technique and employ pairwise reranking for unsupervised passage reranking.
Experiments on the BEIR benchmark demonstrate that InstUPR outperforms unsupervised baselines as well as an instruction-tuned reranker.
arXiv Detail & Related papers (2024-03-25T05:31:22Z) - Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent.
We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements.
Our findings show that instruction-tuned models can expose pre-training data as much as their base-models, if not more so, and using instructions proposed by other LLMs can open a new avenue of automated attacks.
arXiv Detail & Related papers (2024-03-05T19:32:01Z) - CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning [32.54921739100195]
We propose CoachLM, a novel approach to enhance the quality of instruction datasets through automatic revisions on samples in the dataset.
CoachLM is trained from the samples revised by human experts and significantly increases the proportion of high-quality samples in the dataset from 17.7% to 78.9%.
Results show that CoachLM improves the instruction-following capabilities of the instruction-tuned LLM by an average of 29.9%.
arXiv Detail & Related papers (2023-11-22T09:04:57Z) - Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named Rephrase and Respond' (RaR) which allows Large Language Models to rephrase and expand questions posed by humans.
RaR serves as a simple yet effective prompting method for improving performance.
We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z) - Instruction Distillation Makes Large Language Models Efficient Zero-shot
Rankers [56.12593882838412]
We introduce a novel instruction distillation method to rank documents.
We first rank documents using the effective pairwise approach with complex instructions, and then distill the teacher predictions to the pointwise approach with simpler instructions.
Our approach surpasses the performance of existing supervised methods like monoT5 and is on par with the state-of-the-art zero-shot methods.
arXiv Detail & Related papers (2023-11-02T19:16:21Z) - Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning [79.32236399694077]
Low-quality data in the training set are usually detrimental to instruction tuning.
We propose a novel method, termed "reflection-tuning"
This approach utilizes an oracle LLM to recycle the original training data by introspecting and enhancing the quality of instructions and responses in the data.
arXiv Detail & Related papers (2023-10-18T05:13:47Z) - LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the objective LM, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.