Turning Dust into Gold: Distilling Complex Reasoning Capabilities from
LLMs by Leveraging Negative Data
- URL: http://arxiv.org/abs/2312.12832v1
- Date: Wed, 20 Dec 2023 08:28:36 GMT
- Title: Turning Dust into Gold: Distilling Complex Reasoning Capabilities from
LLMs by Leveraging Negative Data
- Authors: Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Bin Sun, Xinglin
Wang, Heda Wang, Kan Li
- Abstract summary: Large Language Models (LLMs) perform well on various reasoning tasks, but their limited accessibility and large parameter counts hinder wide application in practice.
We propose a model specialization framework that distills LLMs using negative samples in addition to positive ones.
We conduct extensive experiments across arithmetic reasoning tasks to demonstrate the role of negative data in distillation from LLMs.
- Score: 15.088675135566646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) perform well on various reasoning tasks,
but their limited accessibility and large parameter counts hinder wide
application in practice. One promising direction is to distill the reasoning
ability of LLMs into small models via the generated chain-of-thought reasoning
paths. In some cases, however, LLMs may produce incorrect reasoning chains,
especially when facing complex mathematical problems. Previous studies transfer
knowledge only from positive samples and discard the synthesized data with
wrong answers. In this work, we illustrate the merit of negative data and
propose a model specialization framework that distills LLMs using negative
samples in addition to positive ones. The framework consists of three
progressive steps, spanning the training and inference stages, to absorb
knowledge from negative data. We conduct extensive experiments across
arithmetic reasoning tasks to demonstrate the role of negative data in
distillation from LLMs.
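To make the idea concrete, here is a minimal sketch of distilling chain-of-thought data into a small student while also exploiting negative (wrong-answer) rationales. It assumes a simple unlikelihood-style penalty on wrong rationales and a GPT-2 student as stand-ins; the paper's actual three-step framework is more involved, and the data, weighting, and model choice below are illustrative assumptions only.

```python
# Sketch: distill CoT data into a small student, using negative samples too.
# The unlikelihood-style weighting is an assumption, not the paper's method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in student model
student = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

# LLM-synthesized chain-of-thought samples: (question, rationale, is_correct)
samples = [
    ("Q: 3 + 4 * 2 = ?", "4 * 2 = 8, then 3 + 8 = 11. Answer: 11", True),
    ("Q: 3 + 4 * 2 = ?", "3 + 4 = 7, then 7 * 2 = 14. Answer: 14", False),
]
NEG_WEIGHT = 0.3  # assumed hyperparameter: strength of the negative signal

student.train()
for question, rationale, is_correct in samples:
    enc = tok(question + " " + rationale, return_tensors="pt")
    logits = student(**enc).logits[:, :-1, :]
    targets = enc["input_ids"][:, 1:]
    tok_logp = torch.log_softmax(logits, -1).gather(
        -1, targets.unsqueeze(-1)).squeeze(-1)
    if is_correct:
        loss = -tok_logp.mean()  # standard NLL on positive rationales
    else:
        # unlikelihood term: push probability mass away from wrong rationales
        # (in practice one would also mask the question tokens from the loss)
        loss = -NEG_WEIGHT * torch.log(1 - tok_logp.exp() + 1e-6).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```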
Related papers
- RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold [41.28168368547099]
Training on model-generated synthetic data is a promising approach for fine-tuning LLMs, but it remains unclear when it helps or hurts.
We show that training on per-step negatives can help to unlearn spurious correlations in the positive data.
arXiv Detail & Related papers (2024-06-20T17:45:54Z)
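A hedged sketch of the per-step idea from the paper above: score each reasoning step with an advantage estimate and weight the fine-tuning loss accordingly, so steps that encode spurious correlations receive a negative (unlearning) signal. The scoring and weighting scheme here is an illustrative assumption, not the paper's exact objective.

```python
# Sketch of advantage-weighted per-step training on synthetic solutions.
from dataclasses import dataclass

@dataclass
class Step:
    text: str
    advantage: float  # e.g. estimated by resampling completions from this step

def weighted_step_loss(step_nlls, advantages):
    """Advantage-weighted NLL: steps with positive advantage are reinforced,
    steps with negative advantage are penalized (unlearned)."""
    return sum(a * nll for nll, a in zip(step_nlls, advantages))

steps = [Step("4 * 2 = 8", +1.0), Step("3 + 8 = 12", -1.0)]  # 2nd step is wrong
nlls = [0.7, 0.4]  # per-step NLLs of the student on each step's tokens
print(weighted_step_loss(nlls, [s.advantage for s in steps]))  # -> 0.3 approx.
```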
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capabilities in many natural language tasks.
LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding the decoding process of LLMs with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
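Deliberation-guided decoding, as in the Q* entry above, can be pictured as a best-first search over partial reasoning chains scored by a value heuristic. The sketch below uses hypothetical `propose_steps` and `value` stubs in place of the LLM proposal step and the learned Q-value model; it shows only a generic search skeleton, not Q*'s actual estimation method.

```python
# Sketch: best-first (A*-style) search over partial reasoning chains.
import heapq

def propose_steps(state):            # stub: the LLM proposes next steps
    return [state + " step", state + " alt-step"]

def value(state):                    # stub: learned value/Q estimate
    return -len(state)               # placeholder heuristic

def is_terminal(state):
    return state.count("step") >= 3  # toy stopping rule

def best_first_decode(question, budget=20):
    frontier = [(-value(question), question)]  # min-heap keyed on -value
    for _ in range(budget):
        if not frontier:
            return None
        _, state = heapq.heappop(frontier)
        if is_terminal(state):
            return state
        for nxt in propose_steps(state):
            heapq.heappush(frontier, (-value(nxt), nxt))
    return None

print(best_first_decode("Q: ..."))
```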
- Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models [79.46938238953916]
Fine-tuning large language models (LLMs) for diverse applications is crucial to meet complex demands.
Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs.
In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs.
arXiv Detail & Related papers (2024-06-13T07:57:27Z)
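The decomposition underlying the Delta-CoMe entry above can be illustrated with a plain low-rank approximation of the delta weights. Delta-CoMe's actual contribution, mixed-precision quantization of the singular vectors, is omitted here, so treat this as background rather than the method itself.

```python
# Sketch: store a fine-tuned matrix as base + low-rank delta approximation.
import torch

def compress_delta(w_base, w_ft, rank=8):
    delta = w_ft - w_base
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    return U[:, :rank], S[:rank], Vh[:rank, :]  # store instead of full delta

def reconstruct(w_base, U, S, Vh):
    return w_base + U @ torch.diag(S) @ Vh

w_base = torch.randn(256, 256)
w_ft = w_base + torch.randn(256, 256) * 0.01  # toy "fine-tuned" weights
U, S, Vh = compress_delta(w_base, w_ft)
err = (reconstruct(w_base, U, S, Vh) - w_ft).norm() / w_ft.norm()
print(f"relative reconstruction error: {float(err):.4f}")
```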
- Purifying Large Language Models by Ensembling a Small Language Model [39.57304668057076]
We propose a simple and easily implementable method for purifying LLMs from the negative effects caused by uncurated data.
We empirically confirm the efficacy of ensembling LLMs with benign, small language models (SLMs).
arXiv Detail & Related papers (2024-02-19T14:00:39Z)
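One simple way to picture the ensemble in the entry above is mixing next-token distributions from the large and small models at decode time. The mixing rule below is an illustrative assumption, not necessarily the paper's exact scheme.

```python
# Sketch: mix next-token distributions from an LLM and a benign small LM.
import torch

def ensembled_next_token(llm_logits, slm_logits, alpha=0.5):
    """Average the two models' next-token distributions, then pick greedily."""
    p_llm = torch.softmax(llm_logits, dim=-1)
    p_slm = torch.softmax(slm_logits, dim=-1)
    p = (1 - alpha) * p_llm + alpha * p_slm
    return int(torch.argmax(p))

llm_logits = torch.tensor([2.0, 0.5, -1.0])  # toy vocabulary of size 3
slm_logits = torch.tensor([0.1, 1.5, -0.5])
print(ensembled_next_token(llm_logits, slm_logits))
```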
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect samples to construct in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
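A rough sketch of what a contrastive demonstration for the c-ICL entry above might look like: each in-context example pairs a correct extraction with an incorrect one marked as wrong. The template is an illustrative assumption, not c-ICL's exact prompt format.

```python
# Sketch: build an in-context prompt with correct and incorrect extractions.
def build_contrastive_prompt(demos, query):
    parts = []
    for text, good, bad in demos:
        parts.append(
            f"Text: {text}\n"
            f"Correct extraction: {good}\n"
            f"Incorrect extraction: {bad}  (wrong, do not produce this)\n"
        )
    parts.append(f"Text: {query}\nCorrect extraction:")
    return "\n".join(parts)

demos = [("Alice joined Acme in 2019.",
          "(Alice, employer, Acme)",
          "(Acme, employer, Alice)")]
print(build_contrastive_prompt(demos, "Bob works for Initech."))
```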
- Task Contamination: Language Models May Not Be Few-Shot Anymore [9.696290050028237]
Large language models (LLMs) offer impressive performance in various zero-shot and few-shot tasks.
However, their success in zero-shot and few-shot settings may be affected by task contamination.
This paper investigates how the zero-shot and few-shot performance of LLMs has changed over time.
arXiv Detail & Related papers (2023-12-26T21:17:46Z)
- Zero-Shot Question Answering over Financial Documents using Large Language Models [0.18749305679160366]
We introduce a large language model (LLM) based approach to answer complex questions requiring multi-hop numerical reasoning over financial reports.
We use novel zero-shot prompts that guide the LLM to encode the required reasoning into a Python program or a domain-specific language.
arXiv Detail & Related papers (2023-11-19T16:23:34Z)
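The program-of-thought pattern in the financial QA entry above can be sketched as: prompt the LLM to emit a small Python program over figures from the report, then execute it to get the answer. `ask_llm` below is a hypothetical stub, and executing generated code would need sandboxing in any real system.

```python
# Sketch: have an LLM write a Python program, then execute it for the answer.
def ask_llm(prompt):  # stub standing in for a real LLM call
    return ("revenue_2022 = 120.0\nrevenue_2021 = 100.0\n"
            "answer = (revenue_2022 - revenue_2021) / revenue_2021 * 100")

def answer_with_program(question, context):
    prompt = (f"Context:\n{context}\n\nQuestion: {question}\n"
              "Write Python that computes `answer`:")
    program = ask_llm(prompt)
    scope = {}
    exec(program, scope)  # sandbox this in any real deployment
    return scope["answer"]

print(answer_with_program(
    "What was the revenue growth in 2022, in percent?",
    "Revenue was $100M in 2021 and $120M in 2022."))  # -> 20.0
```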
- Mind's Mirror: Distilling Self-Evaluation Capability and Comprehensive Thinking from Large Language Models [20.28989820878285]
Large language models (LLMs) have achieved remarkable advancements in natural language processing.
Their massive scale and computational demands present formidable challenges for practical deployment in resource-constrained environments.
arXiv Detail & Related papers (2023-11-15T18:56:23Z)
- ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful for triggering hallucinations in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
- Scaling Relationship on Learning Mathematical Reasoning with Large Language Models [75.29595679428105]
We investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performances of a supervised LLM.
We find that rejection sampling from multiple models pushes LLaMA-7B to an accuracy of 49.3% on GSM8K, significantly outperforming the supervised fine-tuning (SFT) accuracy of 35.9%.
arXiv Detail & Related papers (2023-08-03T15:34:01Z)
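The rejection-sampling recipe in the entry above can be sketched as: sample many chain-of-thought solutions from one or more models, keep only those whose final answer matches the reference, deduplicate, and fine-tune on the survivors. All helper functions below are hypothetical stubs.

```python
# Sketch: collect rejection-sampled reasoning data for fine-tuning.
import random

def sample_solution(model, question):  # stub: one sampled CoT solution
    a = random.choice([11, 14])
    return f"... therefore the answer is {a}", a

def final_answer(solution):
    return solution[1]

def collect_rft_data(models, question, gold, k=8):
    kept = set()
    for model in models:
        for _ in range(k):
            sol = sample_solution(model, question)
            if final_answer(sol) == gold:
                kept.add(sol[0])  # dedupe identical reasoning paths
    return sorted(kept)

print(collect_rft_data(["m1", "m2"], "Q: 3 + 4 * 2 = ?", 11))
```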