I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses
- URL: http://arxiv.org/abs/2402.11192v2
- Date: Sat, 1 Jun 2024 03:36:23 GMT
- Title: I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses
- Authors: Xuan Ren, Biao Wu, Lingqiao Liu
- Abstract summary: Fine-tuning a large language model (LLM) with responses generated by an LLM often yields better results than using responses generated by humans.
Training with LLM-generated responses not only enhances performance but also helps maintain the model's capabilities in other tasks after fine-tuning on a specific task.
- Score: 23.053791342294268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper explores an intriguing observation: fine-tuning a large language model (LLM) with responses generated by an LLM often yields better results than using responses generated by humans. We conduct an in-depth investigation to understand why this occurs. Contrary to the common belief that this is simply due to the more detailed nature of LLM-generated content, our study identifies another contributing factor: an LLM is inherently more "familiar" with LLM-generated responses. This familiarity is evidenced by lower perplexity before fine-tuning. We design a series of experiments to understand the impact of this "familiarity", and our conclusion is that it significantly impacts learning performance. Training with LLM-generated responses not only enhances performance but also helps maintain the model's capabilities in other tasks after fine-tuning on a specific task.
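The abstract uses perplexity before fine-tuning as its evidence of "familiarity". As a minimal sketch of that quantity, and not the authors' code, the snippet below computes perplexity as the exponential of the negative mean token log-probability. The function name `perplexity` and the two lists of log-probabilities are hypothetical, made-up values for illustration; in practice the log-probabilities would come from scoring each response with the model that is about to be fine-tuned.

```python
import math


def perplexity(token_logprobs):
    """Return exp(-mean(log p)) for one response's per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical, made-up values: every token of the LLM-written response gets
# probability 0.5 from the model, every token of the human-written one gets 0.25.
llm_written = [math.log(0.5)] * 4
human_written = [math.log(0.25)] * 4

for name, logprobs in [("llm_written", llm_written), ("human_written", human_written)]:
    print(name, round(perplexity(logprobs), 3))
```

With these illustrative numbers the LLM-written response scores a perplexity of roughly 2 and the human-written one roughly 4, so the first would count as the more "familiar" response in the sense the abstract describes.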
Related papers
- Investigating Context-Faithfulness in Large Language Models: The Roles of Memory Strength and Evidence Style [13.968658352075334]
We investigate the impact of memory strength and evidence presentation on Large Language Models' receptiveness to external evidence.
For questions with high memory strength, LLMs are more likely to rely on internal memory, particularly for larger LLMs such as GPT-4.
arXiv Detail & Related papers (2024-09-17T07:44:06Z)
- Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models [0.0]
Large language models (LLMs) have generated significant attention since their inception, finding applications across various academic and industrial domains.
LLMs often suffer from the "hallucination problem", where outputs, though grammatically and logically coherent, lack factual accuracy or are entirely fabricated.
arXiv Detail & Related papers (2024-08-09T14:34:32Z)
- From Pre-training Corpora to Large Language Models: What Factors Influence LLM Performance in Causal Discovery Tasks? [51.42906577386907]
This study explores the factors influencing the performance of Large Language Models (LLMs) in causal discovery tasks.
A higher frequency of causal mentions correlates with better model performance, suggesting that extensive exposure to causal information during training enhances the models' causal discovery capabilities.
arXiv Detail & Related papers (2024-07-29T01:45:05Z)
- The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition [74.04775677110179]
In-context Learning (ICL) has emerged as a powerful paradigm for performing natural language tasks with Large Language Models (LLMs).
We show that LLMs have strong yet inconsistent priors in emotion recognition that ossify their predictions.
Our results suggest that caution is needed when using ICL with larger LLMs for affect-centered tasks outside their pre-training domain.
arXiv Detail & Related papers (2024-03-25T19:07:32Z)
- Meaningful Learning: Advancing Abstract Reasoning in Large Language Models via Generic Fact Guidance [38.49506722997423]
Large language models (LLMs) have developed impressive performance and strong explainability across various reasoning scenarios.
Despite this, when tasked with simple questions supported by a generic fact, LLMs often fail to provide consistent and precise answers.
This has sparked a vigorous debate about whether LLMs are genuinely reasoning or merely memorizing.
arXiv Detail & Related papers (2024-03-14T04:06:13Z)
- Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs [60.40396361115776]
This paper introduces a novel collaborative approach, namely SlimPLM, that detects missing knowledge in large language models (LLMs) with a slim proxy model.
We employ a proxy model that has far fewer parameters and take its answers as heuristic answers.
Heuristic answers are then utilized to predict the knowledge required to answer the user question, as well as the known and unknown knowledge within the LLM.
arXiv Detail & Related papers (2024-02-19T11:11:08Z)
- TinyLLM: Learning a Small Student from Multiple Large Language Models [23.736611338497244]
TinyLLM is a new knowledge distillation paradigm to learn a small student LLM from multiple large teacher LLMs.
We introduce an in-context example generator and a teacher-forcing Chain-of-Thought strategy to ensure that the rationales are accurate and grounded in contextually appropriate scenarios.
arXiv Detail & Related papers (2024-02-07T06:48:24Z)
- Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Language Models [17.322480769274062]
Large language models (LLMs) have shown nearly saturated performance on many natural language processing (NLP) tasks.
This paper constructs Multiple Sensitive Factors Time QA (MenatQA), with a total of 2,853 samples, for evaluating the time comprehension and reasoning abilities of LLMs.
arXiv Detail & Related papers (2023-10-08T13:19:52Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)