ItD: Large Language Models Can Teach Themselves Induction through
Deduction
- URL: http://arxiv.org/abs/2403.05789v1
- Date: Sat, 9 Mar 2024 04:20:46 GMT
- Title: ItD: Large Language Models Can Teach Themselves Induction through
Deduction
- Authors: Wangtao Sun, Haotian Xu, Xuanqing Yu, Pei Chen, Shizhu He, Jun Zhao,
Kang Liu
- Abstract summary: We propose a novel framework, Induction through Deduction (ItD), to enable the LLMs to teach themselves induction through deduction.
ItD is composed of two main components: a Deductive Data Generation module to generate induction data and a Naive Bayesian Induction module to optimize the fine-tuning and decoding of LLMs.
Our empirical results showcase the effectiveness of ItD on two induction benchmarks, achieving relative performance improvement of 36% and 10% compared with previous state-of-the-art, respectively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although Large Language Models (LLMs) are showing impressive performance on a
wide range of Natural Language Processing tasks, researchers have found that
they still have limited ability to conduct induction. Recent works mainly adopt
"post-processing" paradigms to improve the performance of LLMs on induction
(e.g., the hypothesis search & refinement methods), but their performance is
still constrained by the inherent inductive capability of the LLMs. In this
paper, we propose a novel framework, Induction through Deduction (ItD), to
enable the LLMs to teach themselves induction through deduction. The ItD
framework is composed of two main components: a Deductive Data Generation
module to generate induction data and a Naive Bayesian Induction module to
optimize the fine-tuning and decoding of LLMs. Our empirical results showcase
the effectiveness of ItD on two induction benchmarks, achieving relative
performance improvement of 36% and 10% compared with previous state-of-the-art,
respectively. Our ablation study verifies the effectiveness of two key modules
of ItD. We also verify the effectiveness of ItD across different LLMs and
deductors. The data and code of this paper can be found at
https://anonymous.4open.science/r/ItD-E844.
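The paper does not spell out the Naive Bayesian Induction module in this abstract, but the core idea of naive Bayesian hypothesis scoring can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: in ItD the per-example likelihoods would come from an LLM deductor, whereas here a toy `likelihood` function stands in, and the names `naive_bayes_score` and `select_hypothesis` are invented for this sketch.

```python
import math

def naive_bayes_score(hypothesis, examples, likelihood):
    """Score a candidate rule by summing the log-likelihoods of the
    observed examples, treating them as conditionally independent
    given the hypothesis (the naive Bayes assumption)."""
    return sum(math.log(likelihood(hypothesis, ex)) for ex in examples)

def select_hypothesis(candidates, examples, likelihood):
    # Pick the candidate rule that jointly explains the examples best.
    return max(candidates, key=lambda h: naive_bayes_score(h, examples, likelihood))

# Toy stand-in for an LLM deductor: hypotheses are offsets k, and an
# example (x, y) is likely under k exactly when y == x + k.
def toy_likelihood(k, example):
    x, y = example
    return 0.9 if y == x + k else 0.1

best = select_hypothesis([1, 2], [(1, 2), (2, 3)], toy_likelihood)
print(best)  # the offset consistent with both examples
```

Under this framing, decoding an induced rule reduces to ranking candidate hypotheses by how well a deductor can reproduce the observed examples from each of them.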
Related papers
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.
Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.
We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z) - VaiBot: Shuttle Between the Instructions and Parameters of Large Language Models [22.676819780878198]
This paper proposes a neural network framework, VaiBot, that integrates VAE and VIB, designed to uniformly model, learn, and infer both deduction and induction tasks.
We show that VaiBot performs on par with existing baseline methods in terms of deductive capabilities while significantly surpassing them in inductive capabilities.
arXiv Detail & Related papers (2025-02-04T13:36:54Z) - Interpretable Language Modeling via Induction-head Ngram Models [74.26720927767398]
We propose Induction-head ngram models (Induction-Gram) to bolster modern ngram models with a hand-engineered "induction head".
This induction head uses a custom neural similarity metric to efficiently search the model's input context for potential next-word completions.
Experiments show that this simple method significantly improves next-word prediction over baseline interpretable models.
arXiv Detail & Related papers (2024-10-31T12:33:26Z) - MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models [19.81485079689837]
We evaluate large language models' capabilities in inductive and deductive stages.
We find that the models tend to consistently conduct correct deduction without correct inductive rules.
In the inductive reasoning process, the model tends to focus on observed facts that are close to the current test example in feature space.
arXiv Detail & Related papers (2024-10-12T14:12:36Z) - Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints [20.844061807562436]
We propose SENSE, a novel prompting approach that embeds semantic hints within the prompt.
Experiments show that SENSE consistently improves LLMs' performance across various tasks.
arXiv Detail & Related papers (2024-09-22T14:35:09Z) - Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs [99.76347807139615]
Reasoning encompasses two typical types: deductive reasoning and inductive reasoning.
Despite extensive research into the reasoning capabilities of Large Language Models (LLMs), most studies have failed to rigorously differentiate between inductive and deductive reasoning.
This raises an essential question: in LLM reasoning, which poses the greater challenge, deductive or inductive reasoning?
arXiv Detail & Related papers (2024-07-31T18:47:11Z) - TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z) - Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via
Instruction Tuning with LITE [62.13435256279566]
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks.
However, their large size makes their inference slow and computationally expensive.
We show that instruction tuning with LITE enables these intermediate layers to acquire 'good' generation ability without affecting the generation ability of the final layer.
arXiv Detail & Related papers (2023-10-28T04:07:58Z) - Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning techniques to assess the performance of the models.
arXiv Detail & Related papers (2023-10-02T01:00:50Z) - Deduction under Perturbed Evidence: Probing Student Simulation
Capabilities of Large Language Models [27.943334687742244]
We show that even the most advanced GPT models struggle to reason on manipulated facts.
Our findings have practical implications for understanding the performance of LLMs in real-world applications.
arXiv Detail & Related papers (2023-05-23T20:26:03Z) - A Cohesive Distillation Architecture for Neural Language Models [0.0]
A recent trend in Natural Language Processing is the exponential growth in Language Model (LM) size.
This study investigates methods for Knowledge Distillation (KD) to provide efficient alternatives to large-scale models.
arXiv Detail & Related papers (2023-01-12T08:01:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.