VaiBot: Shuttle Between the Instructions and Parameters of Large Language Models
- URL: http://arxiv.org/abs/2502.02315v2
- Date: Thu, 13 Feb 2025 02:33:32 GMT
- Title: VaiBot: Shuttle Between the Instructions and Parameters of Large Language Models
- Authors: Wangtao Sun, Haotian Xu, Huanxuan Liao, Xuanqing Yu, Zhongtao Jiang, Shizhu He, Jun Zhao, Kang Liu,
- Abstract summary: This paper proposes a neural network framework, VaiBot, that integrates VAE and VIB, designed to uniformly model, learn, and infer both deduction and induction tasks.
We show that VaiBot performs on par with existing baseline methods in terms of deductive capabilities while significantly surpassing them in inductive capabilities.
- Score: 22.676819780878198
- License:
- Abstract: How to interact with LLMs through \emph{instructions} has been widely studied by researchers. However, previous studies have treated the emergence of instructions and the training of LLMs on task data as separate processes, overlooking the inherent unity between the two. This paper proposes a neural network framework, VaiBot, that integrates VAE and VIB, designed to uniformly model, learn, and infer both deduction and induction tasks under LLMs. Through experiments, we demonstrate that VaiBot performs on par with existing baseline methods in terms of deductive capabilities while significantly surpassing them in inductive capabilities. We also find that VaiBot can scale up using general instruction-following data and exhibits excellent one-shot induction abilities. We finally synergistically integrate the deductive and inductive processes of VaiBot. Through T-SNE dimensionality reduction, we observe that its inductive-deductive process significantly improves the distribution of training parameters, enabling it to outperform baseline methods in inductive reasoning tasks. The code and data for this paper can be found at https://anonymous.4open.science/r/VaiBot-021F.
Related papers
- Combining Induction and Transduction for Abstract Reasoning [13.399370315305408]
We train neural models for induction (inferring latent functions) and transduction (directly predicting the test output for a given test input) on ARC.
We find inductive and transductive models solve different kinds of test problems, despite having the same training problems and sharing the same neural architecture.
arXiv Detail & Related papers (2024-11-04T17:03:55Z) - Interpretable Language Modeling via Induction-head Ngram Models [74.26720927767398]
We propose Induction-head ngram models (Induction-Gram) to bolster modern ngram models with a hand-engineered "induction head"
This induction head uses a custom neural similarity metric to efficiently search the model's input context for potential next-word completions.
Experiments show that this simple method significantly improves next-word prediction over baseline interpretable models.
arXiv Detail & Related papers (2024-10-31T12:33:26Z) - Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory [66.88278207591294]
We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data.
PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities.
arXiv Detail & Related papers (2024-04-18T03:03:46Z) - ItD: Large Language Models Can Teach Themselves Induction through
Deduction [27.75250905887343]
We propose a novel framework, Induction through Deduction (ItD), to enable the LLMs to teach themselves induction through deduction.
ItD is composed of two main components: a Deductive Data Generation module to generate induction data and a Naive Bayesian Induction module to optimize the fine-tuning and decoding of LLMs.
Our empirical results showcase the effectiveness of ItD on two induction benchmarks, achieving relative performance improvement of 36% and 10% compared with previous state-of-the-art, respectively.
arXiv Detail & Related papers (2024-03-09T04:20:46Z) - Interactive Planning Using Large Language Models for Partially
Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z) - Instilling Inductive Biases with Subnetworks [19.444844580405594]
Subtask Induction instills inductive biases towards solutions utilizing a subtask.
We show that Subtask Induction significantly reduces the amount of training data required for a model to adopt a specific, generalizable solution.
arXiv Detail & Related papers (2023-10-17T00:12:19Z) - Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z) - Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z) - Learning Neuro-Symbolic Skills for Bilevel Planning [63.388694268198655]
Decision-making is challenging in robotics environments with continuous object-centric states, continuous actions, long horizons, and sparse feedback.
Hierarchical approaches, such as task and motion planning (TAMP), address these challenges by decomposing decision-making into two or more levels of abstraction.
Our main contribution is a method for learning parameterized polices in combination with operators and samplers.
arXiv Detail & Related papers (2022-06-21T19:01:19Z) - Continual Learning from Demonstration of Robotics Skills [5.573543601558405]
Methods for teaching motion skills to robots focus on training for a single skill at a time.
We propose an approach for continual learning from demonstration using hypernetworks and neural ordinary differential equation solvers.
arXiv Detail & Related papers (2022-02-14T16:26:52Z) - Steadily Learn to Drive with Virtual Memory [11.67256846037979]
This paper proposes an algorithm called Learn to drive with Virtual Memory (LVM) to overcome these problems.
LVM compresses the high-dimensional information into compact latent states and learns a latent dynamic model to summarize the agent's experience.
The effectiveness of LVM is demonstrated by an image-input autonomous driving task.
arXiv Detail & Related papers (2021-02-16T10:46:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.