Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
- URL: http://arxiv.org/abs/2407.04787v1
- Date: Fri, 5 Jul 2024 18:02:28 GMT
- Title: Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
- Authors: Eric Pasewark, Kyle Montgomery, Kefei Duan, Dawn Song, Chenguang Wang
- Abstract summary: We present a new method for large language models to solve compositional tasks.
Our method, Re-Tuning, tunes models to break down a problem into subproblems, solve those subproblems, and combine the results.
- Score: 44.910762928636565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a new method for large language models to solve compositional tasks. Although they have shown strong performance on traditional language understanding tasks, large language models struggle to solve compositional tasks, where the solution depends on solving smaller instances of the same problem. We propose a natural approach to solve compositional tasks recursively. Our method, Re-Tuning, tunes models to break down a problem into subproblems, solve those subproblems, and combine the results. We show that our method significantly improves model performance on three representative compositional tasks: integer addition, dynamic programming, and parity. Compared to state-of-the-art methods that keep intermediate steps towards solving the problems, Re-Tuning achieves significantly higher accuracy and is more GPU memory efficient.
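To make the recursive structure concrete, here is a minimal Python sketch of the Re-Tuning call pattern on integer addition. The `generate` stub fakes the tuned model, and the `[CALL]`/`[RETURN]` markers and digit-stripping decomposition are illustrative assumptions, not the paper's exact prompt format:

```python
def generate(prompt: str) -> str:
    """Fake a Re-Tuned model: answer single-digit sums directly,
    otherwise emit the subproblem formed by dropping the last digits."""
    a, b = map(int, prompt.split("+"))
    if a < 10 and b < 10:
        return f"[RETURN] {a + b}"            # base case: solve directly
    return f"[CALL] {a // 10}+{b // 10}"      # recurse on the prefixes

def solve(problem: str) -> int:
    """Spawn a fresh context per subproblem, then combine the results."""
    out = generate(problem)
    if out.startswith("[RETURN]"):
        return int(out.split()[1])
    sub_answer = solve(out.split(maxsplit=1)[1])   # recursive call
    a, b = map(int, problem.split("+"))
    return sub_answer * 10 + a % 10 + b % 10       # combine step

print(solve("1789+4265"))   # 6054
```

Because each subproblem is solved in its own context, the full chain of intermediate work never has to fit into a single prompt, which is the source of the memory savings the abstract reports.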
Related papers
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
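For context, the sketch below shows the plain data-free task-vector merge that methods like this one build on; it is not the paper's adaptive projective gradient method, and the scaling factor is an illustrative assumption:

```python
import torch

def merge_experts(base: dict, experts: list, alpha: float = 0.4) -> dict:
    """base / experts: state_dicts over the same keys; returns merged weights."""
    merged = {}
    for name, w0 in base.items():
        task_vectors = torch.stack([e[name] - w0 for e in experts])
        merged[name] = w0 + alpha * task_vectors.sum(dim=0)  # task arithmetic
    return merged
```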
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
- Algorithmic Phase Transitions in Language Models: A Mechanistic Case Study of Arithmetic [0.0]
Large language models can solve some tasks zero-shot but not others.
Algorithm instability may be a contributing factor to language models' poor zero-shot performance.
arXiv Detail & Related papers (2024-12-10T10:32:01Z)
- Skeleton: A New Framework for Accelerating Language Models via Task Neuron Localized Prompt Tuning [15.695487920048816]
We propose Skeleton, a novel prompt-tuning framework that uses a language model efficiently in both memory and time.
Our method significantly improves inference efficiency (up to a 1.73× speedup) on various widely used benchmarks.
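Skeleton's task-neuron localization aside, the mechanism it builds on is soft prompt tuning; a minimal sketch, assuming a frozen backbone and learnable prompt embeddings only:

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable prompt embeddings prepended to a frozen model's inputs."""

    def __init__(self, n_tokens: int, d_model: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_tokens, d_model) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, d_model)
        p = self.prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        return torch.cat([p, input_embeds], dim=1)
```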
arXiv Detail & Related papers (2024-04-18T05:43:50Z)
- Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs [2.3020018305241337]
Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of large language models.
We propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions.
Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder.
arXiv Detail & Related papers (2024-04-11T22:19:50Z)
- NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks [3.9052860539161918]
We contribute NeuralSolver, a novel recurrent solver that can learn algorithms from smaller problems (in terms of observation size) and execute them on larger ones.
It applies naturally both to same-size problems, where input and output sizes match, and to different-size problems, where they differ.
We show that NeuralSolver consistently outperforms prior state-of-the-art recurrent solvers in extrapolating to larger problems, while requiring fewer parameters than other approaches.
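A minimal sketch of the recurrent-solver idea, assuming a weight-tied convolutional block whose iteration count scales with input size (layer shapes and the iteration rule are illustrative, not NeuralSolver's actual architecture):

```python
import torch
import torch.nn as nn

class RecurrentSolver(nn.Module):
    """Apply one small weight-tied block repeatedly; more steps for bigger inputs."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.embed = nn.Conv2d(1, channels, 3, padding=1)
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.embed(x)
        for _ in range(2 * max(x.shape[-2:])):   # compute scales with size
            h = self.block(h)                    # same weights every step
        return self.head(h)
```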
arXiv Detail & Related papers (2024-02-23T15:51:45Z)
- Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and use its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
- Improving Large Language Model Fine-tuning for Solving Math Problems [20.417053742869403]
A large gap exists between large language models' pass-at-one and pass-at-N performance in solving math problems.
Using the challenging MATH dataset, we investigate three fine-tuning strategies.
We design a fine-tuning recipe that yields approximately 58.8% accuracy on the MATH dataset with fine-tuned PaLM 2-L models.
arXiv Detail & Related papers (2023-10-16T04:11:19Z)
- JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving [77.51817534090789]
We propose JiuZhang 2.0, a unified Chinese PLM designed specifically for multi-task mathematical problem solving.
Our idea is to maintain a moderate-sized model and employ cross-task knowledge sharing to improve model capacity in a multi-task setting.
arXiv Detail & Related papers (2023-06-19T15:45:36Z)
- Recursion of Thought: A Divide-and-Conquer Approach to Multi-Context Reasoning with Language Models [58.41943058963672]
We propose a new inference framework called Recursion of Thought (RoT).
RoT introduces several special tokens that the models can output to trigger context-related operations.
Experiments with multiple architectures, including GPT-3, show that RoT dramatically improves LMs' ability to solve problems whose solutions span far more tokens than a single context.
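A sketch of what an RoT-style inference loop could look like, assuming a placeholder `generate` function and control-token names that only loosely follow the paper:

```python
def rot_infer(question: str, generate) -> str:
    """Answer a question, recursing into fresh contexts for subproblems."""
    context = question
    while True:
        step = generate(context)                 # next segment from the LM
        if step.startswith("[THINK]"):           # model poses a subquestion
            sub_q = step.removeprefix("[THINK]").strip()
            sub_a = rot_infer(sub_q, generate)   # solve it in a new context
            context += f" {step} {sub_a}"        # splice the answer back in
        elif step.startswith("[STOP]"):          # model returns its answer
            return step.removeprefix("[STOP]").strip()
        else:
            context += " " + step                # ordinary reasoning step
```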
arXiv Detail & Related papers (2023-06-12T06:34:16Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework named Layerwise Noise Stability Regularization (LNSR).
Specifically, we inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
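One minimal reading of the idea, with the regularized layer, noise scale, and loss weight as illustrative assumptions rather than the paper's exact recipe:

```python
import torch
import torch.nn as nn

def noise_stability_loss(layer: nn.Module, hidden: torch.Tensor,
                         sigma: float = 0.01) -> torch.Tensor:
    """Penalize how far `layer`'s output moves under Gaussian input noise."""
    clean = layer(hidden)
    noisy = layer(hidden + sigma * torch.randn_like(hidden))
    return ((clean - noisy) ** 2).mean()

# usage sketch: loss = task_loss + lam * noise_stability_loss(model.ffn, h)
```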
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
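A rough sketch of the TAPS idea, assuming a soft per-layer gate (the paper learns which layers to adapt under a sparsity penalty; the parameterization below is a simplification):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TAPSLinear(nn.Module):
    """Wrap a shared linear layer with a gated, task-specific residual."""

    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # shared weights frozen
        self.delta = nn.Parameter(torch.zeros_like(base.weight))
        self.gate = nn.Parameter(torch.tensor(0.0))  # "adapt this layer?"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate)                 # soft gate in (0, 1)
        return F.linear(x, self.base.weight + g * self.delta, self.base.bias)
```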
arXiv Detail & Related papers (2022-03-30T23:16:07Z)