Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models
- URL: http://arxiv.org/abs/2210.12607v1
- Date: Sun, 23 Oct 2022 03:22:34 GMT
- Title: Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models
- Authors: Victor S. Bursztyn, David Demeter, Doug Downey, Larry Birnbaum
- Abstract summary: Compositional fine-tuning (CFT) is an approach based on explicitly decomposing a target task into component tasks and then fine-tuning smaller language models on a curriculum of those component tasks.
We show that CFT outperforms end-to-end learning even with equal amounts of data.
- Score: 20.173322408302134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to usefully encode compositional task structure has long been a core
challenge in AI. Recent work in chain of thought prompting has shown that for
very large neural language models (LMs), explicitly demonstrating the
inferential steps involved in a target task may improve performance over
end-to-end learning that focuses on the target task alone. However, chain of
thought prompting has significant limitations due to its dependency on huge
pretrained LMs. In this work, we present compositional fine-tuning (CFT): an
approach based on explicitly decomposing a target task into component tasks,
and then fine-tuning smaller LMs on a curriculum of such component tasks. We
apply CFT to recommendation tasks in two domains, world travel and local
dining, as well as a previously studied inferential task (sports
understanding). We show that CFT outperforms end-to-end learning even with
equal amounts of data, and gets consistently better as more component tasks are
modeled via fine-tuning. Compared with chain of thought prompting, CFT performs
at least as well using LMs only 7.4% of the size, and is moreover applicable to
task domains for which data are not available during pretraining.
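To make the approach concrete, here is a minimal sketch of the data-preparation idea behind CFT as described in the abstract: component-task examples are constructed explicitly and ordered before end-to-end target-task examples, and a smaller pretrained LM is then fine-tuned on that curriculum. The specific component tasks, prompt formats, and the `train_step` callback are illustrative assumptions, not the authors' actual setup.

```python
"""Minimal sketch of compositional fine-tuning (CFT) curriculum construction.

Assumptions (not from the paper): the concrete component tasks, prompt
formats, and the `train_step` callback are illustrative placeholders.
"""
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    prompt: str
    completion: str


def component_task_examples() -> List[List[Example]]:
    # Hypothetical component tasks for a travel-recommendation target task:
    # (1) recall a destination's attributes, (2) match attributes to a preference.
    attributes = [
        Example("Q: What is Lisbon known for?\nA:",
                " Lisbon is known for mild weather and fresh seafood."),
    ]
    preference_match = [
        Example("Q: The user likes mild weather. Does Lisbon match?\nA:",
                " Yes, Lisbon has mild weather."),
    ]
    return [attributes, preference_match]


def target_task_examples() -> List[Example]:
    # End-to-end target task: recommend a destination from stated preferences.
    return [
        Example("Q: The user likes mild weather and seafood. Recommend a city.\nA:",
                " I recommend Lisbon."),
    ]


def build_cft_curriculum() -> List[Example]:
    """Order component-task examples before target-task examples,
    mirroring CFT's curriculum over explicitly decomposed component tasks."""
    curriculum: List[Example] = []
    for task in component_task_examples():
        curriculum.extend(task)
    curriculum.extend(target_task_examples())
    return curriculum


def fine_tune(examples: List[Example], train_step: Callable[[str], None]) -> None:
    # Placeholder loop: `train_step` stands in for one fine-tuning update of a
    # smaller pretrained LM on the serialized prompt + completion text.
    for ex in examples:
        train_step(ex.prompt + ex.completion)


if __name__ == "__main__":
    fine_tune(build_cft_curriculum(), train_step=lambda text: None)
```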
Related papers
- Task Addition in Multi-Task Learning by Geometrical Alignment [4.220885199861056]
We propose a task addition approach for GATE to improve performance on target tasks with limited data.
It is achieved through supervised multi-task pre-training on a large dataset, followed by the addition and training of task-specific modules for each target task.
Our experiments demonstrate the superior performance of the task addition strategy for GATE over conventional multi-task methods, with comparable computational costs.
arXiv Detail & Related papers (2024-09-25T05:56:00Z)
- Cross-Task Affinity Learning for Multitask Dense Scene Predictions [5.939164722752263]
Multitask learning (MTL) has become prominent for its ability to learn multiple tasks jointly.
We introduce the Cross-Task Affinity Learning (CTAL) module, a lightweight framework that enhances task refinement in multitask networks.
Our results demonstrate state-of-the-art MTL performance for both CNN and transformer backbones, using significantly fewer parameters than single-task learning.
arXiv Detail & Related papers (2024-01-20T05:31:47Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
At the task level, we aim to find the task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task and then arrange them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful on classification tasks with little or even non-overlapping annotation.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- TaskLAMA: Probing the Complex Task Understanding of Language Models [13.336015994186955]
Structured Complex Task Decomposition (SCTD) is a problem of breaking down a complex real-world task into a directed acyclic graph over individual steps that contribute to achieving the task.
We probe how accurately SCTD can be performed using the knowledge extracted from Large Language Models (LLMs).
Our experiments reveal that LLMs are able to decompose complex tasks into individual steps effectively, with a relative improvement of 15% to 280% over the best baseline.
arXiv Detail & Related papers (2023-08-29T13:36:45Z)
- Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new, efficient tuning approach for vision-language models (VLMs) named Task Residual Tuning (TaskRes).
TaskRes explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task.
The proposed TaskRes is simple yet effective, significantly outperforming previous methods on 11 benchmark datasets.
arXiv Detail & Related papers (2022-11-18T15:09:03Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- On-edge Multi-task Transfer Learning: Model and Practice with Data-driven Task Allocation [20.20889051697198]
We show that task allocation with task importance for Multi-task Transfer Learning (TATIM) is a variant of the NP-complete Knapsack problem.
We propose a Data-driven Cooperative Task Allocation (DCTA) approach to solve TATIM with high computational efficiency.
Our DCTA reduces processing time by a factor of 3.24 and saves 48.4% of energy consumption compared with the state of the art when solving TATIM.
arXiv Detail & Related papers (2021-07-06T08:24:25Z)
- Weighted Training for Cross-Task Learning [71.94908559469475]
We introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning.
We show that TAWT is easy to implement, is computationally efficient, requires little hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees.
As a byproduct, the proposed representation-based task distance allows one to reason in a theoretically principled way about several critical aspects of cross-task learning.
arXiv Detail & Related papers (2021-05-28T20:27:02Z)
- Exploring and Predicting Transferability across NLP Tasks [115.6278033699853]
We study the transferability between 33 NLP tasks across three broad classes of problems.
Our results show that transfer learning is more beneficial than previously thought.
We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task.
arXiv Detail & Related papers (2020-05-02T09:39:36Z)