From Symbolic Tasks to Code Generation: Diversification Yields Better Task Performers
- URL: http://arxiv.org/abs/2405.19787v2
- Date: Fri, 31 May 2024 01:23:41 GMT
- Title: From Symbolic Tasks to Code Generation: Diversification Yields Better Task Performers
- Authors: Dylan Zhang, Justin Wang, Francois Charton
- Abstract summary: We show that a more diverse instruction set, extending beyond code-related tasks, improves the performance of code generation.
Our observations suggest that a more diverse semantic space for instruction-tuning sets greatly improves the model's ability to follow instructions and perform tasks.
- Score: 1.6958018695660049
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction tuning, that is, tuning large language models on instruction-output pairs, is a promising technique for making models better adapted to the real world. Yet, the key factors driving a model's ability to understand and follow instructions not seen during training remain under-explored. Our investigation begins with a series of synthetic experiments within the theoretical framework of Markov algorithms, a Turing-complete string-rewriting formalism that allows fine-grained control over the instruction-tuning data. Generalization and robustness with respect to the training distribution emerge once a diverse enough set of tasks is provided, even when only a few examples are given per task. We extend these initial results to the real-world application of code generation and find that a more diverse instruction set, extending beyond code-related tasks, improves code-generation performance. Our observations suggest that a more diverse semantic space for instruction-tuning sets greatly improves the model's ability to follow instructions and perform tasks.
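For context, a Markov algorithm is an ordered list of string-rewriting rules applied deterministically: at each step, the first rule whose left-hand side occurs in the string rewrites its leftmost occurrence, and execution halts when no rule applies or a terminal rule fires. A minimal sketch in Python (the example rule set is illustrative, not taken from the paper):

```python
def run_markov_algorithm(rules, s, max_steps=1000):
    """Apply an ordered list of rewrite rules to a string.

    rules: (lhs, rhs, is_terminal) triples. At each step the first rule
    whose lhs occurs in s rewrites the leftmost occurrence; a terminal
    rule stops execution after firing.
    """
    for _ in range(max_steps):
        for lhs, rhs, is_terminal in rules:
            if lhs in s:
                s = s.replace(lhs, rhs, 1)  # rewrite leftmost occurrence only
                if is_terminal:
                    return s
                break
        else:  # no rule matched, so the algorithm halts
            return s
    raise RuntimeError("step budget exhausted")

# Toy rule set: repeatedly rewrite "ba" -> "ab", which sorts a string
# over the alphabet {a, b} so that all a's precede all b's.
rules = [("ba", "ab", False)]
assert run_markov_algorithm(rules, "babba") == "aabbb"
```

Because the rule set fully determines the behavior, a framework like this gives exact control over how many distinct tasks (rule sets) and how many examples per task appear in the instruction-tuning data, which is what the synthetic experiments exploit.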
Related papers
- Fine-tuning Large Language Models with Sequential Instructions [2.546845645875049]
We find that existing instruction-tuned models struggle to respond to queries with multiple instructions.
We contend that part of the fine-tuning data mixture should be sequential, containing a chain of interrelated tasks.
We automate this process by turning instructions in existing datasets into diverse and complex sequential instructions.
Models that underwent our sequential instruction tuning show improved results in coding, maths, and open-ended generation; a data-construction sketch follows this entry.
arXiv Detail & Related papers (2024-03-12T16:33:30Z)
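A rough sketch of the kind of data transformation described above: chaining single-instruction examples from an existing dataset into one sequential example. The template strings are assumptions for illustration, not the authors' exact format:

```python
def chain_examples(examples):
    """Merge single-instruction examples into one sequential example.

    examples: list of {"instruction": ..., "output": ...} dicts drawn
    from an existing instruction-tuning dataset.
    """
    steps = [f"{i + 1}. {ex['instruction']}" for i, ex in enumerate(examples)]
    instruction = "Complete the following tasks in order:\n" + "\n".join(steps)
    # The target response answers each sub-instruction in sequence.
    output = "\n".join(
        f"Task {i + 1}: {ex['output']}" for i, ex in enumerate(examples)
    )
    return {"instruction": instruction, "output": output}

combined = chain_examples([
    {"instruction": "Translate 'bonjour' to English.", "output": "hello"},
    {"instruction": "Uppercase the previous answer.", "output": "HELLO"},
])
print(combined["instruction"])
```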
- Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models [15.444719480373001]
We propose a novel concept of compositional instructions called chain-of-instructions (CoI).
Unlike the conventional practice of solving single-instruction tasks, our proposed method encourages a model to solve each subtask step by step until the final answer is reached.
CoI-tuning improves the model's ability to handle instructions composed of multiple subtasks as well as unseen composite tasks such as multilingual summarization; a prompt-composition sketch follows this entry.
arXiv Detail & Related papers (2024-02-18T10:10:40Z)
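One way to picture CoI-style prompting: a composite instruction is laid out as ordered subtasks, and the model is asked to produce each intermediate result before the final answer. The template below is hypothetical, not the paper's:

```python
def compose_chain_of_instructions(subtasks, initial_input):
    """Build a prompt that asks the model to solve subtasks in order,
    feeding each intermediate result into the next subtask."""
    lines = [f"Input: {initial_input}", "Solve step by step:"]
    for i, subtask in enumerate(subtasks, start=1):
        lines.append(f"Step {i}: {subtask}")
    lines.append("Give the result of each step, then the final answer.")
    return "\n".join(lines)

prompt = compose_chain_of_instructions(
    ["Summarize the passage.", "Translate the summary into French."],
    "Large language models follow instructions ...",
)
print(prompt)
```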
- Instruction Diversity Drives Generalization To Unseen Tasks [1.9059113568275998]
Generalization emerges once a diverse enough set of tasks is provided, even when only a few examples are given per task.
arXiv Detail & Related papers (2024-02-16T18:47:21Z)
- Code Representation Learning At Scale [75.04686476303436]
We fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme.
We first train the encoders via a mix that leverages both the randomness of masked language modeling and the structural aspects of programming languages.
We then enhance the representations via contrastive learning with hard negatives and hard positives constructed in an unsupervised manner; a contrastive-loss sketch follows this entry.
arXiv Detail & Related papers (2024-02-02T22:19:15Z)
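The second-stage objective can be illustrated with a standard InfoNCE-style contrastive loss in which hard negatives enter the denominator alongside the positive. This is a generic sketch, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, hard_negatives, temperature=0.05):
    """InfoNCE loss with explicit hard negatives.

    anchor:         (B, D) embeddings of the original code snippets
    positive:       (B, D) embeddings of hard positives (e.g. semantics-
                    preserving transformations of the anchors)
    hard_negatives: (B, K, D) embeddings of K hard negatives per anchor
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    hard_negatives = F.normalize(hard_negatives, dim=-1)

    pos_sim = (anchor * positive).sum(-1, keepdim=True)           # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", anchor, hard_negatives)  # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    # The positive always sits at index 0 of the logits.
    labels = torch.zeros(len(anchor), dtype=torch.long)
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 4, 256))
```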
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences, as illustrated below.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
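The proposed change concerns prompt layout only: place the task instruction after the source text instead of before it. A trivial sketch (the template strings are assumptions):

```python
def build_prompt(instruction, source, instruction_last=True):
    """Format a conditional-generation prompt, optionally moving the
    task instruction after the input sentence as the paper proposes."""
    if instruction_last:
        return f"{source}\n{instruction}\n"  # input first, instruction last
    return f"{instruction}\n{source}\n"      # conventional order

print(build_prompt("Translate the text above into German.",
                   "The weather is nice today."))
```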
- On Conditional and Compositional Language Model Differentiable Prompting [75.76546041094436]
Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks.
We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata into continuous prompts; a minimal sketch of the general idea follows this entry.
arXiv Detail & Related papers (2023-07-04T02:47:42Z)
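A minimal sketch of producing continuous prompts: a small trainable module maps an instruction encoding to a sequence of soft prompt vectors that are prepended to the frozen PLM's input embeddings. Module shapes and names here are hypothetical, not PRopS itself:

```python
import torch
import torch.nn as nn

class PromptProducer(nn.Module):
    """Map an instruction encoding to `prompt_len` continuous prompt
    vectors; only this module is trained while the PLM stays frozen."""
    def __init__(self, instr_dim, plm_dim, prompt_len=8):
        super().__init__()
        self.prompt_len = prompt_len
        self.plm_dim = plm_dim
        self.net = nn.Sequential(
            nn.Linear(instr_dim, plm_dim),
            nn.Tanh(),
            nn.Linear(plm_dim, prompt_len * plm_dim),
        )

    def forward(self, instr_encoding):        # (B, instr_dim)
        prompts = self.net(instr_encoding)    # (B, prompt_len * plm_dim)
        return prompts.view(-1, self.prompt_len, self.plm_dim)

producer = PromptProducer(instr_dim=384, plm_dim=768)
soft_prompts = producer(torch.randn(4, 384))  # (4, 8, 768)
# soft_prompts would be concatenated with the frozen PLM's input embeddings.
```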
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning [24.741736629886564]
Instruction tuning is a new learning paradigm that fine-tunes pre-trained language models on tasks specified through instructions.
We introduce MULTIINSTRUCT, the first multimodal instruction tuning benchmark dataset.
We show strong zero-shot performance on various unseen multimodal tasks and the benefit of transfer learning from a text-only instruction dataset.
arXiv Detail & Related papers (2022-12-21T05:17:06Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages; a retrieval sketch follows this entry.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
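At retrieval time, dual-encoder code search reduces to nearest-neighbor lookup in a shared embedding space. A generic sketch, assuming pretrained text and code encoders have already produced the embeddings (all names are placeholders):

```python
import torch
import torch.nn.functional as F

def search_code(query_emb, code_embs, top_k=3):
    """Rank code snippets by cosine similarity to a natural-language query.

    query_emb: (D,) embedding of the query from a text encoder
    code_embs: (N, D) embeddings of the code corpus from a code encoder
    """
    query_emb = F.normalize(query_emb, dim=-1)
    code_embs = F.normalize(code_embs, dim=-1)
    scores = code_embs @ query_emb      # (N,) cosine similarities
    return torch.topk(scores, k=top_k)  # top-k scores and their indices

scores, indices = search_code(torch.randn(256), torch.randn(1000, 256))
```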
- Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z)
- Few-shot Sequence Learning with Transformers [79.87875859408955]
Few-shot algorithms aim at learning new tasks provided only a handful of training examples.
In this work we investigate few-shot learning in the setting where the data points are sequences of tokens.
We propose an efficient learning algorithm based on Transformers; a generic episode-construction sketch follows this entry.
arXiv Detail & Related papers (2020-12-17T12:30:38Z)
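Few-shot sequence learning is typically evaluated in episodes: a handful of labeled support sequences define a task, and held-out query sequences test it. A generic episode-construction sketch, not the paper's exact protocol:

```python
import random

def make_episode(task_examples, n_support=5, n_query=5):
    """Split one task's examples into a support set (used to adapt the
    model) and a disjoint query set (used to evaluate it)."""
    sample = random.sample(task_examples, n_support + n_query)
    return sample[:n_support], sample[n_support:]

examples = [([i, i + 1], [i + 2]) for i in range(50)]  # toy token sequences
support, query = make_episode(examples)
```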