MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
- URL: http://arxiv.org/abs/2311.02303v1
- Date: Sat, 4 Nov 2023 02:22:40 GMT
- Title: MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
- Authors: Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao
Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, Jianguo Li
- Abstract summary: We present a multi-task fine-tuning framework, MFTcoder, that enables simultaneous and parallel fine-tuning on multiple tasks.
Experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks.
- Score: 28.12788291168137
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code LLMs have emerged as a specialized research field, with remarkable
studies dedicated to enhancing model's coding capabilities through fine-tuning
on pre-trained models. Previous fine-tuning approaches were typically tailored
to specific downstream tasks or scenarios, which meant separate fine-tuning for
each task, requiring extensive training resources and posing challenges in
terms of deployment and maintenance. Furthermore, these approaches failed to
leverage the inherent interconnectedness among different code-related tasks. To
overcome these limitations, we present a multi-task fine-tuning framework,
MFTcoder, that enables simultaneous and parallel fine-tuning on multiple tasks.
By incorporating various loss functions, we effectively address common
challenges in multi-task learning, such as data imbalance, varying difficulty
levels, and inconsistent convergence speeds. Extensive experiments have
conclusively demonstrated that our multi-task fine-tuning approach outperforms
both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble
of tasks. Moreover, MFTcoder offers efficient training capabilities, including
efficient data tokenization modes and PEFT fine-tuning, resulting in
significantly improved speed compared to traditional fine-tuning methods.
MFTcoder seamlessly integrates with several mainstream open-source LLMs, such
as CodeLLama and Qwen. Leveraging the CodeLLama foundation, our MFTcoder
fine-tuned model, \textsc{CodeFuse-CodeLLama-34B}, achieves an impressive
pass@1 score of 74.4\% on the HumaneEval benchmark, surpassing GPT-4
performance (67\%, zero-shot). MFTCoder is open-sourced at
\url{https://github.com/codefuse-ai/MFTCOder}
Related papers
- Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning [59.001091197106085]
Multi-Task Learning (MTL) for Vision Transformer aims at enhancing the model capability by tackling multiple tasks simultaneously.
Most recent works have predominantly focused on designing Mixture-of-Experts (MoE) structures and in tegrating Low-Rank Adaptation (LoRA) to efficiently perform multi-task learning.
We propose a novel approach dubbed Efficient Multi-Task Learning (EMTAL) by transforming a pre-trained Vision Transformer into an efficient multi-task learner.
arXiv Detail & Related papers (2025-01-12T17:41:23Z) - R-MTLLMF: Resilient Multi-Task Large Language Model Fusion at the Wireless Edge [78.26352952957909]
Multi-task large language models (MTLLMs) are important for many applications at the wireless edge, where users demand specialized models to handle multiple tasks efficiently.
The concept of model fusion via task vectors has emerged as an efficient approach for combining fine-tuning parameters to produce an MTLLM.
In this paper, the problem of enabling edge users to collaboratively craft such MTLMs via tasks vectors is studied, under the assumption of worst-case adversarial attacks.
arXiv Detail & Related papers (2024-11-27T10:57:06Z) - AdapMTL: Adaptive Pruning Framework for Multitask Learning Model [5.643658120200373]
AdapMTL is an adaptive pruning framework for multitask models.
It balances sparsity allocation and accuracy performance across multiple tasks.
It showcases superior performance compared to state-of-the-art pruning methods.
arXiv Detail & Related papers (2024-08-07T17:19:15Z) - Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment [0.0]
"Harmonized Transfer Learning and Modality alignment (HarMA)" is a method that simultaneously satisfies task constraints, modality alignment, and single-modality uniform alignment.
HarMA achieves state-of-the-art performance in two popular multimodal retrieval tasks in the field of remote sensing.
arXiv Detail & Related papers (2024-04-28T17:20:08Z) - MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning [1.4396109429521227]
Adapting models pre-trained on large-scale datasets to a variety of downstream tasks is a common strategy in deep learning.
parameter-efficient fine-tuning methods have emerged as a promising way to adapt pre-trained models to different tasks while training only a minimal number of parameters.
We introduce MTLoRA, a novel framework for parameter-efficient training of Multi-Task Learning models.
arXiv Detail & Related papers (2024-03-29T17:43:58Z) - Fair Resource Allocation in Multi-Task Learning [12.776767874217663]
Multi-task learning (MTL) can leverage the shared knowledge across tasks, resulting in improved data efficiency and generalization performance.
A major challenge in MTL lies in the presence of conflicting gradients, which can hinder the fair optimization of some tasks.
Inspired by fair resource allocation in communication networks, we propose FairGrad, a novel MTL optimization method.
arXiv Detail & Related papers (2024-02-23T22:46:14Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task
Learning with Model-Accelerator Co-design [95.41238363769892]
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly.
Current MTL regimes have to activate nearly the entire model even to just execute a single task.
We present a model-accelerator co-design framework to enable efficient on-device MTL.
arXiv Detail & Related papers (2022-10-26T15:40:24Z) - Multi-Task Learning as a Bargaining Game [63.49888996291245]
In Multi-task learning (MTL), a joint model is trained to simultaneously make predictions for several tasks.
Since the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than its corresponding single-task counterparts.
We propose viewing the gradients combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update.
arXiv Detail & Related papers (2022-02-02T13:21:53Z) - Transfer Learning for Sequence Generation: from Single-source to
Multi-source [50.34044254589968]
We propose a two-stage finetuning method to alleviate the pretrain-finetune discrepancy and introduce a novel MSG model with a fine encoder to learn better representations in MSG tasks.
Our approach achieves new state-of-the-art results on the WMT17 APE task and multi-source translation task using the WMT14 test set.
arXiv Detail & Related papers (2021-05-31T09:12:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.