MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
- URL: http://arxiv.org/abs/2311.02303v1
- Date: Sat, 4 Nov 2023 02:22:40 GMT
- Title: MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
- Authors: Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao
Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, Jianguo Li
- Abstract summary: We present a multi-task fine-tuning framework, MFTcoder, that enables simultaneous and parallel fine-tuning on multiple tasks.
Experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks.
- Score: 28.12788291168137
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code LLMs have emerged as a specialized research field, with remarkable
studies dedicated to enhancing model's coding capabilities through fine-tuning
on pre-trained models. Previous fine-tuning approaches were typically tailored
to specific downstream tasks or scenarios, which meant separate fine-tuning for
each task, requiring extensive training resources and posing challenges in
terms of deployment and maintenance. Furthermore, these approaches failed to
leverage the inherent interconnectedness among different code-related tasks. To
overcome these limitations, we present a multi-task fine-tuning framework,
MFTcoder, that enables simultaneous and parallel fine-tuning on multiple tasks.
By incorporating various loss functions, we effectively address common
challenges in multi-task learning, such as data imbalance, varying difficulty
levels, and inconsistent convergence speeds. Extensive experiments have
conclusively demonstrated that our multi-task fine-tuning approach outperforms
both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble
of tasks. Moreover, MFTcoder offers efficient training capabilities, including
efficient data tokenization modes and PEFT fine-tuning, resulting in
significantly improved speed compared to traditional fine-tuning methods.
MFTcoder seamlessly integrates with several mainstream open-source LLMs, such
as CodeLLama and Qwen. Leveraging the CodeLLama foundation, our MFTcoder
fine-tuned model, \textsc{CodeFuse-CodeLLama-34B}, achieves an impressive
pass@1 score of 74.4\% on the HumaneEval benchmark, surpassing GPT-4
performance (67\%, zero-shot). MFTCoder is open-sourced at
\url{https://github.com/codefuse-ai/MFTCOder}
Related papers
- AdapMTL: Adaptive Pruning Framework for Multitask Learning Model [5.643658120200373]
AdapMTL is an adaptive pruning framework for multitask models.
It balances sparsity allocation and accuracy performance across multiple tasks.
It showcases superior performance compared to state-of-the-art pruning methods.
arXiv Detail & Related papers (2024-08-07T17:19:15Z) - Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment [0.0]
"Harmonized Transfer Learning and Modality alignment (HarMA)" is a method that simultaneously satisfies task constraints, modality alignment, and single-modality uniform alignment.
HarMA achieves state-of-the-art performance in two popular multimodal retrieval tasks in the field of remote sensing.
arXiv Detail & Related papers (2024-04-28T17:20:08Z) - Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications.
MoE has been emerged as a promising solution with its sparse architecture for effective task decoupling.
Intuition-MoR1E achieves superior efficiency and 2.15% overall accuracy improvement across 14 public datasets.
arXiv Detail & Related papers (2024-04-13T12:14:58Z) - MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning [1.4396109429521227]
Adapting models pre-trained on large-scale datasets to a variety of downstream tasks is a common strategy in deep learning.
parameter-efficient fine-tuning methods have emerged as a promising way to adapt pre-trained models to different tasks while training only a minimal number of parameters.
We introduce MTLoRA, a novel framework for parameter-efficient training of Multi-Task Learning models.
arXiv Detail & Related papers (2024-03-29T17:43:58Z) - Fair Resource Allocation in Multi-Task Learning [12.776767874217663]
Multi-task learning (MTL) can leverage the shared knowledge across tasks, resulting in improved data efficiency and generalization performance.
A major challenge in MTL lies in the presence of conflicting gradients, which can hinder the fair optimization of some tasks.
Inspired by fair resource allocation in communication networks, we propose FairGrad, a novel MTL optimization method.
arXiv Detail & Related papers (2024-02-23T22:46:14Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - Efficient Controllable Multi-Task Architectures [85.76598445904374]
We propose a multi-task model consisting of a shared encoder and task-specific decoders where both encoder and decoder channel widths are slimmable.
Our key idea is to control the task importance by varying the capacities of task-specific decoders, while controlling the total computational cost.
This improves overall accuracy by allowing a stronger encoder for a given budget, increases control over computational cost, and delivers high-quality slimmed sub-architectures.
arXiv Detail & Related papers (2023-08-22T19:09:56Z) - M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task
Learning with Model-Accelerator Co-design [95.41238363769892]
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly.
Current MTL regimes have to activate nearly the entire model even to just execute a single task.
We present a model-accelerator co-design framework to enable efficient on-device MTL.
arXiv Detail & Related papers (2022-10-26T15:40:24Z) - Multi-Task Learning as a Bargaining Game [63.49888996291245]
In Multi-task learning (MTL), a joint model is trained to simultaneously make predictions for several tasks.
Since the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than its corresponding single-task counterparts.
We propose viewing the gradients combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update.
arXiv Detail & Related papers (2022-02-02T13:21:53Z) - Transfer Learning for Sequence Generation: from Single-source to
Multi-source [50.34044254589968]
We propose a two-stage finetuning method to alleviate the pretrain-finetune discrepancy and introduce a novel MSG model with a fine encoder to learn better representations in MSG tasks.
Our approach achieves new state-of-the-art results on the WMT17 APE task and multi-source translation task using the WMT14 test set.
arXiv Detail & Related papers (2021-05-31T09:12:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.