Related papers: VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs

URL: http://arxiv.org/abs/2411.11266v4
Date: Thu, 05 Dec 2024 02:48:32 GMT
Title: VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs
Authors: Keer Lu, Keshi Zhao, Zheng Liang, Da Pan, Shusen Zhang, Xin Wu, Weipeng Chen, Zenan Zhou, Guosheng Dong, Bin Cui, Wentao Zhang
Abstract summary: VersaTune is a novel data composition framework designed for enhancing Large Language Models' multi-ability performances during training. We categorize knowledge into distinct domains including law, medicine, finance, science, code, etc. We demonstrate that VersaTune achieves significant improvements in multi-domain performance, with a 35.21% enhancement in comprehensive multi-domain tasks.
Score: 38.65649832364651
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large-scale pretrained models, particularly Large Language Models (LLMs), have exhibited remarkable capabilities in handling multiple tasks across domains due to their emergent properties. These capabilities are further augmented during the Supervised Fine-Tuning (SFT) phase. Despite their potential, existing work mainly focuses on domain-specific enhancements during fine-tuning, the challenge of which lies in catastrophic forgetting of knowledge across other domains. In this study, we introduce VersaTune, a novel data composition framework designed for enhancing LLMs' overall multi-ability performances during training. We categorize knowledge into distinct domains including law, medicine, finance, science, code, etc. We begin with detecting the distribution of domain-specific knowledge within the base model, followed by the training data composition that aligns with the model's existing knowledge distribution. During the training process, domain weights are dynamically adjusted based on their learnable potential and forgetting degree. Experimental results demonstrate that VersaTune achieves significant improvements in multi-domain performance, with a 35.21% enhancement in comprehensive multi-domain tasks. Additionally, in scenarios where specific domain optimization is required, VersaTune reduces the degradation of performance in other domains by 38.77%, without compromising the target domain's training efficacy.
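Illustrative sketch: the abstract describes starting from a data mixture aligned with the base model's knowledge distribution and then adjusting domain weights during training according to learnable potential and forgetting degree. The Python below is only a rough illustration of that general idea, not the paper's algorithm. The function names, the multiplicative update rule, the `step_size` parameter, and all numeric values are assumptions made up for this example.

```python
def normalize(weights):
    """Rescale non-negative domain weights so they sum to 1."""
    total = sum(weights.values())
    return {domain: w / total for domain, w in weights.items()}


def update_domain_weights(weights, learnable, forgetting, step_size=0.5):
    """Hypothetical update rule (an assumption, not taken from the paper).

    Domains with a higher learnable-potential or forgetting signal get a
    larger share of the next training mixture; the result is renormalized.
    """
    scaled = {
        domain: w * (1.0 + step_size * (learnable[domain] + forgetting[domain]))
        for domain, w in weights.items()
    }
    return normalize(scaled)


# Made-up initial mixture standing in for an estimate of the base model's
# existing knowledge distribution across domains.
initial = normalize(
    {"law": 0.1, "medicine": 0.2, "finance": 0.1, "science": 0.3, "code": 0.3}
)

# Made-up per-domain signals in [0, 1]; a real system would measure these.
learnable = {"law": 0.4, "medicine": 0.1, "finance": 0.0, "science": 0.0, "code": 0.0}
forgetting = {"law": 0.0, "medicine": 0.0, "finance": 0.2, "science": 0.0, "code": 0.0}

updated = update_domain_weights(initial, learnable, forgetting)
for domain in initial:
    print(f"{domain}: {initial[domain]:.3f} -> {updated[domain]:.3f}")
```

In this toy setup, domains with non-zero signals (law, medicine, finance) gain share while the others shrink, and the mixture still sums to 1. How the two signals are actually measured and combined is specified in the paper itself.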

Related papers

Towards Text-free Graph Foundation Models: Rethinking Multi-Domain Graph Contrastive Learning [40.56379624114316]
We propose a novel multi-domain pre-training and cross-domain transfer framework, namely MDGCL. In the pre-training stage, we design a contrastive learning strategy to substantially recognize and capture domain differences. In the downstream stage, we introduce a domain attention mechanism to enable fine-grained domain knowledge transfer.
arXiv Detail & Related papers (2025-06-26T03:14:50Z)
OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation [65.15955645757705]
We introduce Workforce, a hierarchical multi-agent framework that decouples strategic planning from specialized execution. During inference, Workforce seamlessly adapts to new domains by adding or modifying worker agents. For training, we introduce optimized Workforce Learning (OWL), which improves generalization across domains.
arXiv Detail & Related papers (2025-05-29T17:51:58Z)
Commute Your Domains: Trajectory Optimality Criterion for Multi-Domain Learning [50.80758278865274]
In multi-domain learning, a single model is trained on diverse data domains to leverage shared knowledge and improve generalization. The order in which the data from these domains is used for training can significantly affect the model's performance on each domain. We investigate the influence of training order (or data mixing) in multi-domain learning using the concept of Lie bracket of gradient vector fields.
arXiv Detail & Related papers (2025-01-26T15:12:06Z)
Specialized Foundation Models Struggle to Beat Supervised Baselines [60.23386520331143]
We look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow. We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.
arXiv Detail & Related papers (2024-11-05T04:10:59Z)
Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning [55.107329995417786]
Large language models (LLMs) have demonstrated impressive general understanding and generation abilities. We establish a benchmark for multi-domain translation, featuring 25 German$\Leftrightarrow$English and 22 Chinese$\Leftrightarrow$English test sets. We propose a domain Chain of Thought (CoT) fine-tuning technique that utilizes the intrinsic multi-domain intelligence of LLMs to improve translation performance.
arXiv Detail & Related papers (2024-10-03T16:15:04Z)
Mixing It Up: The Cocktail Effect of Multi-Task Fine-Tuning on LLM Performance -- A Case Study in Finance [0.32985979395737774]
We study the application of large language models (LLMs) in domain-specific contexts, including finance. We find that fine-tuning exclusively on the target task is not always the most effective strategy. Instead, multi-task fine-tuning can significantly enhance performance.
arXiv Detail & Related papers (2024-10-01T22:35:56Z)
Domain-Aware Fine-Tuning of Foundation Models [18.336887359257087]
Foundation models (FMs) have revolutionized computer vision, enabling effective learning across different domains. This paper investigates the zero-shot domain adaptation potential of FMs by comparing different backbone architectures. We introduce novel domain-aware components that leverage domain related textual embeddings.
arXiv Detail & Related papers (2024-07-03T20:10:55Z)
Multi-level Personalized Federated Learning on Heterogeneous and Long-Tailed Data [10.64629029156029]
We introduce an innovative personalized Federated Learning framework, Multi-level Personalized Federated Learning (MuPFL). MuPFL integrates three pivotal modules: Biased Activation Value Dropout (BAVD), Adaptive Cluster-based Model Update (ACMU), and Prior Knowledge-assisted Fine-tuning (PKCF). Experiments on diverse real-world datasets show that MuPFL consistently outperforms state-of-the-art baselines, even under extreme non-i.i.d. and long-tail conditions.
arXiv Detail & Related papers (2024-05-10T11:52:53Z)
Investigating Continual Pretraining in Large Language Models: Insights and Implications [9.660013084324817]
Continual learning in large language models (LLMs) is an evolving domain that focuses on developing efficient and sustainable training strategies. We introduce a new benchmark designed to measure the adaptability of LLMs to changing pretraining data landscapes. Our findings uncover several key insights: (i) continual pretraining consistently improves 1.5B models studied in this work and is also superior to domain adaptation, (ii) larger models always achieve better perplexity than smaller ones when continually pretrained on the same corpus, (iii) smaller models are particularly sensitive to continual pretraining, showing the most significant rates of both learning and
arXiv Detail & Related papers (2024-02-27T10:47:24Z)
EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data [67.8302955948861]
Large Language Models (LLMs) pre-trained on massive corpora have exhibited remarkable performance on various NLP tasks. Applying these models to specific domains still poses significant challenges, such as lack of domain knowledge. We focus on domain-specific continual pre-training of LLMs using E-commerce domain as an exemplar.
arXiv Detail & Related papers (2023-12-25T11:31:47Z)
Improving Domain Generalization with Domain Relations [77.63345406973097]
This paper focuses on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on. We propose a new approach called D$^3$G to learn domain-specific models. Our results show that D$^3$G consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T08:11:16Z)
CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance. In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
Forget Less, Count Better: A Domain-Incremental Self-Distillation Learning Benchmark for Lifelong Crowd Counting [51.44987756859706]
Off-the-shelf methods have some drawbacks to handle multiple domains. Lifelong Crowd Counting aims at alleviating the catastrophic forgetting and improving the generalization ability.
arXiv Detail & Related papers (2022-05-06T15:37:56Z)
TAL: Two-stream Adaptive Learning for Generalizable Person Re-identification [115.31432027711202]
We argue that both domain-specific and domain-invariant features are crucial for improving the generalization ability of re-id models. We name two-stream adaptive learning (TAL) to simultaneously model these two kinds of information. Our framework can be applied to both single-source and multi-source domain generalization tasks.
arXiv Detail & Related papers (2021-11-29T01:27:42Z)
Improving Transferability of Domain Adaptation Networks Through Domain Alignment Layers [1.3766148734487902]
Multi-source unsupervised domain adaptation (MSDA) aims at learning a predictor for an unlabeled domain by assigning weak knowledge from a bag of source models. We propose to embed Multi-Source version of DomaIn Alignment Layers (MS-DIAL) at different levels of the predictor. Our approach can improve state-of-the-art MSDA methods, yielding relative gains of up to +30.64% on their classification accuracies.
arXiv Detail & Related papers (2021-09-06T18:41:19Z)
Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting [75.80116276369694]
In crowd counting, because labelling is laborious, collecting a new large-scale dataset is perceived as intractable. We resort to multi-domain joint learning and propose a simple but effective Domain-specific Knowledge Propagating Network (DKPNet). This is mainly achieved by proposing the novel Variational Attention (VA) technique for explicitly modeling the attention distributions for different domains.
arXiv Detail & Related papers (2021-08-18T08:06:37Z)
Domain Adaptation for Semantic Parsing [68.81787666086554]
We propose a novel semantic parser for domain adaptation, where we have much fewer annotated data in the target domain compared to the source domain. Our semantic parser benefits from a two-stage coarse-to-fine framework, and thus can provide different and accurate treatments for the two stages. Experiments on a benchmark dataset show that our method consistently outperforms several popular domain adaptation strategies.
arXiv Detail & Related papers (2020-06-23T14:47:41Z)
Domain Conditioned Adaptation Network [90.63261870610211]
We propose a Domain Conditioned Adaptation Network (DCAN) to excite distinct convolutional channels with a domain conditioned channel attention mechanism. This is the first work to explore the domain-wise convolutional channel activation for deep DA networks.
arXiv Detail & Related papers (2020-05-14T04:23:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.