Related papers: LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

URL: http://arxiv.org/abs/2306.12420v2
Date: Sun, 5 May 2024 13:13:02 GMT
Title: LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
Authors: Shizhe Diao, Rui Pan, Hanze Dong, Ka Shun Shum, Jipeng Zhang, Wei Xiong, Tong Zhang
Abstract summary: Foundation models have demonstrated a great ability to achieve general human-level intelligence far beyond traditional approaches. A significant shortcoming of most foundation models lies in their performance in specialized-domain and task-specific applications. We introduce LMFlow, which aims to simplify the domain- and task-aware finetuning of general foundation models.
Score: 31.121714473817793
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Foundation models have demonstrated a great ability to achieve general human-level intelligence far beyond traditional approaches. As the technique keeps attracting attention from the AI community, an increasing number of foundation models are becoming publicly accessible. However, a significant shortcoming of most of these models lies in their performance in specialized-domain and task-specific applications, necessitating domain- and task-aware fine-tuning to develop effective scientific language models. As the number of available foundation models and specialized tasks keeps growing, the job of training scientific language models becomes highly nontrivial. In this paper, we initiate steps to tackle this issue. We introduce an extensible and lightweight toolkit, LMFlow, which aims to simplify the domain- and task-aware finetuning of general foundation models. LMFlow offers a complete finetuning workflow for a foundation model to support specialized training with limited computing resources. Furthermore, it supports continuous pretraining, instruction tuning, parameter-efficient finetuning, alignment tuning, inference acceleration, long context generalization, model customization, and even multimodal finetuning, along with carefully designed and extensible APIs. This toolkit has been thoroughly tested and is available at https://github.com/OptimalScale/LMFlow.

Related papers

EoS-FM: Can an Ensemble of Specialist Models act as a Generalist Feature Extractor? [8.178030486012437]
We present an Ensemble-of-Specialists framework for building Remote Sensing Foundation Models (RSFMs). Our method decomposes the training process into lightweight, task-specific ConvNeXtV2 specialists that can be frozen and reused. Our framework sets a new direction for building scalable and efficient RSFMs.
arXiv Detail & Related papers (2025-11-26T15:52:56Z)
Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation [43.68215777330875]
We introduce a systematic post-training pipeline that efficiently enhances small model accuracy. The resulting instruction-tuned model achieves state-of-the-art performance. This work provides a practical and efficient solution for developing high-performance language models on Ascend edge devices.
arXiv Detail & Related papers (2025-09-30T16:40:55Z)
MatterTune: An Integrated, User-Friendly Platform for Fine-Tuning Atomistic Foundation Models to Accelerate Materials Simulation and Discovery [7.1240120153291535]
We introduce MatterTune, a framework that provides advanced fine-tuning capabilities and seamless integration of atomistic foundation models into downstream materials informatics and simulation. MatterTune supports a number of state-of-the-art foundation models such as ORB, MatterSim, JMP, and EquformerV2.
arXiv Detail & Related papers (2025-04-14T19:12:43Z)
Efficient Domain Adaptation of Multimodal Embeddings using Constrastive Learning [0.08192907805418582]
Current approaches either yield subpar results when using pretrained models without task-specific adaptation, or require substantial computational resources for fine-tuning. We propose a novel method for adapting foundational, multimodal embeddings to downstream tasks, without the need of expensive fine-tuning processes.
arXiv Detail & Related papers (2025-02-04T06:30:12Z)
Specialized Foundation Models Struggle to Beat Supervised Baselines [60.23386520331143]
We look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow. We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.
arXiv Detail & Related papers (2024-11-05T04:10:59Z)
An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales. We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training. Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains. In this paper, we introduce how to fine-tune a LLM model that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter [21.41170708560114]
A growing number of applications based on visual foundation models are emerging. In situations involving system upgrades, it becomes essential to re-train all downstream modules to adapt to the new foundation model. We introduce a parameter-efficient and task-agnostic adapter, dubbed TaCA, that facilitates compatibility across distinct foundation models.
arXiv Detail & Related papers (2023-06-22T03:00:24Z)
Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data. However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations. This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception. Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency. We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning [65.268245109828]
In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models. Deep learning in resource-limited domains still faces multiple challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning. Model reprogramming enables resource-efficient cross-domain machine learning by repurposing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning.
arXiv Detail & Related papers (2022-02-22T02:33:54Z)
Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains [45.07506437436464]
We present a general approach to developing small, fast and effective pre-trained models for specific domains. This is achieved by adapting the off-the-shelf general pre-trained models and performing task-agnostic knowledge distillation in target domains.
arXiv Detail & Related papers (2021-06-25T07:37:05Z)
CALM: Continuous Adaptive Learning for Language Modeling [18.72860206714457]
Training large language representation models has become a standard in the natural language processing community. We demonstrate that in practice these pre-trained models present performance deterioration in the form of catastrophic forgetting. We propose CALM, Continuous Adaptive Learning for Language Modeling: techniques to render models which retain knowledge across multiple domains.
arXiv Detail & Related papers (2020-04-08T03:51:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.