ModelGPT: Unleashing LLM's Capabilities for Tailored Model Generation
- URL: http://arxiv.org/abs/2402.12408v1
- Date: Sun, 18 Feb 2024 11:24:34 GMT
- Title: ModelGPT: Unleashing LLM's Capabilities for Tailored Model Generation
- Authors: Zihao Tang, Zheqi Lv, Shengyu Zhang, Fei Wu, Kun Kuang
- Abstract summary: We propose ModelGPT, a framework designed to determine and generate AI models tailored to the data or task descriptions provided by the user.
Given user requirements, ModelGPT is able to provide tailored models up to 270x faster than previous paradigms.
- Score: 35.160964210941955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of Large Language Models (LLMs) has revolutionized
various sectors by automating routine tasks, marking a step toward the
realization of Artificial General Intelligence (AGI). However, they still
struggle to accommodate the diverse and specific needs of users and to
simplify the use of AI models for the average user. In response, we propose
ModelGPT, a novel framework designed to determine and generate AI models
specifically tailored to the data or task descriptions provided by the user,
leveraging the capabilities of LLMs. Given user requirements, ModelGPT is able
to provide tailored models up to 270x faster than previous paradigms
(e.g., all-parameter or LoRA finetuning). Comprehensive experiments on NLP, CV,
and Tabular datasets attest to the effectiveness of our framework in making AI
models more accessible and user-friendly. Our code is available at
https://github.com/IshiKura-a/ModelGPT.
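As a rough sketch of the paradigm the abstract describes, the toy code below lets a requirement embedding (standing in for an LLM's encoding of the user's task description) drive a small generator network that emits the weights of a tailored classifier in a single forward pass, with no finetuning. The WeightGenerator module, all dimensions, and the linear target model are illustrative assumptions, not the paper's actual architecture.

    # Minimal sketch of the ModelGPT idea: an LLM condenses a user's task
    # description into a requirement vector, and a generator network emits
    # the weights of a small tailored model without gradient descent.
    # All names and sizes below are illustrative assumptions.
    import torch
    import torch.nn as nn

    class WeightGenerator(nn.Module):
        """Maps a requirement embedding to the parameters of a tiny linear classifier."""
        def __init__(self, req_dim, in_dim, n_classes):
            super().__init__()
            self.in_dim, self.n_classes = in_dim, n_classes
            self.net = nn.Linear(req_dim, in_dim * n_classes + n_classes)

        def forward(self, req_emb):
            flat = self.net(req_emb)
            w = flat[: self.in_dim * self.n_classes].view(self.n_classes, self.in_dim)
            b = flat[self.in_dim * self.n_classes :]
            return w, b

    req_emb = torch.randn(128)          # stand-in for an LLM embedding of the user request
    gen = WeightGenerator(req_dim=128, in_dim=32, n_classes=3)
    w, b = gen(req_emb)                 # tailored weights, produced in one forward pass
    x = torch.randn(5, 32)              # a batch from the user's task
    logits = x @ w.T + b
    print(logits.shape)                 # torch.Size([5, 3])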
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples (a minimal sketch follows this entry).
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
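A minimal sketch of the failure-inducing loop the ReverseGen entry above describes: propose candidate queries, probe the target model, and keep only the cases it fails as new training samples. propose_queries, target_answer, and is_correct are hypothetical stand-ins for the proposer LLM, the target model, and the evaluator.

    # Failure-inducing data synthesis, schematically: only queries the
    # target model gets wrong become training data.
    def propose_queries(n):
        # stand-in for a proposer LLM; here, simple arithmetic probes
        return [(f"What is {i} + {i}?", str(2 * i)) for i in range(n)]

    def target_answer(question):
        # stand-in for the target model; deliberately fails on large sums
        total = sum(int(t) for t in question.rstrip("?").split() if t.isdigit())
        return str(total) if total < 10 else "I don't know"

    def is_correct(prediction, reference):
        return prediction == reference

    failure_set = [
        {"question": q, "answer": ref}
        for q, ref in propose_queries(10)
        if not is_correct(target_answer(q), ref)
    ]
    print(len(failure_set), "failure-inducing samples collected for finetuning")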
- Enabling Small Models for Zero-Shot Classification through Model Label Learning [50.68074833512999]
We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities.
Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL.
arXiv Detail & Related papers (2024-08-21T09:08:26Z)
- GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks [0.0]
We introduce a new kind of GLiNER model that can be used for various information extraction tasks while remaining a small encoder model.
Our model achieves SoTA performance on zero-shot NER benchmarks and leading performance on question-answering, summarization, and relation-extraction tasks.
arXiv Detail & Related papers (2024-06-14T13:54:29Z)
- Model Callers for Transforming Predictive and Generative AI Applications [2.7195102129095003]
We introduce a novel software abstraction termed a "model caller".
Model callers act as intermediaries for AI and ML model calls.
We have released a prototype Python library for model callers, installable via pip or downloadable from GitHub (a generic sketch of the abstraction follows this entry).
arXiv Detail & Related papers (2024-04-17T12:21:06Z)
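A hedged sketch of the "model caller" abstraction from the entry above: one thin intermediary that owns pre-processing, the model invocation, and post-processing behind a uniform interface, and records calls for provenance. This is a generic illustration, not the API of the released library.

    # A generic model-caller intermediary: one place where models are invoked.
    from typing import Any, Callable, List

    class ModelCaller:
        def __init__(self, model: Callable[[List[Any]], List[Any]],
                     preprocess: Callable[[Any], Any] = lambda x: x,
                     postprocess: Callable[[Any], Any] = lambda y: y):
            self.model, self.preprocess, self.postprocess = model, preprocess, postprocess
            self.call_log = []                       # provenance: record every call

        def __call__(self, inputs: List[Any]) -> List[Any]:
            batch = [self.preprocess(x) for x in inputs]
            outputs = self.model(batch)              # single point of model invocation
            self.call_log.append((inputs, outputs))
            return [self.postprocess(y) for y in outputs]

    # usage: the same caller interface can front a predictive or generative model
    caller = ModelCaller(model=lambda batch: [len(s) for s in batch],
                         preprocess=str.strip)
    print(caller(["hello ", " world"]))              # [5, 5]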
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Herd: Using multiple, smaller LLMs to match the performances of proprietary, large LLMs via an intelligent composer [1.3108652488669732]
We show that a herd of open source models can match or exceed the performance of proprietary models via an intelligent router.
When GPT cannot answer a query, Herd identifies a model that can at least 40% of the time (a toy router sketch follows this entry).
arXiv Detail & Related papers (2023-10-30T18:11:02Z)
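A toy sketch of the Herd idea from the entry above: a router tries a herd of smaller models and returns the first answer that passes a confidence check. The models and the declining heuristic are hypothetical stand-ins for real open-source LLM endpoints and the paper's learned composer.

    # A herd of small models behind one router; each model may decline a query.
    def make_model(name, known_topics):
        def model(query):
            if any(t in query.lower() for t in known_topics):
                return f"{name}: answer about {query}"
            return None                                # model declines the query
        return model

    herd = [
        make_model("code-llm", {"python", "bug"}),
        make_model("math-llm", {"integral", "prime"}),
        make_model("general-llm", {"history", "weather"}),
    ]

    def route(query):
        for model in herd:                             # the real composer would rank, not scan
            answer = model(query)
            if answer is not None:                     # stand-in for a confidence check
                return answer
        return "no model in the herd could answer"

    print(route("Is 7 a prime?"))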
- Model Share AI: An Integrated Toolkit for Collaborative Machine Learning Model Development, Provenance Tracking, and Deployment in Python [0.0]
We introduce Model Share AI (AIMS), an easy-to-use MLOps platform designed to streamline collaborative model development, model provenance tracking, and model deployment.
AIMS features collaborative project spaces and a standardized model evaluation process that ranks model submissions based on their performance on unseen evaluation data.
AIMS allows users to deploy ML models built in Scikit-Learn, Keras, PyTorch, and ONNX into live REST APIs and automatically generated web apps.
arXiv Detail & Related papers (2023-09-27T15:24:39Z)
- AutoML-GPT: Automatic Machine Learning with GPT [74.30699827690596]
We propose developing task-oriented prompts and automatically utilizing large language models (LLMs) to automate the training pipeline.
We present AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyperparameters.
This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas (a prompt-construction sketch follows this entry).
arXiv Detail & Related papers (2023-05-04T02:09:43Z)
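A hedged sketch of the task-oriented prompting the AutoML-GPT entry above describes: a model card and a data card are folded into one structured prompt that asks an LLM to emit training hyperparameters as JSON. The llm() function is a hypothetical stand-in for a GPT call; the paper's actual prompts and pipeline differ.

    # Task-oriented prompt construction, schematically.
    import json

    def build_prompt(model_card: dict, data_card: dict) -> str:
        return (
            "You are an AutoML assistant.\n"
            f"Model card: {json.dumps(model_card)}\n"
            f"Data card: {json.dumps(data_card)}\n"
            "Return training hyperparameters as JSON with keys "
            "lr, batch_size, epochs."
        )

    def llm(prompt: str) -> str:
        # stand-in for a GPT call; a real system would query an LLM here
        return '{"lr": 3e-4, "batch_size": 32, "epochs": 10}'

    prompt = build_prompt({"arch": "resnet18", "params": "11M"},
                          {"task": "image classification", "n_train": 50000})
    recipe = json.loads(llm(prompt))
    print(recipe["lr"], recipe["batch_size"], recipe["epochs"])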
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose directing effort toward efficient adaptation of existing models, augmenting Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning (a toy version follows this entry).
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
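A toy version of the eP-ALM recipe from the entry above: freeze the language model, train only one linear projection that maps a perceptual feature into the LM's embedding space, and prepend it together with a single trainable soft token. The backbones and dimensions are stand-ins, not the paper's actual models.

    # Freeze the LM, train one projection and one soft token.
    import torch
    import torch.nn as nn

    lm_hidden, vis_dim, vocab = 64, 96, 1000
    lm_embed = nn.Embedding(vocab, lm_hidden)              # frozen LM pieces
    lm_body = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=lm_hidden, nhead=4, batch_first=True),
        num_layers=2,
    )
    for p in list(lm_embed.parameters()) + list(lm_body.parameters()):
        p.requires_grad = False                            # >99% of weights stay frozen

    proj = nn.Linear(vis_dim, lm_hidden)                   # the only trained layer
    soft_token = nn.Parameter(torch.zeros(1, 1, lm_hidden))  # one trainable token

    visual_feat = torch.randn(2, vis_dim)                  # frozen vision-encoder output
    text_ids = torch.randint(0, vocab, (2, 8))
    tokens = torch.cat(
        [soft_token.expand(2, 1, lm_hidden),
         proj(visual_feat).unsqueeze(1),                   # perceptual prompt
         lm_embed(text_ids)],
        dim=1,
    )
    out = lm_body(tokens)                                  # (2, 10, 64)
    print(out.shape, sum(p.numel() for p in proj.parameters()))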
- Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyperparameters.
As a representative use case, we focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, referred to as RepOpt-VGG, performs on par with recent well-designed models (a schematic sketch follows this entry).
arXiv Detail & Related papers (2022-05-30T16:55:59Z)
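A schematic sketch of the gradient re-parameterization idea from the entry above: rather than adding structural branches to the model, per-parameter scale factors multiply the gradients before a plain SGD step. The random scales are stand-ins for the constants the paper derives from model-specific hyperparameters.

    # Modify gradients, not the architecture.
    import torch
    import torch.nn as nn

    model = nn.Linear(16, 4)                      # a "plain" model, no extra branches
    grad_scales = {name: torch.rand_like(p) + 0.5 # stand-in for derived constants
                   for name, p in model.named_parameters()}
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    x, y = torch.randn(8, 16), torch.randn(8, 4)
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    with torch.no_grad():
        for name, p in model.named_parameters():
            p.grad *= grad_scales[name]           # rescale gradients per parameter
    opt.step()
    opt.zero_grad()
    print(loss.item())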
This list is automatically generated from the titles and abstracts of the papers on this site.