Improving Large Models with Small models: Lower Costs and Better Performance
- URL: http://arxiv.org/abs/2406.15471v1
- Date: Sat, 15 Jun 2024 14:44:43 GMT
- Title: Improving Large Models with Small models: Lower Costs and Better Performance
- Authors: Dong Chen, Shuo Zhang, Yueting Zhuang, Siliang Tang, Qidong Liu, Hua Wang, Mingliang Xu
- Abstract summary: We propose Data Shunt$^+$ (DS$^+$), a general paradigm for collaboration of small and large models.
For instance, ChatGPT achieves an accuracy of $94.43\%$ on Amazon Product sentiment analysis, and DS$^+$ achieves an accuracy of $95.64\%$, while the cost has been reduced to only $31.18\%$.
- Score: 81.55672406002715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained large models (PLMs), such as ChatGPT, have demonstrated remarkable performance across diverse tasks. However, the significant computational requirements of PLMs have discouraged most product teams from running or fine-tuning them. In such cases, to harness the exceptional performance of PLMs, one must rely on expensive APIs, thereby exacerbating the economic burden. Despite the overall inferior performance of small models, in specific distributions, they can achieve comparable or even superior results. Consequently, some input can be processed exclusively by small models. On the other hand, certain tasks can be broken down into multiple subtasks, some of which can be completed without powerful capabilities. Under these circumstances, small models can handle the simple subtasks, allowing large models to focus on challenging subtasks, thus improving the performance. We propose Data Shunt$^+$ (DS$^+$), a general paradigm for collaboration of small and large models. DS$^+$ not only substantially reduces the cost associated with querying large models but also effectively improves large models' performance. For instance, ChatGPT achieves an accuracy of $94.43\%$ on Amazon Product sentiment analysis, and DS$^+$ achieves an accuracy of $95.64\%$, while the cost has been reduced to only $31.18\%$. Besides, experiments also prove that the proposed collaborative-based paradigm can better inject specific task knowledge into PLMs compared to fine-tuning.
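A minimal sketch of the shunting idea follows, assuming a hypothetical local small model that returns a label with a confidence score and a large model reached through a paid API; the confidence threshold is an illustrative stand-in for the paper's shunting criterion, not the exact DS$^+$ rule.
```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interfaces: a small local model returns (label, confidence);
# a large model (e.g. an expensive API) returns a label only.
SmallModel = Callable[[str], Tuple[str, float]]
LargeModel = Callable[[str], str]

@dataclass
class DataShunt:
    """Route easy inputs to a small model; escalate the rest to a large model."""
    small_model: SmallModel
    large_model: LargeModel
    confidence_threshold: float = 0.9  # illustrative shunting criterion

    def predict(self, text: str) -> str:
        label, confidence = self.small_model(text)
        if confidence >= self.confidence_threshold:
            return label               # handled locally, no API cost
        return self.large_model(text)  # hard case: pay for the large model

# Toy stand-ins for the two models, for illustration only.
def toy_small_model(text: str) -> Tuple[str, float]:
    positive = any(w in text.lower() for w in ("great", "love", "excellent"))
    return ("positive" if positive else "negative", 0.95 if positive else 0.6)

def toy_large_model(text: str) -> str:
    return "negative"  # stand-in for an expensive API call

shunt = DataShunt(toy_small_model, toy_large_model)
print(shunt.predict("I love this product, excellent quality"))  # small model answers
print(shunt.predict("The packaging was fine, I guess"))         # escalated to the large model
```
Only the inputs the small model cannot confidently handle incur an API call, which is how the paradigm trades a small accuracy check for a large cost reduction.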
Related papers
- Model Cascading for Code: Reducing Inference Costs with Model Cascading for LLM Based Code Generation [20.445496441396028]
We propose letting each model generate and execute a set of test cases for its solutions, and using the test results as the cascading threshold.
We show that our model cascading strategy reduces computational costs while increasing accuracy compared to generating the output with a single model (a rough sketch of this test-driven cascade appears after this list).
arXiv Detail & Related papers (2024-05-24T16:20:04Z) - Majority Kernels: An Approach to Leverage Big Model Dynamics for Efficient Small Model Training [32.154166415680066]
Methods like distillation, compression, or quantization leverage highly performant large models to induce smaller performant ones.
This paper explores the hypothesis that a single training run can simultaneously train a larger model for performance and derive a smaller model for deployment.
arXiv Detail & Related papers (2024-02-07T17:07:41Z) - PanGu-$\pi$: Enhancing Language Model Architectures via Nonlinearity Compensation [97.78045712375047]
We present a new efficient model architecture for large language models (LLMs).
We show that PanGu-$\pi$-7B can achieve a comparable performance to that of benchmarks with about 10% inference speed-up.
In addition, we have deployed PanGu-$pi$-7B in the high-value domains of finance and law, developing an LLM named YunShan for practical application.
arXiv Detail & Related papers (2023-12-27T11:49:24Z) - Herd: Using multiple, smaller LLMs to match the performances of proprietary, large LLMs via an intelligent composer [1.3108652488669732]
We show that a herd of open source models can match or exceed the performance of proprietary models via an intelligent router.
In cases where GPT cannot answer a query, Herd identifies a model that can at least 40% of the time.
arXiv Detail & Related papers (2023-10-30T18:11:02Z) - Small Models are Valuable Plug-ins for Large Language Models [65.29370906766997]
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful but their weights are often publicly unavailable.
We propose Super In-Context Learning (SuperICL) which allows black-box LLMs to work with locally fine-tuned smaller models.
arXiv Detail & Related papers (2023-05-15T17:59:01Z) - Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes [91.58845026796149]
We introduce Distilling step-by-step, a new mechanism that trains small models that outperform large language models.
We present three findings across 4 NLP benchmarks.
arXiv Detail & Related papers (2023-05-03T17:50:56Z) - Exploring Sparse Expert Models and Beyond [51.90860155810848]
Mixture-of-Experts (MoE) models can achieve promising results with an outrageously large number of parameters but constant computational cost.
We propose a simple method called expert prototyping that splits experts into different prototypes and applies $k$ top-$1$ routing.
This strategy improves model quality while maintaining constant computational costs, and our further exploration of extremely large-scale models shows that it is more effective for training larger models.
arXiv Detail & Related papers (2021-05-31T16:12:44Z) - When Ensembling Smaller Models is More Efficient than Single Large Models [52.38997176317532]
We show that ensembles can outperform single models, achieving both higher accuracy and fewer total FLOPs to compute.
This presents an interesting observation that output diversity in ensembling can often be more efficient than training larger models.
arXiv Detail & Related papers (2020-05-01T18:56:18Z)
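The test-driven cascading described in the first related paper above can be sketched as follows. The `CodeModel` interface, the self-generated tests, and the all-tests-pass threshold are hypothetical stand-ins for illustration, not the paper's exact implementation.
```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, List, Tuple

# Hypothetical interface: each model maps a task description to a candidate
# solution plus a list of executable test snippets (both Python source strings).
CodeModel = Callable[[str], Tuple[str, List[str]]]

def pass_rate(solution: str, tests: List[str]) -> float:
    """Run each test against the solution in a fresh subprocess; return the pass rate."""
    if not tests:
        return 0.0
    passed = 0
    for test in tests:
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(solution + "\n\n" + test + "\n")
            path = f.name
        try:
            result = subprocess.run([sys.executable, path], capture_output=True, timeout=10)
            passed += int(result.returncode == 0)
        except subprocess.TimeoutExpired:
            pass  # a hanging test counts as a failure
        finally:
            os.unlink(path)
    return passed / len(tests)

def cascade(task: str, models: List[CodeModel], threshold: float = 1.0) -> str:
    """Try models from cheapest to most expensive; accept the first solution whose
    self-generated tests pass at or above the threshold (illustrative stopping rule)."""
    solution = ""
    for model in models:  # ordered small -> large
        solution, tests = model(task)
        if pass_rate(solution, tests) >= threshold:
            return solution  # a cheaper model sufficed; skip the larger ones
    return solution  # otherwise fall back to the largest model's answer
```
As in the data-shunt sketch, the expensive model is only consulted when the cheaper one fails its own checks, which is where the inference-cost savings come from.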