COSMOS: Predictable and Cost-Effective Adaptation of LLMs
- URL: http://arxiv.org/abs/2505.01449v1
- Date: Wed, 30 Apr 2025 02:06:26 GMT
- Title: COSMOS: Predictable and Cost-Effective Adaptation of LLMs
- Authors: Jiayu Wang, Aws Albarghouthi, Frederic Sala
- Abstract summary: Large language models (LLMs) achieve remarkable performance across numerous tasks by using a diverse array of adaptation strategies. We introduce COSMOS, a unified prediction framework that efficiently estimates adaptation outcomes at minimal cost.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) achieve remarkable performance across numerous tasks by using a diverse array of adaptation strategies. However, optimally selecting a model and adaptation strategy under resource constraints is challenging and often requires extensive experimentation. We investigate whether it is possible to accurately predict both performance and cost without expensive trials. We formalize the strategy selection problem for LLMs and introduce COSMOS, a unified prediction framework that efficiently estimates adaptation outcomes at minimal cost. We instantiate and study the capability of our framework via a pair of powerful predictors: embedding-augmented lightweight proxy models to predict fine-tuning performance, and low-sample scaling laws to forecast retrieval-augmented in-context learning. Extensive evaluation across eight representative benchmarks demonstrates that COSMOS achieves high prediction accuracy while reducing computational costs by 92.72% on average, and up to 98.71% in resource-intensive scenarios. Our results show that efficient prediction of adaptation outcomes is not only feasible but can substantially reduce the computational overhead of LLM deployment while maintaining performance standards.
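The second of COSMOS's two predictors is concrete enough to sketch: measure accuracy at a few cheap shot counts, fit a saturating power law, and extrapolate to the expensive regime before paying for it. The functional form and all numbers below are illustrative assumptions, not COSMOS's actual implementation.

```python
# Sketch: forecast in-context learning accuracy from a few cheap measurements
# by fitting a saturating power law acc(k) ~ a - b * k**(-c). The functional
# form and the data are illustrative assumptions, not COSMOS's exact method.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(k, a, b, c):
    # a = asymptotic accuracy; b and c control how quickly more shots help
    return a - b * np.power(k, -c)

# Accuracy measured at a few small shot counts (cheap to obtain).
shots = np.array([1, 2, 4, 8], dtype=float)
acc = np.array([0.52, 0.58, 0.63, 0.66])  # hypothetical measurements

params, _ = curve_fit(scaling_law, shots, acc, p0=[0.75, 0.3, 0.5], maxfev=5000)

# Extrapolate to the expensive regime before committing compute to it.
for k in (16, 32, 64):
    print(f"predicted accuracy at {k} shots: {scaling_law(k, *params):.3f}")
```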
Related papers
- Probabilistic Optimality for Inference-time Scaling
Inference-time scaling has emerged as a powerful technique for enhancing the reasoning performance of Large Language Models (LLMs). We propose a probabilistic framework that formalizes the optimality of inference-time scaling under the assumption that parallel samples are independently and identically distributed (i.i.d.). Within this framework, we derive a theoretical lower bound on the required number of samples to achieve a target performance level, providing the first principled guidance for compute-efficient scaling.
arXiv Detail & Related papers (2025-06-27T16:44:11Z)
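Under the paper's i.i.d. assumption, the flavor of such a bound is easy to illustrate: if each sample independently succeeds with probability p and a single success suffices (best-of-n with a perfect verifier), then n samples reach a target success probability t once 1 - (1 - p)^n >= t. A minimal sketch of that arithmetic, not the paper's actual bound:

```python
# Sketch: minimal number of i.i.d. samples n so that at least one succeeds
# with probability >= target, i.e. 1 - (1 - p)**n >= target. Assumes a
# perfect verifier and per-sample success probability p; this illustrates
# the flavor of the bound, not the paper's exact result.
import math

def min_samples(p: float, target: float) -> int:
    if not (0 < p < 1 and 0 < target < 1):
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p))

print(min_samples(p=0.2, target=0.95))  # -> 14 samples
```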
- LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
Open-sourced Large Language Models (LLMs) and diverse downstream tasks require efficient model selection. We propose a novel theoretical framework that provides a proper lens to assess the generalization capabilities of LLMs. In particular, we first derive a PAC-Bayesian Generalization Bound that unveils the fine-tuning dynamics of LLMs. We then introduce LENSLLM, a Neural Tangent Kernel (NTK)-based Rectified Scaling Model that enables accurate performance predictions.
arXiv Detail & Related papers (2025-05-01T15:07:32Z)
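A rectified scaling law of the kind LENSLLM builds on can be sketched as a test-loss curve with a pre-power and a power phase in the fine-tuning data size D. The parameterization below, L(D) = B / (D_l + D)^beta + E, and all numbers are assumptions for illustration, not LENSLLM's exact model:

```python
# Sketch: fit a rectified scaling law L(D) = B / (D_l + D)**beta + E to a few
# fine-tuning runs at small data sizes D, then extrapolate each candidate
# model's full-data loss and rank models by it. The parameterization is an
# assumption for illustration, not LENSLLM's exact model.
import numpy as np
from scipy.optimize import curve_fit

def rectified_law(D, B, D_l, beta, E):
    return B / np.power(D_l + D, beta) + E

D = np.array([1e2, 3e2, 1e3, 3e3], dtype=float)  # small pilot runs
loss = np.array([2.10, 1.95, 1.78, 1.62])         # hypothetical test losses

params, _ = curve_fit(rectified_law, D, loss,
                      p0=[10.0, 100.0, 0.4, 1.0],
                      bounds=([0, 0, 0, 0], [np.inf] * 4),
                      maxfev=10000)
print(f"predicted loss at D=1e5: {rectified_law(1e5, *params):.3f}")
```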
- Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs?
We construct a dataset using 50 1B-parameter LLM variants with systematically varied pre-training configurations. We introduce novel unsupervised and supervised proxy metrics derived from pre-training that successfully reduce the relative performance prediction error rate by over 50%.
arXiv Detail & Related papers (2025-04-16T21:19:09Z)
- Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Accurately predicting downstream task performance prior to model training is crucial for efficient resource allocation. Existing performance prediction methods suffer from limited accuracy and reliability. We propose a Clustering-On-Difficulty (COD) downstream performance prediction framework.
arXiv Detail & Related papers (2025-02-24T15:44:57Z)
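A hedged sketch of the clustering-on-difficulty idea: group evaluation items by a difficulty signal, fit a separate trend per cluster, and aggregate into one forecast. The features, the linear-in-log-compute trend, and all numbers are illustrative stand-ins, not COD's actual design.

```python
# Sketch: cluster eval items by difficulty (pass rates across small
# checkpoints), fit a per-cluster trend in compute, aggregate the
# extrapolations. Everything here is an illustrative assumption.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Rows = eval items, columns = pass rate at 4 small training checkpoints.
pass_rates = rng.uniform(0, 1, size=(200, 4))

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pass_rates)

compute = np.array([1e19, 3e19, 1e20, 3e20])  # FLOPs of the checkpoints
forecast = 0.0
for c in range(3):
    members = pass_rates[clusters == c]
    # Fit accuracy vs. log-compute per cluster, extrapolate to the target run.
    slope, intercept = np.polyfit(np.log10(compute), members.mean(axis=0), 1)
    pred = np.clip(slope * np.log10(1e21) + intercept, 0.0, 1.0)
    forecast += pred * (len(members) / len(pass_rates))

print(f"forecast aggregate accuracy at 1e21 FLOPs: {forecast:.3f}")
```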
- The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
We present the first systematic exploration of optimal sparse pre-training configurations for large language models. We find that initiating pruning at 25% of total training compute and concluding at 75% achieves near-optimal final evaluation loss. We propose a new scaling law that modifies the Chinchilla scaling law to use the average parameter count over pre-training.
arXiv Detail & Related papers (2025-01-21T20:23:22Z)
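The proposed modification is straightforward to illustrate numerically: average the parameter count over a pruning schedule that starts at 25% and ends at 75% of training compute, then plug N_avg into a Chinchilla-style loss L = E + A / N_avg^alpha + B / D^beta. The constants are Chinchilla's published fits (approximate); the linear sparsity ramp and all other numbers are assumptions.

```python
# Sketch: average parameter count over a linear pruning schedule running
# from 25% to 75% of training, plugged into a Chinchilla-style loss.
# Constants are Chinchilla's approximate published fits; the schedule and
# all other numbers are illustrative assumptions.
import numpy as np

def avg_param_count(n_dense: float, final_sparsity: float, steps: int = 1000) -> float:
    t = np.linspace(0.0, 1.0, steps)
    # Sparsity ramps linearly between 25% and 75% of training compute.
    sparsity = final_sparsity * np.clip((t - 0.25) / 0.5, 0.0, 1.0)
    return float(np.mean(n_dense * (1.0 - sparsity)))

A, B, E, alpha, beta = 406.4, 410.7, 1.69, 0.34, 0.28  # Chinchilla (approx.)
n_dense, d_tokens = 1e9, 2e10
n_avg = avg_param_count(n_dense, final_sparsity=0.5)
loss = E + A / n_avg**alpha + B / d_tokens**beta
print(f"N_avg = {n_avg:.3e}, predicted loss = {loss:.3f}")
```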
- Adaptive Pruning for Large Language Models with Structural Importance Awareness
Large language models (LLMs) have significantly improved language understanding and generation capabilities. However, LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands. We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
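A generic structured-pruning sketch of the kind SAAP belongs to: score each output channel by a combined weight/activation statistic and drop the lowest-scoring ones. The scoring rule below is a stand-in, not SAAP's actual importance metric.

```python
# Sketch: structured pruning of a linear layer by channel importance.
# Importance here is |weight| times mean |activation| aggregated per output
# channel, a generic stand-in for SAAP's structural importance metric.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512))       # (out_channels, in_features)
acts = rng.normal(size=(1000, 512))   # calibration activations

# Per-output-channel importance score, shape (256,).
importance = np.abs(W) @ np.abs(acts).mean(axis=0)

keep = importance >= np.quantile(importance, 0.3)  # prune lowest 30%
W_pruned = W[keep]
print(f"kept {keep.sum()} of {len(keep)} output channels")
```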
- Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but a gap remains between the practical performance of low-rank adaptations and their theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z)
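The ensemble idea can be shown on a toy problem: repeatedly fit a rank-1 correction to the current residual and merge it in, exactly as gradient boosting stacks weak learners. Here the "weak learner" is the residual's top singular pair, a stand-in for a trained rank-1 LoRA adapter.

```python
# Toy illustration of boosted rank-1 adaptation: approximate a target weight
# update by iteratively fitting rank-1 corrections to the residual (each
# round's top singular pair stands in for a trained rank-1 LoRA adapter).
import numpy as np

rng = np.random.default_rng(0)
target_delta = rng.normal(size=(64, 64))  # the "ideal" weight update
accumulated = np.zeros_like(target_delta)

for round_idx in range(10):
    residual = target_delta - accumulated
    U, S, Vt = np.linalg.svd(residual, full_matrices=False)
    accumulated += S[0] * np.outer(U[:, 0], Vt[0])  # merge rank-1 learner
    err = np.linalg.norm(target_delta - accumulated) / np.linalg.norm(target_delta)
    print(f"round {round_idx + 1}: relative error {err:.3f}")
```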
- Scaling Laws for Predicting Downstream Performance in LLMs
This work focuses on pre-training loss as a more computation-efficient metric for performance estimation. We present FLP-M, a fundamental approach for performance prediction that addresses the practical need to integrate datasets from multiple sources during pre-training.
arXiv Detail & Related papers (2024-10-11T04:57:48Z)
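FLP-style prediction is two-stage: map compute to pre-training loss, then map loss to downstream accuracy, and chain the two fits. A hedged sketch with assumed functional forms and hypothetical data:

```python
# Sketch of a two-stage FLP-style predictor: (1) fit loss as a power law in
# compute, (2) fit downstream accuracy as a sigmoid in loss, then chain them.
# Functional forms and all data are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

flops = np.array([1e19, 3e19, 1e20, 3e20])
loss = np.array([2.8, 2.6, 2.4, 2.25])    # hypothetical pre-training losses
acc = np.array([0.35, 0.42, 0.51, 0.57])  # hypothetical downstream accuracy

def loss_from_flops(C, a, b):
    return a * np.power(C, -b)

def acc_from_loss(L, lo, hi, mid, scale):
    return lo + (hi - lo) / (1.0 + np.exp((L - mid) / scale))

p1, _ = curve_fit(loss_from_flops, flops, loss, p0=[50.0, 0.05], maxfev=10000)
p2, _ = curve_fit(acc_from_loss, loss, acc, p0=[0.2, 0.7, 2.5, 0.2], maxfev=10000)

target = 1e21
print(f"predicted accuracy: {acc_from_loss(loss_from_flops(target, *p1), *p2):.3f}")
```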
- Understanding the Performance and Estimating the Cost of LLM Fine-Tuning
Fine-tuning Large Language Models (LLMs) adapts pre-trained models to specific tasks in a cost-effective manner.
In this paper, we characterize sparse Mixture of Experts (MoE) based LLM fine-tuning to understand its accuracy and runtime performance.
We also develop and validate an analytical model to estimate the cost of LLM fine-tuning on the cloud.
arXiv Detail & Related papers (2024-08-08T16:26:07Z)
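An analytical cost model of this kind reduces to arithmetic over tokens, FLOPs, hardware throughput, and hourly price. A minimal sketch with assumed numbers, not the paper's measured model:

```python
# Sketch: back-of-the-envelope cloud cost for fine-tuning. Uses the common
# ~6 * params * tokens FLOPs approximation for training; GPU peak, MFU, and
# hourly price are assumed numbers, not the paper's measured model.
def finetune_cost_usd(params: float, tokens: float, num_gpus: int,
                      peak_flops: float = 312e12,  # A100 BF16 dense peak
                      mfu: float = 0.35,           # assumed utilization
                      usd_per_gpu_hour: float = 2.0) -> float:
    total_flops = 6.0 * params * tokens
    seconds = total_flops / (num_gpus * peak_flops * mfu)
    return seconds / 3600.0 * num_gpus * usd_per_gpu_hour

# e.g. a 7B-parameter model fine-tuned on 1B tokens across 8 GPUs
print(f"${finetune_cost_usd(7e9, 1e9, num_gpus=8):.2f}")
```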
- Uncertainty Aware Learning for Language Model Alignment
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion: adaptively setting the label-smoothing value during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
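The mechanism is a per-sample label-smoothing value tied to uncertainty. A hedged sketch of smoothed cross-entropy where eps_i scales with an uncertainty estimate; the linear mapping from uncertainty to eps is an illustrative choice, not UAL's exact rule:

```python
# Sketch: uncertainty-adaptive label smoothing. Each sample's smoothing
# value eps_i grows with an uncertainty estimate u_i in [0, 1]; the mapping
# eps_i = eps_max * u_i is an illustrative choice, not UAL's exact rule.
import numpy as np

def log_softmax(x):
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def smoothed_cross_entropy(logits, labels, eps):
    # Target puts 1-eps on the true label, eps spread uniformly over vocab.
    n = logits.shape[0]
    lp = log_softmax(logits)
    true_term = lp[np.arange(n), labels]
    uniform_term = lp.mean(axis=1)
    return -((1.0 - eps) * true_term + eps * uniform_term)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
labels = np.array([1, 3, 5, 7])
uncertainty = np.array([0.1, 0.9, 0.4, 0.6])  # hypothetical scores in [0, 1]
eps = 0.2 * uncertainty                        # eps_max = 0.2, assumed
print(smoothed_cross_entropy(logits, labels, eps))
```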
- Value Augmented Sampling for Language Model Alignment and Personalization
We present Value Augmented Sampling (VAS), a new framework for reward optimization.
VAS solves for the optimal reward-maximizing policy without co-training the policy and the value function.
Our algorithm unlocks the new capability of composing several rewards and controlling the extent of each one at deployment time.
arXiv Detail & Related papers (2024-05-10T17:59:04Z)
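The decode-time mechanism can be sketched as re-weighting a frozen base model's next-token logits with value estimates, one per reward, each with its own deployment-time strength. The additive combination rule and the toy value arrays are assumptions, not VAS's exact algorithm:

```python
# Sketch: value-augmented decoding. Next-token logits from a frozen base
# policy are shifted by beta-weighted value estimates, one per reward; the
# additive combination and toy value arrays are illustrative assumptions.
import numpy as np

def vas_next_token_probs(base_logits, value_estimates, betas):
    # base_logits: (vocab,); value_estimates: list of (vocab,) arrays, one
    # per reward model; betas: per-reward strengths set at deployment time.
    adjusted = base_logits.copy()
    for v, beta in zip(value_estimates, betas):
        adjusted = adjusted + beta * v
    z = adjusted - adjusted.max()
    p = np.exp(z)
    return p / p.sum()

rng = np.random.default_rng(0)
vocab = 8
base = rng.normal(size=vocab)
helpfulness_v = rng.normal(size=vocab)  # hypothetical value estimates
brevity_v = rng.normal(size=vocab)

# Dial each reward independently at deployment time, with no retraining.
print(vas_next_token_probs(base, [helpfulness_v, brevity_v], betas=[1.0, 0.3]))
```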
- Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection
Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL).
In this work, we investigate an active learning approach for ICL, where there is a limited budget for annotating examples.
We propose a model-adaptive optimization-free algorithm, termed AdaICL, which identifies examples that the model is uncertain about.
arXiv Detail & Related papers (2023-10-30T22:03:55Z)
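The selection principle (annotate what the model is uncertain about) can be sketched with entropy scores over an unlabeled pool. Predictive entropy plus greedy top-k selection is a generic stand-in for AdaICL's actual algorithm:

```python
# Sketch: pick examples to annotate for ICL by model uncertainty. Here
# uncertainty is predictive entropy over each pool item's label
# distribution; entropy + greedy top-k stands in for AdaICL's algorithm.
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=axis)

rng = np.random.default_rng(0)
# Hypothetical model label distributions for 100 unlabeled pool examples.
pool_probs = rng.dirichlet(alpha=np.ones(4) * 0.7, size=100)

budget = 10
scores = entropy(pool_probs)
to_annotate = np.argsort(scores)[-budget:][::-1]  # most uncertain first
print("annotate pool indices:", to_annotate)
```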
- Leaving the Nest: Going Beyond Local Loss Functions for Predict-Then-Optimize
We show that our method achieves state-of-the-art results in four domains from the literature.
Our approach outperforms the best existing method by nearly 200% when the localness assumption is broken.
arXiv Detail & Related papers (2023-05-26T11:17:45Z)