Scaling LLM Inference with Optimized Sample Compute Allocation
- URL: http://arxiv.org/abs/2410.22480v1
- Date: Tue, 29 Oct 2024 19:17:55 GMT
- Title: Scaling LLM Inference with Optimized Sample Compute Allocation
- Authors: Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li
- Abstract summary: We propose OSCA, an algorithm to find an optimal mix of different inference configurations.
Our experiments show that with our learned mixed allocation, we can achieve accuracy better than the best single configuration.
OSCA is also shown to be effective in agentic workflows beyond single-turn tasks, achieving a better accuracy on SWE-Bench with 3x less compute than the default configuration.
- Abstract: Sampling is a basic operation in many inference-time algorithms of large language models (LLMs). To scale up inference efficiently with a limited compute, it is crucial to find an optimal allocation for sample compute budgets: Which sampling configurations (model, temperature, language, etc.) do we use? How many samples do we generate in each configuration? We formulate these choices as a learning problem and propose OSCA, an algorithm that Optimizes Sample Compute Allocation by finding an optimal mix of different inference configurations. Our experiments show that with our learned mixed allocation, we can achieve accuracy better than the best single configuration with 128x less compute on code generation and 25x less compute on 4 reasoning tasks. OSCA is also shown to be effective in agentic workflows beyond single-turn tasks, achieving a better accuracy on SWE-Bench with 3x less compute than the default configuration. Our code and generations are released at https://github.com/LeiLiLab/OSCA.
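To make the allocation problem concrete, here is a minimal, hypothetical Python sketch: given per-sample success rates estimated for each configuration on a small set of training problems, it brute-forces the sample mix that maximizes the expected pass rate under a compute budget. All function names and numbers are illustrative assumptions rather than the paper's implementation; the released code linked above is the authoritative reference.

```python
# Minimal illustrative sketch (hypothetical names/numbers), not the OSCA
# implementation: pick the sample mix that maximizes expected pass rate.
from itertools import product

def expected_pass_rate(alloc, p):
    """Mean over problems of P(at least one drawn sample is correct).

    alloc:   number of samples drawn from each configuration.
    p[i][j]: estimated per-sample success rate of configuration i on
             training problem j.
    """
    n_problems = len(p[0])
    total = 0.0
    for j in range(n_problems):
        fail = 1.0
        for i, n in enumerate(alloc):
            fail *= (1.0 - p[i][j]) ** n
        total += 1.0 - fail
    return total / n_problems

def best_allocation(p, cost, budget, max_n=6):
    """Brute-force search over mixed allocations within the budget."""
    best_alloc, best_score = None, -1.0
    for alloc in product(range(max_n + 1), repeat=len(p)):
        if sum(n * c for n, c in zip(alloc, cost)) > budget:
            continue
        score = expected_pass_rate(alloc, p)
        if score > best_score:
            best_alloc, best_score = alloc, score
    return best_alloc, best_score

# Two configurations with complementary strengths on three problems:
# config 0 is strong on problems 0-1, config 1 mostly solves problem 2.
p = [[0.6, 0.5, 0.0],
     [0.0, 0.1, 0.7]]
alloc, score = best_allocation(p, cost=[1.0, 1.0], budget=6.0)
print(alloc, round(score, 3))  # the optimum mixes both configurations
```

When configurations have complementary strengths, as in the toy numbers here, the optimal split draws samples from several configurations instead of spending the whole budget on the single best one, which is the intuition behind the paper's mixed allocations.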
Related papers
- Faster Adaptive Decentralized Learning Algorithms
We propose a class of faster adaptive decentralized algorithms (i.e., AdaMDOS and AdaMDOF) for distributed nonconvex stochastic and finite-sum optimization.
Experimental results demonstrate the efficiency of our algorithms.
arXiv Detail & Related papers (2024-08-19T08:05:33Z) - Sparsity-Constraint Optimization via Splicing Iteration
We develop an algorithm named Sparsity-Constraint Optimization via sPlicing itEration (SCOPE).
SCOPE converges effectively without tuning parameters.
We apply SCOPE to solve quadratic optimization, learn sparse classifiers, and recover sparse Markov networks for binary variables.
Our open-source Python package skscope, based on a C++ implementation, is publicly available on GitHub.
arXiv Detail & Related papers (2024-06-17T18:34:51Z) - Federated Conditional Stochastic Optimization
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes federated algorithms for conditional stochastic optimization.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - Predicting Software Performance with Divide-and-Learn
We propose an approach based on the concept of 'divide-and-learn', dubbed DaL.
Experimental results from eight real-world systems and five sets of training data reveal that DaL performs no worse than the best counterpart in 33 out of 40 cases.
arXiv Detail & Related papers (2023-06-11T11:16:27Z) - Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning by generating synthetic samples for the minority class (see the sketch after this list).
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z) - Decentralized Gossip-Based Stochastic Bilevel Optimization over Communication Networks
We propose a gossip-based distributed bilevel optimization algorithm.
Agents can solve both the networked inner and outer optimization problems in a single timescale.
Our algorithm achieves state-of-the-art efficiency and test accuracy.
arXiv Detail & Related papers (2022-06-22T06:38:54Z) - Adaptive Sampling for Best Policy Identification in Markov Decision Processes
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Refined bounds for algorithm configuration: The knife-edge of dual class approximability
We investigate how large a training set should be to ensure that a parameter's average metric performance over the training set is close to its expected future performance.
We show that if this approximation holds under the L-infinity norm, we can provide strong sample complexity bounds.
We empirically evaluate our bounds in the context of integer programming, one of the most powerful tools in computer science.
arXiv Detail & Related papers (2020-06-21T15:32:21Z) - Bandit Samplers for Training Graph Neural Networks
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z) - Distributionally Robust Weighted $k$-Nearest Neighbors
Learning a robust classifier from a few samples remains a key challenge in machine learning.
In this paper, we study a minimax distributionally robust formulation of weighted $k$-nearest neighbors.
We develop an algorithm, Dr.k-NN, that efficiently solves this functional optimization problem.
arXiv Detail & Related papers (2020-06-07T00:34:33Z)
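As referenced in the AutoSMOTE entry above, here is a minimal, hypothetical sketch of interpolation-based over-sampling. It illustrates the generic SMOTE-style technique only; it is not the AutoSMOTE algorithm, and all names and numbers are invented for illustration.

```python
# Minimal, hypothetical sketch of SMOTE-style over-sampling (illustrative
# of the generic technique; this is not the AutoSMOTE algorithm).
import random

def smote_like(minority, k=3, n_new=10):
    """Create synthetic minority samples by interpolating nearest neighbors.

    minority: list of feature vectors (lists of floats) from the minority class.
    """
    synthetic = []
    for _ in range(n_new):
        x = random.choice(minority)
        # k nearest neighbors of x among the other minority points
        neighbors = sorted(
            (y for y in minority if y is not x),
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, y)),
        )[:k]
        nn = random.choice(neighbors)
        lam = random.random()  # interpolation coefficient in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nn)])
    return synthetic

# Four minority points in 2-D; generate three synthetic ones.
minority = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
print(smote_like(minority, k=2, n_new=3))
```

Per its title and summary, AutoSMOTE instead learns such generation decisions jointly via deep hierarchical reinforcement learning rather than fixing the interpolation heuristics by hand.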