Related papers: Patched MOA: optimizing inference for diverse software development tasks

Patched MOA: optimizing inference for diverse software development tasks

URL: http://arxiv.org/abs/2407.18521v2
Date: Fri, 6 Sep 2024 06:49:31 GMT
Title: Patched MOA: optimizing inference for diverse software development tasks
Authors: Asankhaya Sharma,
Abstract summary: This paper introduces Patched MOA, an inference optimization technique that significantly enhances the performance of large language models (LLMs) We evaluate three inference optimization algorithms - Best of N, Mixture of Agents, and Monte Carlo Tree Search. We demonstrate that Patched MOA can boost the performance of smaller models to surpass that of larger, more expensive models.
Score: 1.14219428942199
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper introduces Patched MOA (Mixture of Agents), an inference optimization technique that significantly enhances the performance of large language models (LLMs) across diverse software development tasks. We evaluate three inference optimization algorithms - Best of N, Mixture of Agents, and Monte Carlo Tree Search and demonstrate that Patched MOA can boost the performance of smaller models to surpass that of larger, more expensive models. Notably, our approach improves the gpt-4o-mini model's performance on the Arena-Hard-Auto benchmark by 15.52%, outperforming gpt-4-turbo at a fraction of the cost. We also apply Patched MOA to various software development workflows, showing consistent improvements in task completion rates. Our method is model-agnostic, transparent to end-users, and can be easily integrated into existing LLM pipelines. This work contributes to the growing field of LLM optimization, offering a cost-effective solution for enhancing model performance without the need for fine-tuning or larger models. Our implementation is open-source and available at https://github.com/codelion/optillm.

Related papers

Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training [4.059735204483926]
We propose Lumos, a trace-driven performance modeling and estimation toolkit for large-scale LLM training. We show that Lumos can replay execution time with an average error of just 3.3%, along with other runtime details, across different models and configurations.
arXiv Detail & Related papers (2025-04-12T18:43:24Z)
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning [76.35753243272521]
We introduce VisualPRM, which improves the reasoning abilities of existing Multimodal Large Language Models (MLLMs) Our model achieves a 5.9-point improvement across seven multimodal reasoning benchmarks. For the evaluation of multimodal PRMs, we propose VisualProcessBench, a benchmark with human-annotated step-wise correctness labels.
arXiv Detail & Related papers (2025-03-13T12:03:37Z)
Extrapolation Merging: Keep Improving With Extrapolation and Merging [14.786100203787194]
Large Language Models (LLMs) require instruction fine-tuning to perform different downstream tasks. Model merging aims to enhance performance by combining the parameters of different models. We propose Extrapolation Merging, a paradigm that can continue improving model performance without requiring extra computational resources or data.
arXiv Detail & Related papers (2025-03-05T14:28:22Z)
Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing [64.38277118982698]
Large Language Models (LLMs) have demonstrated human-like instruction-following abilities. In this work, we explore how to route the best-performing LLM for each instruction to achieve better overall performance. We develop a new paradigm, constructing capability instructions with model capability representation, user instruction, and performance inquiry prompts to assess the performance.
arXiv Detail & Related papers (2025-02-24T16:10:53Z)
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization [65.64108848398696]
We introduce a preference optimization process to enhance the multimodal reasoning capabilities of MLLMs. We develop a simple yet effective method, termed Mixed Preference Optimization (MPO), which boosts multimodal CoT performance. Our model, InternVL2-8B-MPO, achieves an accuracy of 67.0 on MathVista, outperforming InternVL2-8B by 8.7 points and achieving performance comparable to the 10x larger InternVL2-76B.
arXiv Detail & Related papers (2024-11-15T18:59:27Z)
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance [78.48606021719206]
Mini-InternVL is a series of MLLMs with parameters ranging from 1B to 4B, which achieves 90% of the performance with only 5% of the parameters. We develop a unified adaptation framework for Mini-InternVL, which enables our models to transfer and outperform specialized models in downstream tasks.
arXiv Detail & Related papers (2024-10-21T17:58:20Z)
Decoding-Time Language Model Alignment with Multiple Objectives [116.42095026960598]
Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives. Here, we propose $textbfmulti-objective decoding (MOD)$, a decoding-time algorithm that outputs the next token from a linear combination of predictions. We show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method.
arXiv Detail & Related papers (2024-06-27T02:46:30Z)
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs [40.159064885288245]
We study prompt optimization for Language Model Programs. We factorize our problem into optimizing the free-form instructions and few-shot demonstrations of every module. We develop MIPRO, a novel algorithm for optimizing LM programs.
arXiv Detail & Related papers (2024-06-17T16:12:03Z)
Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization [81.88668100203913]
Large language models (LLMs) have demonstrated strong capabilities in solving a wide range of programming tasks. In this paper, we explore code optimization with a focus on performance enhancement, specifically aiming to optimize code for minimal execution time.
arXiv Detail & Related papers (2024-06-17T16:10:10Z)
ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling [15.673219028826173]
We introduce a semi-automated data synthesis framework designed for optimization modeling issues, named OR-Instruct. We train various open-source LLMs with a capacity of 7 billion parameters (dubbed ORLMs) The resulting model demonstrates significantly enhanced optimization modeling capabilities, achieving state-of-the-art performance across the NL4OPT, MAMO, and IndustryOR benchmarks.
arXiv Detail & Related papers (2024-05-28T01:55:35Z)
LLaMoCo: Instruction Tuning of Large Language Models for Optimization Code Generation [26.975412742800614]
We introduce LLaMoCo, the first instruction-tuning framework designed to adapt large language models for solving optimization problems in a code-to-code manner. Specifically, we establish a comprehensive instruction set containing well-described problem prompts and effective optimization codes. Experiment results demonstrate that a CodeGen (350M) model fine-tuned by our LLaMoCo achieves superior optimization performance compared to GPT-4 Turbo.
arXiv Detail & Related papers (2024-03-02T08:21:59Z)
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way [59.09824823710863]
CoLLiE is an efficient library that facilitates collaborative training of large language models. With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization.
arXiv Detail & Related papers (2023-12-01T08:02:16Z)
Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs [66.30706841821123]
Large language models (LLMs) power many state-of-the-art systems in natural language processing. LLMs are extremely computationally expensive, even at inference time. We propose a new metric for comparing inference efficiency across models.
arXiv Detail & Related papers (2023-05-03T21:51:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.