Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs
- URL: http://arxiv.org/abs/2506.06401v1
- Date: Fri, 06 Jun 2025 02:40:42 GMT
- Title: Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs
- Authors: Hongming Yang, Shi Lin, Jun Shao, Changting Lin, Donghai Zhu, Meng Han, Qinglei Kong
- Abstract summary: DeBoP is an automatic optimization method that operates directly on the behavior of LwLLMs. We evaluate DeBoP on seven challenging tasks where state-of-the-art LLMs excel but LwLLMs generally underperform. DeBoP-optimized LwLLMs surpass GPT-3.5 on most tasks while reducing computational time by approximately 60%.
- Score: 9.085280547983091
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lightweight Large Language Models (LwLLMs) are reduced-parameter, optimized models designed to run efficiently on consumer-grade hardware, offering significant advantages in resource efficiency, cost-effectiveness, and data privacy. However, these models often struggle with limited inference and reasoning capabilities, which restricts their performance on complex tasks and limits their practical applicability. Moreover, existing prompt optimization methods typically rely on extensive manual effort or the meta-cognitive abilities of state-of-the-art LLMs, making them less effective for LwLLMs. To address these challenges, we introduce DeBoP, a new Direct Behavior Optimization Paradigm that originates from the Chain-of-Thought (CoT) prompting technique. Unlike CoT prompting, DeBoP is an automatic optimization method that operates directly on the behavior of LwLLMs. In particular, DeBoP transforms the optimization of complex prompts into the optimization of discrete, quantifiable execution sequences using a gradient-free Monte Carlo Tree Search. We evaluate DeBoP on seven challenging tasks where state-of-the-art LLMs excel but LwLLMs generally underperform. Experimental results demonstrate that DeBoP significantly outperforms recent prompt optimization methods on most tasks. In particular, DeBoP-optimized LwLLMs surpass GPT-3.5 on most tasks while reducing computational time by approximately 60% compared to other automatic prompt optimization methods.
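The abstract's central move, recasting prompt optimization as a search over discrete, quantifiable execution sequences explored with gradient-free Monte Carlo Tree Search, can be sketched in a few dozen lines. Everything below is an illustrative assumption rather than the authors' implementation: the action set, the sequence depth, the search budget, and the stubbed reward (which in practice would run the LwLLM through a candidate sequence on a validation set and return task accuracy).

```python
import math
import random

# Illustrative sketch: gradient-free MCTS over discrete "behavior" sequences,
# loosely following the abstract. All constants and names are assumptions.
ACTIONS = ["extract_entities", "filter_noise", "summarize", "verify_format"]
MAX_DEPTH = 4     # length of a complete execution sequence
NUM_ITERS = 200   # search budget (MCTS iterations)
C_UCT = 1.4       # exploration constant for UCB1

def evaluate_sequence(seq):
    """Stub reward: in practice, run the LwLLM through the execution
    sequence on a validation set and return its task accuracy."""
    return random.Random(hash(tuple(seq))).random()

class Node:
    def __init__(self, seq):
        self.seq = seq          # partial execution sequence
        self.children = {}      # action -> Node
        self.visits = 0
        self.value = 0.0        # sum of rewards backed up through this node

    def ucb1(self, parent_visits):
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + C_UCT * math.sqrt(math.log(parent_visits) / self.visits))

def mcts():
    root = Node([])
    best_seq, best_reward = None, float("-inf")
    for _ in range(NUM_ITERS):
        node, path = root, [root]
        # Selection / expansion: descend until a new node is created
        # or a full-length sequence is reached.
        while len(node.seq) < MAX_DEPTH:
            untried = [a for a in ACTIONS if a not in node.children]
            if untried:
                action = random.choice(untried)
                node.children[action] = Node(node.seq + [action])
                node = node.children[action]
                path.append(node)
                break
            node = max(node.children.values(),
                       key=lambda c: c.ucb1(node.visits))
            path.append(node)
        # Rollout: complete the sequence randomly, then evaluate it.
        rollout = node.seq + random.choices(ACTIONS, k=MAX_DEPTH - len(node.seq))
        reward = evaluate_sequence(rollout)
        if reward > best_reward:
            best_seq, best_reward = rollout, reward
        # Backpropagation: update statistics along the visited path.
        for n in path:
            n.visits += 1
            n.value += reward
    return best_seq, best_reward

best_seq, best_reward = mcts()
print(f"best sequence: {best_seq} (reward {best_reward:.3f})")
```

Because the search only needs to execute candidate sequences and score the outcomes, it requires no gradients and no meta-cognitive reasoning from the model being optimized, which is exactly the property the paper targets for LwLLMs.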
Related papers
- ORPP: Self-Optimizing Role-playing Prompts to Enhance Language Model Capabilities [64.24517317344959]
High-quality prompts are crucial for eliciting outstanding performance from large language models on complex tasks. We propose ORPP, a framework that enhances model performance by optimizing and generating role-playing prompts. We show that ORPP not only matches but in most cases surpasses existing mainstream prompt optimization methods in terms of performance.
arXiv Detail & Related papers (2025-06-03T05:51:35Z)
- CAPO: Cost-Aware Prompt Optimization [3.0290544952776854]
Large language models (LLMs) have revolutionized natural language processing by solving a wide range of tasks simply guided by a prompt. We introduce CAPO, an algorithm that enhances prompt optimization efficiency by integrating AutoML techniques. Our experiments demonstrate that CAPO outperforms state-of-the-art discrete prompt optimization methods in 11 of 15 cases, with accuracy improvements of up to 21 percentage points.
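The "AutoML techniques" mentioned above plausibly include racing: evaluating prompt candidates on successive data batches and eliminating those that fall behind, rather than scoring every candidate on the full dataset. The sketch below illustrates that general mechanism under my own assumptions (the function names, margin, and stubbed scorer are all hypothetical), not CAPO's actual algorithm:

```python
import random

def score(prompt, example):
    """Stub: return 1 if the model answers `example` correctly under
    `prompt`, else 0. Replace with a real model call."""
    return random.random() < 0.5

def race(prompts, dataset, batch_size=20, keep_margin=0.1):
    """Evaluate prompt candidates batch by batch; drop any candidate whose
    running accuracy trails the current best by more than `keep_margin`."""
    results = {p: [] for p in prompts}
    for start in range(0, len(dataset), batch_size):
        batch = dataset[start:start + batch_size]
        for p in results:
            results[p].extend(score(p, ex) for ex in batch)
        best = max(sum(r) / len(r) for r in results.values())
        results = {p: r for p, r in results.items()
                   if sum(r) / len(r) >= best - keep_margin}
        if len(results) == 1:
            break  # a single survivor: stop evaluating early
    return max(results, key=lambda p: sum(results[p]) / len(results[p]))
```

The savings come from never spending full-dataset evaluations on clearly inferior prompts; the margin trades off aggressiveness against the risk of discarding a candidate that would have improved on later batches.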
arXiv Detail & Related papers (2025-04-22T16:14:31Z)
- GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers [52.17222304851524]
We introduce GReaTer, a novel prompt optimization technique that directly incorporates gradient information over task-specific reasoning. By utilizing task loss gradients, GReaTer enables self-optimization of prompts for open-source, lightweight language models. GReaTer consistently outperforms previous state-of-the-art prompt optimization methods.
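One plausible reading of "gradients over reasoning": teacher-force the gold answer after the prompt, the question, and the model's own generated reasoning, backpropagate the answer loss to the prompt token embeddings, and use a first-order approximation to rank candidate token swaps (in the spirit of AutoPrompt/GCG candidate selection). The sketch below follows that reading, assumes a Hugging Face-style causal LM that accepts `inputs_embeds`, and is not the authors' code:

```python
import torch
import torch.nn.functional as F

def rank_token_swaps(model, embed_matrix, prompt_ids, context_ids, gold_ids):
    """Rank vocabulary tokens per prompt position by the first-order
    predicted drop in answer loss if swapped in. `embed_matrix` is the
    model's input embedding table of shape [vocab, dim]."""
    prompt_emb = embed_matrix[prompt_ids].detach().requires_grad_(True)
    context_emb = embed_matrix[context_ids].detach()  # question + reasoning
    gold_emb = embed_matrix[gold_ids].detach()        # gold answer tokens
    inputs = torch.cat([prompt_emb, context_emb, gold_emb]).unsqueeze(0)
    logits = model(inputs_embeds=inputs).logits[0]
    # Teacher-forced cross-entropy of the gold answer at the sequence end.
    num_gold = gold_ids.numel()
    answer_logits = logits[-num_gold - 1:-1]
    loss = F.cross_entropy(answer_logits, gold_ids)
    loss.backward()
    grad = prompt_emb.grad                            # [prompt_len, dim]
    # First-order loss change for swapping position i to token v:
    # (e_v - e_i) . grad_i  -> more negative means a better swap.
    with torch.no_grad():
        delta = embed_matrix @ grad.T                 # [vocab, prompt_len]
        delta = delta - (prompt_emb * grad).sum(dim=-1)
    return delta.argsort(dim=0)                       # best swaps first
```

The key point matching the summary is that the gradient flows through the loss on the task answer, conditioned on the model's own reasoning, rather than relying on a larger LLM's meta-cognitive judgment of the prompt.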
arXiv Detail & Related papers (2024-12-12T20:59:43Z)
- Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System [75.25394449773052]
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving. Yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods. We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness.
arXiv Detail & Related papers (2024-10-10T17:00:06Z)
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
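A blend of logistic and exponential losses can be written as a gated mixture over the usual preference margin (the difference of policy-to-reference log-ratios for chosen vs. rejected responses). The sketch below is my reading of that description; the sigmoid gate and its temperature are assumptions, not necessarily the exact objective the paper discovers:

```python
import torch
import torch.nn.functional as F

def blended_preference_loss(logratio_chosen, logratio_rejected,
                            beta=0.1, tau=0.05):
    """logratio_* = log pi_theta(y|x) - log pi_ref(y|x) for each response.
    Gate, beta, and tau are illustrative assumptions."""
    rho = beta * (logratio_chosen - logratio_rejected)  # preference margin
    logistic = -F.logsigmoid(rho)       # DPO-style log-sigmoid loss
    exponential = torch.exp(-rho)       # exponential loss
    gate = torch.sigmoid(rho / tau)     # adaptive mixing weight
    return (gate * logistic + (1 - gate) * exponential).mean()
```

The gate makes the effective loss shape depend on how well the policy already separates chosen from rejected responses, which is one way "adaptively blends" can be realized.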
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
- Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers [15.809293135844756]
We revisit OPRO for automated prompting with relatively small-scale LLMs.
OPRO shows limited effectiveness in small-scale LLMs, with limited inference capabilities constraining optimization ability.
We suggest that future automatic prompt engineering consider both model capabilities and computational costs.
arXiv Detail & Related papers (2024-05-16T17:33:50Z)
- Pretrained Optimization Model for Zero-Shot Black Box Optimization [16.391389860521134]
We propose a Pretrained Optimization Model (POM) that leverages knowledge gained from optimizing diverse tasks. POM offers efficient solutions to zero-shot optimization through direct application or fine-tuning with few-shot samples. Fine-tuning POM with a small number of samples and a small budget yields significant performance improvements.
arXiv Detail & Related papers (2024-05-06T09:11:49Z)
- Are Large Language Models Good Prompt Optimizers? [65.48910201816223]
We conduct a study to uncover the actual mechanism of LLM-based prompt optimization.
Our findings reveal that LLMs struggle to identify the true causes of errors during reflection, tending to be biased by their own prior knowledge.
We introduce a new "Automatic Behavior Optimization" paradigm, which directly optimizes the target model's behavior in a more controllable manner.
arXiv Detail & Related papers (2024-02-03T09:48:54Z)
- Large Language Models as Optimizers [106.52386531624532]
We propose Optimization by PROmpting (OPRO), a simple and effective approach that leverages large language models (LLMs) as optimizers.
In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values.
We demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks.
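The loop described here is simple enough to sketch directly: keep a trajectory of (score, instruction) pairs, show the best ones to the LLM in a meta-prompt, ask for an instruction expected to score higher, and evaluate it. In this sketch `call_llm` and `score_on_dev_set` are placeholders for whatever optimizer model and evaluation harness are used, not OPRO's actual interfaces:

```python
def optimize_prompt(call_llm, score_on_dev_set, steps=50, top_k=20):
    """Minimal OPRO-style loop over natural-language instructions."""
    history = []                       # (score, instruction) pairs
    for _ in range(steps):
        # Meta-prompt: show the best instructions so far, ascending by
        # score, and ask for one expected to score higher.
        trajectory = "\n".join(f"score {s:.3f}: {p}"
                               for s, p in sorted(history)[-top_k:])
        meta_prompt = (
            "Here are previous instructions with their accuracies:\n"
            f"{trajectory}\n"
            "Write a new instruction that achieves a higher accuracy."
        )
        candidate = call_llm(meta_prompt)
        history.append((score_on_dev_set(candidate), candidate))
    return max(history)[1]             # best instruction found
```

Revisiting OPRO (above) reports that exactly this loop falters when the optimizer LLM itself is small, which is the gap DeBoP's direct behavior optimization is meant to close.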
arXiv Detail & Related papers (2023-09-07T00:07:15Z)