MEMETRON: Metaheuristic Mechanisms for Test-time Response Optimization of Large Language Models
- URL: http://arxiv.org/abs/2506.08643v1
- Date: Tue, 10 Jun 2025 09:55:53 GMT
- Title: MEMETRON: Metaheuristic Mechanisms for Test-time Response Optimization of Large Language Models
- Authors: Son The Nguyen, Theja Tulabandhula,
- Abstract summary: Large language models (LLMs) are increasingly used for both open-ended and structured tasks.<n>We introduce MEMETRON, a task-agnostic framework that formulates LLM decoding as a discrete black-box optimization problem.<n>We evaluate our framework on the critical human preference alignment task and demonstrate that it significantly outperforms standard decoding and reranking methods.
- Score: 0.6926105253992517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are increasingly used for both open-ended and structured tasks, yet their inference-time behavior is still largely dictated by heuristic decoding strategies such as greedy search, sampling, or reranking. These methods provide limited control and do not explicitly optimize for task-specific objectives. We introduce MEMETRON, a task-agnostic framework that formulates LLM decoding as a discrete black-box optimization problem. MEMETRON leverages hybrid metaheuristic algorithms, GENETRON and ANNETRON, to search the response space, guided by reward models and contextual operations performed by the LLM itself. This approach enables efficient discovery of high-reward responses without requiring model retraining or gradient access. The framework is modular and generalizes across diverse tasks, requiring only a reward function and lightweight prompt templates. We evaluate our framework on the critical human preference alignment task and demonstrate that it significantly outperforms standard decoding and reranking methods, highlighting its potential to improve alignment without model retraining.
Related papers
- MOPrompt: Multi-objective Semantic Evolution for Prompt Optimization [0.0699049312989311]
MOPrompt is a novel framework designed to optimize prompts for both accuracy and context size (measured in tokens) simultaneously.<n>We evaluate MOPrompt on a sentiment analysis task in Portuguese, using Gemma-2B and Sabiazinho-3 as evaluation models.
arXiv Detail & Related papers (2025-08-03T01:50:43Z) - Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques [14.892995952768352]
Language Models (LMs) have excelled at tasks like text generation, summarization, and question answering.<n>Their inference remains computationally expensive and energy intensive in settings with limited hardware, power, or bandwidth.<n>Recent approaches have introduced multi LLM intelligent model selection strategies that dynamically allocate computational resources based on query complexity.
arXiv Detail & Related papers (2025-06-06T23:13:08Z) - Generalizable Heuristic Generation Through Large Language Models with Meta-Optimization [14.919482411153185]
Heuristic design with large language models (LLMs) has emerged as a promising approach for tackling optimization problems.<n>Existing approaches often rely on manually predefined evolutionary generalizations and single-task training schemes.<n>We propose Meta-Optimization of Heuristics (MoH), a novel framework that operates at the level of meta-learning.
arXiv Detail & Related papers (2025-05-27T08:26:27Z) - MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
Gym-style framework for systematically reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents.<n>MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios.<n>Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z) - Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language models.<n>Controlled Decoding provides a mechanism for aligning a model at inference time without retraining.<n>We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z) - A Survey on the Optimization of Large Language Model-based Agents [16.733092886211097]
Large Language Models (LLMs) have been widely adopted in various fields, becoming essential for autonomous decision-making and interactive tasks.<n>However, current work typically relies on prompt design or fine-tuning strategies applied to vanilla LLMs.<n>We provide a comprehensive review of LLM-based agent optimization approaches, categorizing them into parameter-driven and parameter-free methods.
arXiv Detail & Related papers (2025-03-16T10:09:10Z) - Towards more Contextual Agents: An extractor-Generator Optimization Framework [0.0]
Large Language Model (LLM)-based agents have demonstrated remarkable success in solving complex tasks across a wide range of general-purpose applications.<n>However, their performance often degrades in context-specific scenarios, such as specialized industries or research domains.<n>To address this challenge, our work introduces a systematic approach to enhance the contextual adaptability of LLM-based agents.
arXiv Detail & Related papers (2025-02-18T15:07:06Z) - In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality.<n>To handle these challenges, a direct solution is to generate high-confidence'' data from unsupervised downstream tasks.<n>We propose a novel approach, pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and.
Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting.
LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, Self-Exploring Language Models (SELM) significantly boosts the performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z) - Towards Generalist Prompting for Large Language Models by Mental Models [105.03747314550591]
Large language models (LLMs) have demonstrated impressive performance on many tasks.
To achieve optimal performance, specially designed prompting methods are still needed.
We introduce the concept of generalist prompting, which operates on the design principle of achieving optimal or near-optimal performance.
arXiv Detail & Related papers (2024-02-28T11:29:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.