READY: Reward Discovery for Meta-Black-Box Optimization
- URL: http://arxiv.org/abs/2601.21847v1
- Date: Thu, 29 Jan 2026 15:23:18 GMT
- Title: READY: Reward Discovery for Meta-Black-Box Optimization
- Authors: Zechuan Huang, Zhiguang Cao, Hongshu Guo, Yue-Jiao Gong, Zeyuan Ma,
- Abstract summary: We use a Large Language Model (LLM) as an automated reward discovery tool for MetaBBO. We additionally introduce a multi-task evolution architecture to support parallel reward discovery across diverse MetaBBO approaches.
- Score: 38.27552012808326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Meta-Black-Box Optimization (MetaBBO) is an emerging avenue within the optimization community, where algorithm design policies can be meta-learned via reinforcement learning to enhance optimization performance. So far, the reward functions in existing MetaBBO works have been designed by human experts, introducing design bias and risks of reward hacking. In this paper, we use a Large Language Model (LLM) as an automated reward discovery tool for MetaBBO. Specifically, we address both effectiveness and efficiency. On the effectiveness side, we borrow the idea of evolution of heuristics, introducing a tailored evolution paradigm into the iterative LLM-based program search process, which ensures continuous improvement. On the efficiency side, we additionally introduce a multi-task evolution architecture to support parallel reward discovery for diverse MetaBBO approaches. This parallel process also benefits from knowledge sharing across tasks, which accelerates convergence. Empirical results demonstrate that the reward functions discovered by our approach can boost existing MetaBBO works, underscoring the importance of reward design in MetaBBO. We provide READY's project at https://anonymous.4open.science/r/ICML_READY-747F.
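As a rough illustration of the loop the abstract describes, the sketch below evolves candidate reward programs for several MetaBBO tasks in parallel, with a shared hint pool standing in for cross-task knowledge sharing. This is a minimal sketch under stated assumptions, not READY's implementation: `llm_propose_variant` is a hypothetical stand-in for the LLM call, and `evaluate` would really meta-train and score a MetaBBO agent under the candidate reward.

```python
import random

def llm_propose_variant(parent_src: str, shared_hints: list) -> str:
    """Hypothetical stand-in for the LLM mutation/crossover call."""
    hint = random.choice(shared_hints) if shared_hints else "no hint"
    return parent_src + f"  # refined using: {hint}"

def evaluate(reward_src: str, task: str) -> float:
    """Placeholder score; the real system meta-trains an agent with this reward."""
    return random.random()

def multi_task_reward_evolution(tasks, seed_src, generations=5, pop_size=4):
    # One population of (program, score) candidates per MetaBBO task.
    pops = {t: [(seed_src, evaluate(seed_src, t))] for t in tasks}
    shared_hints = []  # cross-task knowledge pool shared by all populations
    for _ in range(generations):
        for t in tasks:  # tasks would evolve in parallel in the real system
            parent, _ = max(pops[t], key=lambda c: c[1])   # elitist selection
            child = llm_propose_variant(parent, shared_hints)
            pops[t].append((child, evaluate(child, t)))
            pops[t] = sorted(pops[t], key=lambda c: c[1])[-pop_size:]  # survivors
            shared_hints.append(f"pattern from best candidate on {t}")
    return {t: max(pops[t], key=lambda c: c[1]) for t in tasks}

best = multi_task_reward_evolution(
    ["DE-control", "PSO-control"],
    "def reward(state): return -state.best_cost",
)
```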
Related papers
- Task-free Adaptive Meta Black-box Optimization [55.461814601130044]
We propose the Adaptive meta Black-box Optimization Model (ABOM), which performs online parameter adaptation using only optimization data from the target task. Unlike conventional MetaBBO frameworks that decouple the meta-training and optimization phases, ABOM introduces a closed-loop parameter learning mechanism in which parameterized evolutionary operators continuously self-update. This paradigm shift enables zero-shot optimization: ABOM achieves competitive performance on synthetic BBO benchmarks and realistic unmanned aerial vehicle path-planning problems without any handcrafted training tasks.
arXiv Detail & Related papers (2026-01-29T09:54:10Z)
- Differentiable Evolutionary Reinforcement Learning [41.96953381133274]
We propose Differentiable Evolutionary Reinforcement Learning (DERL), a bilevel framework that enables the autonomous discovery of optimal reward signals. DERL is differentiable in its meta-optimization: it treats the inner-loop validation performance as a signal to update the Meta-r via reinforcement learning. Experimental results show that DERL achieves state-of-the-art performance on ALFWorld and ScienceWorld, significantly outperforming methods relying on handcrafted rewards.
arXiv Detail & Related papers (2025-12-15T14:50:08Z)
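A toy rendering of the bilevel idea in the DERL entry above, under loose assumptions: the outer loop holds reward parameters theta, the inner loop (collapsed here to a placeholder function) trains and validates a policy under that reward, and the validation score drives a score-function-style update of theta. This is the control flow the summary suggests, not DERL's actual algorithm.

```python
import numpy as np

def inner_loop_validation(theta: np.ndarray) -> float:
    """Placeholder: train a policy under reward(theta) and return validation
    performance. A toy quadratic stands in for the full inner loop."""
    return -float(np.sum((theta - 1.0) ** 2))

theta = np.zeros(4)          # parameters of the learned reward function
sigma, lr, batch = 0.1, 0.05, 8
for _ in range(200):
    baseline = inner_loop_validation(theta)
    grad_est = np.zeros_like(theta)
    for _ in range(batch):   # small perturbation batch for a stabler estimate
        eps = np.random.randn(*theta.shape)
        advantage = inner_loop_validation(theta + sigma * eps) - baseline
        grad_est += advantage * eps / sigma
    theta += lr * grad_est / batch   # ascend the validation signal
print(theta)                 # drifts toward the toy optimum at 1.0
```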
- MetaBox-v2: A Unified Benchmark Platform for Meta-Black-Box Optimization [22.090002096198514]
We introduce MetaBox-v2 as a milestone upgrade with four novel features. These include a comprehensive benchmark suite of 18 synthetic/realistic tasks spanning single-objective, multi-objective, and multi-task optimization scenarios. Valuable insights for practitioners and newcomers to the field are drawn from thorough and detailed analyses.
arXiv Detail & Related papers (2025-05-23T11:13:10Z)
- Reinforcement Learning-based Self-adaptive Differential Evolution through Automated Landscape Feature Learning [7.765689048808507]
This paper introduces a novel MetaBBO method that supports automated feature learning during the meta-learning process. We design an attention-based neural network with a mantissa-exponent-based embedding to transform the solution populations. We also incorporate a comprehensive algorithm configuration space, including diverse DE operators, into a reinforcement-learning-aided dynamic algorithm configuration (DAC) paradigm.
arXiv Detail & Related papers (2025-03-23T13:07:57Z)
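The mantissa-exponent embedding mentioned in the previous entry can be pictured as follows; this is one reading of the idea (with an assumed normalization constant), not the paper's exact design. Splitting each raw value x as x = m * 2**e keeps features bounded even when objective values span many orders of magnitude.

```python
import numpy as np

def mantissa_exponent_features(x: np.ndarray, max_exp: float = 64.0) -> np.ndarray:
    """Embed each value as (mantissa, normalized exponent) with x = m * 2**e,
    m in [0.5, 1) for positive x, so scale differences become bounded features."""
    m, e = np.frexp(x)                           # elementwise decomposition
    return np.stack([m, e / max_exp], axis=-1)   # hypothetical normalization

fitness = np.array([1e-8, 3.5, 4.2e12])         # wildly different scales
print(mantissa_exponent_features(fitness))
```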
- Toward Automated Algorithm Design: A Survey and Practical Guide to Meta-Black-Box-Optimization [22.902923118981857]
We introduce Meta-Black-Box-Optimization (MetaBBO) as an emerging avenue within the Evolutionary Computation (EC) community. Despite the success of MetaBBO, the current literature provides insufficient summaries of its key aspects and lacks practical guidance for implementation.
arXiv Detail & Related papers (2024-11-01T14:32:19Z)
- Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization [12.6318861144205]
This paper proposes NeurELA, a novel framework that dynamically profiles landscape features through a two-stage, attention-based neural network. NeurELA is pre-trained over a variety of MetaBBO algorithms using a multi-task neuroevolution strategy. Experiments show that NeurELA achieves consistently superior performance when integrated into different, and even unseen, MetaBBO tasks.
arXiv Detail & Related papers (2024-08-20T09:17:11Z)
- Reinforced In-Context Black-Box Optimization [64.25546325063272]
RIBBO is a method for reinforcement-learning a BBO algorithm from offline data in an end-to-end fashion.
RIBBO employs expressive sequence models to learn the optimization histories produced by multiple behavior algorithms and tasks.
Central to our method is augmenting the optimization histories with regret-to-go tokens, which are designed to represent the performance of an algorithm based on cumulative regret over the future part of the histories.
arXiv Detail & Related papers (2024-02-27T11:32:14Z)
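A minimal sketch of the regret-to-go augmentation described in the RIBBO entry above, assuming minimization and a known (or estimated) optimum f_star; the paper's exact tokenization is not reproduced. Each history step gets the cumulative regret of its future suffix attached, so the sequence model can be conditioned on a desired performance level.

```python
import numpy as np

def regret_to_go(ys: np.ndarray, f_star: float) -> np.ndarray:
    """Suffix sums of simple regret: the token for step t aggregates
    sum_{s >= t} (ys[s] - f_star), the regret over the future history."""
    regrets = ys - f_star
    return np.cumsum(regrets[::-1])[::-1]

ys = np.array([5.0, 3.0, 2.0, 1.5])   # objective values along one history
print(regret_to_go(ys, f_star=1.0))   # -> [7.5, 3.5, 1.5, 0.5]
```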
- REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world. Recent methods aim to mitigate this misalignment by learning reward functions from human preferences. We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Bootstrapped Meta-Learning [48.017607959109924]
We propose an algorithm that tackles a challenging meta-optimisation problem by letting the meta-learner teach itself.
The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric.
We achieve a new state-of-the-art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities.
arXiv Detail & Related papers (2021-09-09T18:29:05Z)
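The bootstrap-then-match pattern from the Bootstrapped Meta-Learning entry can be caricatured as below, under heavy simplifications: the learner is a parameter vector, the meta-learner is just a scalar step size, and the (pseudo-)metric is squared Euclidean distance with a finite-difference meta-gradient. A sketch of the pattern only, not the paper's algorithm.

```python
import numpy as np

def grad(x):                      # gradient of a toy learner objective
    return 2.0 * (x - 3.0)        # minimizer at x = 3

x = np.zeros(2)                   # learner parameters
meta_step = 0.01                  # meta-learned inner-loop step size
for _ in range(100):
    # 1) Bootstrap: run a few extra inner steps to build a target.
    target = x.copy()
    for _ in range(3):
        target = target - meta_step * grad(target)
    # 2) Match: adjust the meta-parameter so that one step from x lands
    #    closer to the bootstrapped target.
    def dist_after_step(step):
        return float(np.sum((x - step * grad(x) - target) ** 2))
    eps = 1e-4
    meta_grad = (dist_after_step(meta_step + eps)
                 - dist_after_step(meta_step - eps)) / (2 * eps)
    meta_step -= 0.001 * meta_grad
    x = x - meta_step * grad(x)   # learner takes its actual update
print(x, meta_step)               # x approaches 3; the step size grows
```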
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms that no longer need the sub-optimal iterative inner-loop adaptation used in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on NTK theory.
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
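For the analytic adaptation mentioned in point 2 of the NTK entry, kernel ridge regression gives the flavor: with a kernel k in place of MAML's inner loop, the task-adapted predictor has a closed form. The kernel below is a toy linear stand-in, not a real NTK, and the support data are hypothetical.

```python
import numpy as np

def k(a, b):                          # toy kernel standing in for an NTK
    return a @ b.T + 1.0

# Support set of one few-shot task (hypothetical data).
X = np.random.randn(10, 3)
y = np.random.randn(10)

# Closed-form "adaptation": f(x*) = k(x*, X) (K + lam I)^{-1} y,
# replacing iterative inner-loop gradient steps with a single solve.
lam = 0.1
alpha = np.linalg.solve(k(X, X) + lam * np.eye(len(X)), y)

X_query = np.random.randn(2, 3)
preds = k(X_query, X) @ alpha         # adapted predictions, no inner loop
print(preds)
```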
- HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem [107.52043871875898]
We develop a novel meta reinforcement learning framework called Hyper-Meta RL (HMRL) for sparse-reward RL problems.
It consists of three modules, including a cross-environment meta-state embedding module that constructs a common meta-state space to adapt to different environments.
Experiments with sparse-reward environments show the superiority of HMRL on both transferability and policy learning efficiency.
arXiv Detail & Related papers (2020-02-11T07:31:11Z)