POP: Prior-fitted Optimizer Policies
- URL: http://arxiv.org/abs/2602.15473v1
- Date: Tue, 17 Feb 2026 10:27:07 GMT
- Title: POP: Prior-fitted Optimizer Policies
- Authors: Jan Kobiolka, Christian Frey, Gresa Shala, Arlind Kadra, Erind Bedalli, Josif Grabocka
- Abstract summary: We introduce POP (Prior-fitted Optimizer Policies), a meta-learned model that predicts coordinate-wise step sizes conditioned on contextual information. Our model is learned on millions of synthetic optimization problems sampled from a prior spanning both convex and non-convex objectives.
- Score: 20.784587787548436
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Optimization refers to the task of finding extrema of an objective function. Classical gradient-based optimizers are highly sensitive to hyperparameter choices. In highly non-convex settings their performance relies on carefully tuned learning rates, momentum, and gradient accumulation. To address these limitations, we introduce POP (Prior-fitted Optimizer Policies), a meta-learned optimizer that predicts coordinate-wise step sizes conditioned on the contextual information provided in the optimization trajectory. Our model is learned on millions of synthetic optimization problems sampled from a novel prior spanning both convex and non-convex objectives. We evaluate POP on an established benchmark including 47 optimization functions of various complexity, where it consistently outperforms first-order gradient-based methods, non-convex optimization approaches (e.g., evolutionary strategies), Bayesian optimization, and a recent meta-learned competitor under matched budget constraints. Our evaluation demonstrates strong generalization capabilities without task-specific tuning.
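To make the core idea concrete, below is a minimal PyTorch sketch of a policy that maps per-coordinate trajectory context to positive step sizes and rolls it out on an objective. This is a hypothetical illustration only: the network architecture, the four context features, and the update rule are assumptions for exposition, not the model described in the paper.

```python
import torch
import torch.nn as nn


class StepSizePolicy(nn.Module):
    """Hypothetical policy: maps per-coordinate trajectory features
    (current gradient, previous gradient, last update, current value)
    to a positive step size for each coordinate."""

    def __init__(self, n_features: int = 4, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Softplus(),  # step sizes must stay positive
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (dim, n_features) -> per-coordinate step sizes: (dim,)
        return self.net(features).squeeze(-1)


def run_policy_optimizer(f, x, policy, steps: int = 100):
    """Roll out the learned policy: at every step, build context features
    from the trajectory and scale the gradient coordinate-wise."""
    prev_grad = torch.zeros_like(x)
    prev_dx = torch.zeros_like(x)
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        loss = f(x)
        (grad,) = torch.autograd.grad(loss, x)
        # Illustrative context features; the paper's actual features may differ.
        feats = torch.stack([grad, prev_grad, prev_dx, x.detach()], dim=-1)
        with torch.no_grad():
            alpha = policy(feats)   # coordinate-wise step sizes
            dx = -alpha * grad      # one possible update rule
            x = x + dx
        prev_grad, prev_dx = grad, dx
    return x.detach()
```

In the paper's setting such a policy would be meta-trained on the synthetic prior before deployment; here the network is untrained and purely illustrative of the coordinate-wise step-size interface.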
Related papers
- Simulated Annealing-based Candidate Optimization for Batch Acquisition Functions [0.5196289708103362]
This paper presents a simulated annealing-based approach for candidate optimization in batch acquisition functions. We evaluate our approach against Sequential Least Squares Programming (SLSQP) across four benchmark multi-objective optimization problems.
arXiv Detail & Related papers (2026-01-12T06:51:49Z) - Optimizing Optimizers for Fast Gradient-Based Learning [53.81268610971847]
We lay the theoretical foundation for automating optimizer design in gradient-based learning. By treating the optimizer as a function that translates loss-gradient signals into parameter motions, the problem reduces to a family of convex optimization problems.
arXiv Detail & Related papers (2025-12-06T09:50:41Z) - Scalable Min-Max Optimization via Primal-Dual Exact Pareto Optimization [66.51747366239299]
We propose a smooth variant of the min-max problem based on the augmented Lagrangian. The proposed algorithm scales better with the number of objectives than subgradient-based strategies.
arXiv Detail & Related papers (2025-03-16T11:05:51Z) - Optimizing Posterior Samples for Bayesian Optimization via Rootfinding [2.94944680995069]
We introduce an efficient global optimization strategy for posterior samples based on global rootfinding. Remarkably, even with just one point from each set, the global optimum is discovered most of the time. Our approach also improves the performance of other posterior sample-based acquisition functions, such as variants of entropy search.
arXiv Detail & Related papers (2024-10-29T17:57:16Z) - Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD)
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Optimistic Optimization of Gaussian Process Samples [30.226274682578172]
A competing, computationally more efficient, global optimization framework is optimistic optimization, which exploits prior knowledge about the geometry of the search space in the form of a dissimilarity function.
We argue that there is a new research domain between geometric and probabilistic search, i.e. methods that run drastically faster than traditional Bayesian optimization, while retaining some of the crucial functionality of Bayesian optimization.
arXiv Detail & Related papers (2022-09-02T09:06:24Z) - Online Calibrated and Conformal Prediction Improves Bayesian Optimization [10.470326550507117]
This paper studies which uncertainties are needed in model-based decision-making and in Bayesian optimization.
Maintaining calibration, however, can be challenging when the data is non-stationary and depends on our actions.
We propose using simple algorithms based on online learning to provably maintain calibration on non-i.i.d. data.
arXiv Detail & Related papers (2021-12-08T23:26:23Z) - BOSH: Bayesian Optimization by Sampling Hierarchically [10.10241176664951]
We propose a novel BO routine pairing a hierarchical Gaussian process with an information-theoretic framework to generate a growing pool of realizations.
We demonstrate that BOSH provides more efficient and higher-precision optimization than standard BO across synthetic benchmarks, simulation optimization, reinforcement learning and hyperparameter tuning tasks.
arXiv Detail & Related papers (2020-07-02T07:35:49Z) - A Primer on Zeroth-Order Optimization in Signal Processing and Machine Learning [95.85269649177336]
ZO optimization iteratively performs three major steps: gradient estimation, descent direction computation, and solution update (a minimal sketch follows this entry).
We demonstrate promising applications of ZO optimization, such as evaluating and generating explanations from black-box deep learning models, and efficient online sensor management.
arXiv Detail & Related papers (2020-06-11T06:50:35Z)
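The sketch below illustrates the three-step zeroth-order loop named in the primer entry, using a two-point random-direction gradient estimator and a sign-based update in the spirit of ZO-signGD mentioned in the molecule-optimization entry. Function names, sample counts, and step sizes are illustrative assumptions, not details taken from either paper.

```python
import numpy as np


def zo_estimate_gradient(f, x, mu=1e-3, n_samples=20, rng=None):
    """Step 1: two-point zeroth-order gradient estimate, averaging finite
    differences of f along random Gaussian directions (no true gradients)."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        grad += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return grad / n_samples


def zo_sign_gd(f, x0, lr=0.01, steps=200, seed=0):
    """ZO loop: estimate the gradient, form a descent direction from its
    sign (as in ZO-signGD), and update the solution."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = zo_estimate_gradient(f, x, rng=rng)  # 1) gradient estimation
        direction = -np.sign(g)                  # 2) descent direction
        x = x + lr * direction                   # 3) solution update
    return x


if __name__ == "__main__":
    # Usage on a toy quadratic queried only through function evaluations.
    f = lambda x: np.sum((x - 1.0) ** 2)
    print(zo_sign_gd(f, x0=np.zeros(5)))
```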
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.