Non-convex entropic mean-field optimization via Best Response flow
- URL: http://arxiv.org/abs/2505.22760v1
- Date: Wed, 28 May 2025 18:22:08 GMT
- Title: Non-convex entropic mean-field optimization via Best Response flow
- Authors: Razvan-Andrei Lascu, Mateusz B. Majka
- Abstract summary: We discuss the problem of minimizing non-convex functionals on the space of probability measures, regularized by the relative entropy (KL divergence) with respect to a fixed reference measure. We show how to choose the regularizer, given the non-convex functional, so that the Best Response operator becomes a contraction with respect to the $L^1$-Wasserstein distance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of minimizing non-convex functionals on the space of probability measures, regularized by the relative entropy (KL divergence) with respect to a fixed reference measure, as well as the corresponding problem of solving entropy-regularized non-convex-non-concave min-max problems. We utilize the Best Response flow (also known in the literature as the fictitious play flow) and study how its convergence is influenced by the relation between the degree of non-convexity of the functional under consideration, the regularization parameter and the tail behaviour of the reference measure. In particular, we demonstrate how to choose the regularizer, given the non-convex functional, so that the Best Response operator becomes a contraction with respect to the $L^1$-Wasserstein distance, which then ensures the existence of its unique fixed point, which is then shown to be the unique global minimizer for our optimization problem. This extends recent results where the Best Response flow was applied to solve convex optimization problems regularized by the relative entropy with respect to arbitrary reference measures, and with arbitrary values of the regularization parameter. Our results explain precisely how the assumption of convexity can be relaxed, at the expense of making a specific choice of the regularizer. Additionally, we demonstrate how these results can be applied in reinforcement learning in the context of policy optimization for Markov Decision Processes and Markov games with softmax parametrized policies in the mean-field regime.
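For intuition, the Best Response operator in this setting maps a measure $m$ to the Gibbs measure $\mathrm{BR}(m)(dx) \propto \exp\big(-\tfrac{1}{\sigma}\tfrac{\delta F}{\delta m}(m,x)\big)\,\pi(dx)$, where $\pi$ is the reference measure and $\sigma$ the regularization parameter, and the paper's conditions make this map a contraction in $W_1$. Below is a minimal numerical sketch, not code from the paper: the double-well potential, the bounded interaction kernel, the grid discretization, and all parameter values are illustrative assumptions. It iterates the best-response map on a discretized line and monitors the $W_1$ gap between successive iterates.

```python
import numpy as np

# Discretized state space and (Gaussian) reference measure pi.
x = np.linspace(-5.0, 5.0, 1001)
dx = x[1] - x[0]
pi = np.exp(-x**2 / 2.0)
pi /= pi.sum()

sigma = 0.5  # entropic regularization strength (illustrative)

V = (x**2 - 1.0) ** 2                       # non-convex double-well potential
W = -0.2 * np.cos(np.subtract.outer(x, x))  # bounded interaction kernel

def flat_derivative(m):
    """Flat derivative dF/dm(m, x) of the illustrative functional
    F(m) = int V dm + 0.5 * int int W(x - y) m(dx) m(dy)."""
    return V + W @ m

def best_response(m):
    """Entropic best response: BR(m) proportional to exp(-dF/dm / sigma) * pi."""
    g = np.exp(-flat_derivative(m) / sigma) * pi
    return g / g.sum()

def w1(p, q):
    """L1-Wasserstein distance on the line, computed from the CDFs."""
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * dx

m = pi.copy()
for k in range(50):
    m_next = best_response(m)
    gap = w1(m, m_next)
    m = m_next
    if gap < 1e-12:
        break
print(f"stopped after {k + 1} iterations, successive W1 gap = {gap:.2e}")
```

Keeping the interaction amplitude small relative to $\sigma$ is what makes the iteration contract here, mirroring the trade-off between the degree of non-convexity, the regularization parameter and the tails of the reference measure described in the abstract.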
Related papers
- A Novel Unified Parametric Assumption for Nonconvex Optimization [53.943470475510196]
Non-convex optimization is central to machine learning, but the general framework of non-convexity yields convergence guarantees that are too pessimistic compared to practice. We introduce a novel unified parametric assumption for non-convex optimization algorithms.
arXiv Detail & Related papers (2025-02-17T21:25:31Z) - Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes [4.714840786221651]
We study the error introduced by entropy regularization in Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength (see the numerical sketch after this list). We extend our analysis to settings beyond entropy regularization.
arXiv Detail & Related papers (2024-06-06T15:20:37Z) - Randomized algorithms and PAC bounds for inverse reinforcement learning in continuous spaces [47.907236421762626]
This work studies discrete-time discounted Markov decision processes with continuous state and action spaces.
We first consider the case in which we have access to the entire expert policy and characterize the set of solutions to the inverse problem.
arXiv Detail & Related papers (2024-05-24T12:53:07Z) - Soft Quantization using Entropic Regularization [0.0]
We investigate the properties and robustness of the entropy-regularized quantization problem.
The proposed approximation technique naturally adopts the softmin function.
We implement a gradient-based approach to compute the optimal solutions.
arXiv Detail & Related papers (2023-09-08T16:41:26Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Non-Convex Optimization with Certificates and Fast Rates Through Kernel
Sums of Squares [68.8204255655161]
We consider potentially non-convex optimization and approximation problems.
In this paper, we propose an algorithm that achieves close to optimal a priori computational guarantees.
arXiv Detail & Related papers (2022-04-11T09:37:04Z) - Integrated Conditional Estimation-Optimization [6.037383467521294]
Many real-world optimization problems involve uncertain parameters whose probability distributions can be estimated using contextual feature information.
In contrast to the standard approach of estimating the distribution of uncertain parameters, we propose an integrated conditional estimation-optimization (ICEO) approach.
We show that our ICEO approach is theoretically consistent under moderate conditions.
arXiv Detail & Related papers (2021-10-24T04:49:35Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual convex approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual variable regularizer, and Nesterov's accelerated gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - The Geometry of Memoryless Stochastic Policy Optimization in
Infinite-Horizon POMDPs [0.0]
We consider the problem of finding the best memoryless policy for an infinite-horizon partially observable Markov decision process (POMDP).
We show that the discounted state-action frequencies and the expected cumulative reward are rational functions of the policy, whereby the degree is determined by the degree of partial observability.
arXiv Detail & Related papers (2021-10-14T14:42:09Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random-order online optimization setting of Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that the algorithms of Garber et al. (2020) achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - Linear Convergence of Entropy-Regularized Natural Policy Gradient with
Linear Function Approximation [30.02577720946978]
We establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation.
We prove that entropy-regularized NPG exhibits linear convergence up to a function approximation error.
arXiv Detail & Related papers (2021-06-08T04:30:39Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
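As a numerical companion to the "Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes" entry above, here is a minimal sketch of soft value iteration, where the hard max in the Bellman operator is replaced by a scaled log-sum-exp over actions; the sup-norm gap to the unregularized optimal value shrinks as the temperature $\tau \to 0$. This is not code from that paper: the random MDP, the discount factor, and the temperature values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9

# Illustrative random MDP: transition kernel P[a, s, s'] and rewards R[s, a].
P = rng.dirichlet(np.ones(S), size=(A, S))
R = rng.uniform(0.0, 1.0, size=(S, A))

def value_iteration(tau=0.0, iters=3000):
    """tau = 0: standard Bellman optimality operator (hard max over actions);
    tau > 0: entropy-regularized operator, i.e. max replaced by a scaled
    log-sum-exp (softmax) over actions."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        if tau > 0:
            m = Q.max(axis=1)  # shift for a numerically stable log-sum-exp
            V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
        else:
            V = Q.max(axis=1)
    return V

V_star = value_iteration()
for tau in (1.0, 0.1, 0.01):
    err = np.max(np.abs(value_iteration(tau) - V_star))
    print(f"tau = {tau:4.2f}   sup-norm regularization error {err:.5f}")
```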