Related papers: Monte Carlo Policy Gradient Method for Binary Optimization

Monte Carlo Policy Gradient Method for Binary Optimization

URL: http://arxiv.org/abs/2307.00783v1
Date: Mon, 3 Jul 2023 07:01:42 GMT
Title: Monte Carlo Policy Gradient Method for Binary Optimization
Authors: Cheng Chen, Ruitao Chen, Tianyou Li, Ruichen Ao and Zaiwen Wen
Abstract summary: We develop a novel probabilistic model to sample the binary solution according to a parameterized policy distribution. For coherent exploration in discrete spaces, parallel Markov Chain Monte Carlo (MCMC) methods are employed. Convergence to stationary points in expectation of the policy gradient method is established.
Score: 3.742634130733923
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Binary optimization has a wide range of applications in combinatorial optimization problems such as MaxCut, MIMO detection, and MaxSAT. However, these problems are typically NP-hard due to the binary constraints. We develop a novel probabilistic model to sample the binary solution according to a parameterized policy distribution. Specifically, minimizing the KL divergence between the parameterized policy distribution and the Gibbs distributions of the function value leads to a stochastic optimization problem whose policy gradient can be derived explicitly similar to reinforcement learning. For coherent exploration in discrete spaces, parallel Markov Chain Monte Carlo (MCMC) methods are employed to sample from the policy distribution with diversity and approximate the gradient efficiently. We further develop a filter scheme to replace the original objective function by the one with the local search technique to broaden the horizon of the function landscape. Convergence to stationary points in expectation of the policy gradient method is established based on the concentration inequality for MCMC. Numerical results show that this framework is very promising to provide near-optimal solutions for quite a few binary optimization problems.

Related papers

Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes [8.735525389833013]
This paper presents mirror descent policy optimisation for robust constrained Markov decision processes (RCMDPs)<n>We make use of policy gradient techniques to optimise both the policy (as a maximiser) and the transition kernel (as an adversarial minimiser) on the Lagrangian representing a constrained MDP.<n>Experiments confirm the benefits of mirror descent policy optimisation in constrained and unconstrained optimisation, and significant improvements are observed in robustness tests.
arXiv Detail & Related papers (2025-06-29T09:55:52Z)
Steering Large Agent Populations using Mean-Field Schrodinger Bridges with Gaussian Mixture Models [13.03355083378673]
Mean-Field Schrodinger Bridge (MFSB) problem is an optimization problem aiming to find the minimum effort control policy. In the context of multiagent control, the objective is to control the configuration of a swarm of identical, interacting cooperative agents.
arXiv Detail & Related papers (2025-03-31T04:01:04Z)
Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action [10.219627570276689]
We develop a framework for a class of Markov Decision Processes with general state and spaces. We show that gradient methods converge to the globally optimal policy with a nonasymptomatic condition. Our result establishes first complexity for multi-period inventory systems.
arXiv Detail & Related papers (2024-09-25T17:56:02Z)
Diffusion Stochastic Optimization for Min-Max Problems [33.73046548872663]
The optimistic gradient method is useful in addressing minimax optimization problems. Motivated by the observation that the conventional version suffers from the need for a large batch size, we introduce and analyze a new formulation termed Samevareps-generativeOGOG.
arXiv Detail & Related papers (2024-01-26T01:16:59Z)
Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods. Our main contribution holds for a broad class of problems including Max-and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
Moreau Envelope ADMM for Decentralized Weakly Convex Optimization [55.2289666758254]
This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization. The results of our numerical experiments indicate that our method is faster and more robust than widely-used approaches.
arXiv Detail & Related papers (2023-08-31T14:16:30Z)
Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted Markov decision process (constrained MDP) We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy. To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
arXiv Detail & Related papers (2023-06-20T17:27:31Z)
Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies [115.86431674214282]
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. We show that both methods attain linear convergence rates and $mathcalO (1/epsilon2)$ sample complexities using a simple, non-adaptive geometrically increasing step size.
arXiv Detail & Related papers (2022-10-04T06:17:52Z)
Distributed Policy Gradient with Variance Reduction in Multi-Agent Reinforcement Learning [7.4447396913959185]
This paper studies a distributed policy gradient in collaborative multi-agent reinforcement learning (MARL) Agents over a communication network aim to find the optimal policy to maximize the average of all agents' local returns.
arXiv Detail & Related papers (2021-11-25T08:07:30Z)
Efficient semidefinite bounds for multi-label discrete graphical models [6.226454551201676]
One of the main queries on such models is to identify the SDPWCSP Function on Cost of a Posteri (MAP) Networks. We consider a traditional dualized constraint approach and a dedicated dedicated SDP/Monteiro style method based on row-by-row updates.
arXiv Detail & Related papers (2021-11-24T13:38:34Z)
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs [113.8752163061151]
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs) We propose the underlineperiodically underlinerestarted underlineoptimistic underlinepolicy underlineoptimization algorithm (PROPO) PROPO features two mechanisms: sliding-window-based policy evaluation and periodic-restart-based policy improvement.
arXiv Detail & Related papers (2021-10-18T02:33:20Z)
Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm [50.50545326342971]
We formulate the problem of maximizing a non-linear concave function of multiple long-term objectives.<n>A policy-gradient based model-free algorithm is proposed for the problem.<n>The proposed algorithm is shown to achieve convergence to within an $epsilon$ of the global optima.
arXiv Detail & Related papers (2021-05-28T22:20:54Z)
Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs) Some variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon [3.867363075280544]
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem. We produce a global linear convergence guarantee for the setting of finite time horizon and state dynamics under weak assumptions. We show results for the case where we assume a model for the underlying dynamics and where we apply the method to the data directly.
arXiv Detail & Related papers (2020-11-20T09:51:49Z)
GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal Black Box Constraint Satisfaction [69.94831587339539]
We present a modified Cross-Entropy Method (CEM) that uses a masked auto-regressive neural network for modeling uniform distributions over the solution space. Our algorithm is able to express complicated solution spaces, thus allowing it to track a variety of different solution regions.
arXiv Detail & Related papers (2020-02-17T20:21:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.