Faster Fixed-Point Methods for Multichain MDPs
- URL: http://arxiv.org/abs/2506.20910v1
- Date: Thu, 26 Jun 2025 00:31:21 GMT
- Title: Faster Fixed-Point Methods for Multichain MDPs
- Authors: Matthew Zurek, Yudong Chen
- Abstract summary: We study value-iteration (VI) algorithms for solving general (a.k.a. multichain) Markov decision processes (MDPs) under the average-reward criterion.
- Score: 6.996002801232415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study value-iteration (VI) algorithms for solving general (a.k.a. multichain) Markov decision processes (MDPs) under the average-reward criterion, a fundamental but theoretically challenging setting. Beyond the difficulties inherent to all average-reward problems posed by the lack of contractivity and non-uniqueness of solutions to the Bellman operator, in the multichain setting an optimal policy must solve the navigation subproblem of steering towards the best connected component, in addition to optimizing long-run performance within each component. We develop algorithms which better solve this navigational subproblem in order to achieve faster convergence for multichain MDPs, obtaining improved rates of convergence and sharper measures of complexity relative to prior work. Many key components of our results are of potential independent interest, including novel connections between average-reward and discounted problems, optimal fixed-point methods for discounted VI which extend to general Banach spaces, new sublinear convergence rates for the discounted value error, and refined suboptimality decompositions for multichain MDPs. Overall our results yield faster convergence rates for discounted and average-reward problems and expand the theoretical foundations of VI approaches.
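To make the fixed-point viewpoint concrete, below is a minimal NumPy sketch of classical discounted value iteration, the contractive iteration the abstract's accelerated methods build on, together with a Halpern-anchored variant as one generic acceleration template. This is an illustration of the setting, not the paper's algorithm; the 1/(k+2) anchoring schedule is the textbook choice, and all shapes and step rules are assumptions.

```python
import numpy as np

def discounted_vi(P, r, gamma, iters=1000, tol=1e-8):
    """Classical discounted value iteration: V <- max_a [r_a + gamma * P_a V].

    P: (A, S, S) transition kernels, r: (A, S) rewards. Illustrative baseline;
    the paper studies faster fixed-point methods built on this operator.
    """
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = r + gamma * np.einsum("ast,t->as", P, V)  # one Bellman backup per action
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V

def halpern_vi(P, r, gamma, iters=1000):
    """Halpern-anchored variant: average each Bellman update with the initial
    anchor point. Generic anchoring template with the standard 1/(k+2)
    weights, not the paper's specific scheme."""
    V0 = np.zeros(P.shape[1])
    V = V0.copy()
    for k in range(iters):
        TV = (r + gamma * np.einsum("ast,t->as", P, V)).max(axis=0)
        lam = 1.0 / (k + 2)
        V = lam * V0 + (1.0 - lam) * TV  # pull each update toward the anchor V0
    return V
```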
Related papers
- Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural combinatorial optimization, enabling models to solve complex problems without requiring expert knowledge.
Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces.
We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z)
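As a rough illustration of the entry above, the snippet below converts a quantitative reward gap into a qualitative pairwise preference via a Bradley-Terry-style comparison; this is one common way to realize "statistical comparison modeling", and the function name and temperature parameter are hypothetical, not from the paper.

```python
import math
import random

def preference_signal(reward_a, reward_b, temperature=1.0):
    """Bradley-Terry-style comparison: the probability that solution `a` is
    preferred over `b` is a logistic function of their reward gap, so the
    learner receives a qualitative preference rather than a raw reward."""
    p_a = 1.0 / (1.0 + math.exp(-(reward_a - reward_b) / temperature))
    return 1 if random.random() < p_a else -1  # +1: prefer a, -1: prefer b
```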
- Deep Distributed Optimization for Large-Scale Quadratic Programming [15.773581194857085]
This paper introduces DeepDistributedQP, a novel deep learning-aided distributed optimization architecture designed for tackling large-scale quadratic programming (QP) problems.
DeepDistributedQP demonstrates strong generalization by training on small problems and scaling to solve much larger ones (up to 50K variables and 150K constraints) using the same policy.
arXiv Detail & Related papers (2024-12-11T09:45:00Z)
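For context on the entry above, here is a minimal consensus-ADMM sketch for a QP objective split across workers; it shows the generic distributed-QP template (local solve, averaging, dual update), not DeepDistributedQP itself, which additionally learns solver policies. All names and parameters are illustrative.

```python
import numpy as np

def consensus_admm_qp(Qs, qs, rho=1.0, iters=200):
    """Consensus ADMM for min_x sum_i (1/2 x'Q_i x + q_i'x) split over workers.

    Each worker solves a small regularized local QP, workers average to form
    the consensus iterate z, and dual variables u absorb the disagreement."""
    n = qs[0].shape[0]
    m = len(Qs)
    x = np.zeros((m, n))
    u = np.zeros((m, n))
    z = np.zeros(n)
    for _ in range(iters):
        for i in range(m):
            # local step: argmin_x 1/2 x'Q_i x + q_i'x + (rho/2)||x - z + u_i||^2
            x[i] = np.linalg.solve(Qs[i] + rho * np.eye(n), rho * (z - u[i]) - qs[i])
        z = (x + u).mean(axis=0)  # consensus (averaging) step
        u += x - z                # dual update
    return z
```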
- On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes [11.868402302316131]
This paper analyzes reinforcement learning (RL) algorithms for Markov decision processes (MDPs) under the average-reward criterion.
We focus on Q-learning algorithms based on relative value iteration (RVI), which are model-free counterparts of the RVI method for weakly communicating MDPs.
arXiv Detail & Related papers (2024-08-29T04:57:44Z)
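A tabular RVI Q-learning step, in its standard textbook form, is sketched below for the entry above; the reference state-action pair and step size are illustrative choices, and the paper's analyzed variants may differ in how the average-reward offset is formed.

```python
import numpy as np

def rvi_q_update(Q, s, a, r, s_next, alpha=0.1, ref=(0, 0)):
    """One tabular RVI Q-learning update. The value at a fixed reference
    state-action pair serves as the average-reward estimate and is subtracted
    from the TD target, keeping Q bounded under the average-reward criterion."""
    gain = Q[ref]                          # reference offset ~ average reward
    target = r - gain + np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```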
- Lower Bounds and Optimal Algorithms for Non-Smooth Convex Decentralized Optimization over Time-Varying Networks [57.24087627267086]
We consider the task of minimizing the sum of convex functions stored in a decentralized manner across the nodes of a communication network.
Lower bounds on the number of decentralized communications and (sub)gradient computations required to solve the problem have been established.
We develop the first optimal algorithm that matches these lower bounds and offers substantially improved theoretical performance compared to the existing state of the art.
arXiv Detail & Related papers (2024-05-28T10:28:45Z)
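As a baseline for the problem class in the entry above (not the paper's optimal algorithm), a decentralized subgradient sketch: nodes mix their iterates with neighbors through a doubly stochastic matrix W, then take local subgradient steps. W, the step size, and the function handles are assumptions.

```python
import numpy as np

def decentralized_subgradient(subgrads, W, x0, steps=100, lr=0.01):
    """Decentralized subgradient method for min_x sum_i f_i(x).

    subgrads[i](x) returns a subgradient of node i's local f_i; W is a doubly
    stochastic mixing matrix matching the (possibly time-varying) topology."""
    n = len(subgrads)
    X = np.tile(x0, (n, 1))                 # one local iterate per node
    for _ in range(steps):
        X = W @ X                           # communication (gossip) round
        for i in range(n):
            X[i] -= lr * subgrads[i](X[i])  # local (sub)gradient round
    return X.mean(axis=0)
```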
- Federated Multi-Level Optimization over Decentralized Networks [55.776919718214224]
We study the problem of distributed multi-level optimization over a network, where agents can only communicate with their immediate neighbors.
We propose a novel gossip-based distributed multi-level optimization algorithm that enables networked agents to solve optimization problems at different levels in a single timescale.
Our algorithm achieves optimal sample complexity, scaling linearly with the network size, and demonstrates state-of-the-art performance on various applications.
arXiv Detail & Related papers (2023-10-10T00:21:10Z)
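To illustrate the single-timescale idea in the entry above, below is one round of a gossip-based two-level scheme: both levels are mixed with neighbors and updated simultaneously. This is a heavily simplified template; the paper handles general multi-level nesting and the hypergradient corrections this sketch omits.

```python
import numpy as np

def gossip_bilevel_round(X, Y, W, grad_fx, grad_gy, lr_x=0.01, lr_y=0.05):
    """One single-timescale round: gossip the upper-level iterates X and
    lower-level iterates Y, then take simultaneous gradient steps per agent."""
    X, Y = W @ X, W @ Y                           # gossip both levels in one round
    for i in range(X.shape[0]):
        Y[i] = Y[i] - lr_y * grad_gy(X[i], Y[i])  # lower-level step
        X[i] = X[i] - lr_x * grad_fx(X[i], Y[i])  # upper-level step, same timescale
    return X, Y
```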
- Optimization and Optimizers for Adversarial Robustness [10.279287131070157]
In this paper, we introduce PWCF, a novel framework that blends a general-purpose constrained-optimization solver with constraint folding.
Regarding reliability, PWCF provides solutions with stationarity measures and feasibility tests to assess the solution quality.
We further explore the distinct patterns in the solutions found for solving these problems using various combinations of losses, perturbation models, and optimization algorithms.
arXiv Detail & Related papers (2023-03-23T16:22:59Z)
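Constraint folding, as used by PWCF in the entry above, can be illustrated in a few lines: many inequality constraints c_i(x) <= 0 are collapsed into the single constraint max_i c_i(x) <= 0 (norms of the violation vector are another folding choice), reducing what the general-purpose solver must track. The helper below is a hypothetical sketch, not the PWCF API.

```python
import numpy as np

def fold_constraints(cs):
    """Fold constraints c_i(x) <= 0 into one constraint F(x) = max_i c_i(x) <= 0."""
    def folded(x):
        return max(c(x) for c in cs)
    return folded

# Example: fold the box -1 <= x_j <= 1 (two vector constraints) into one scalar one.
folded = fold_constraints([lambda x: x.max() - 1.0, lambda x: (-x).max() - 1.0])
print(folded(np.array([0.5, -0.2])))  # negative, so the box constraint holds
```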
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
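For the matrix-game special case of the entry above, a simplified entropy-regularized OMWU loop is sketched below with symmetric updates for both players; the paper's single-loop method extends such updates to Markov games with value bootstrapping, which this sketch omits, and the step sizes are illustrative.

```python
import numpy as np

def omwu_matrix_game(A, eta=0.1, tau=0.01, iters=2000):
    """Entropy-regularized optimistic multiplicative weights for the zero-sum
    matrix game min_x max_y x'Ay. The x**(1 - eta*tau) factor implements the
    entropy regularization; the midpoint iterates provide the optimism."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    x_mid, y_mid = x.copy(), y.copy()
    for _ in range(iters):
        # optimistic midpoint step
        x_mid = x ** (1 - eta * tau) * np.exp(-eta * (A @ y_mid)); x_mid /= x_mid.sum()
        y_mid = y ** (1 - eta * tau) * np.exp(eta * (A.T @ x_mid)); y_mid /= y_mid.sum()
        # main step against the midpoint opponent strategies
        x = x ** (1 - eta * tau) * np.exp(-eta * (A @ y_mid)); x /= x.sum()
        y = y ** (1 - eta * tau) * np.exp(eta * (A.T @ x_mid)); y /= y.sum()
    return x, y
```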
- Late Fusion Multi-view Clustering via Global and Local Alignment Maximization [61.89218392703043]
Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance.
Most existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering.
We propose late fusion MVC via alignment maximization to address these issues.
arXiv Detail & Related papers (2022-08-02T01:49:31Z)
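One plausible reading of the late-fusion step in the entry above: per-view base partitions are combined with learned weights, and a consensus partition is recovered from the fused matrix by an SVD-based orthogonalization. This sketch omits the per-view rotations and weight updates that the full alternating scheme would also perform.

```python
import numpy as np

def late_fusion_consensus(base_parts, weights):
    """Fuse per-view base partition matrices (each n x k) by weighted sum, then
    project onto orthonormal consensus partitions via SVD (a common update for
    alignment-maximization objectives)."""
    fused = sum(w * H for w, H in zip(weights, base_parts))
    U, _, Vt = np.linalg.svd(fused, full_matrices=False)
    return U @ Vt  # orthonormal consensus representation
```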
- Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
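The permutation-invariant inductive bias in the entry above can be shown with a DeepSets-style critic: shared per-agent embeddings are mean-pooled, so reordering agents cannot change the value estimate. This is a generic instance of the idea, not the paper's exact MF-PPO architecture; all weights and shapes are assumptions.

```python
import numpy as np

def permutation_invariant_value(obs, W_embed, W_out):
    """DeepSets-style value function over N agents.

    obs: (N, d) per-agent observations; W_embed: (d, h); W_out: (h,).
    Permuting the rows of obs leaves the output unchanged."""
    h = np.tanh(obs @ W_embed)   # per-agent embedding with shared weights
    pooled = h.mean(axis=0)      # symmetric pooling => permutation invariance
    return pooled @ W_out        # scalar value estimate
```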
- Meta-learning based Alternating Minimization Algorithm for Non-convex Optimization [9.774392581946108]
We propose a novel solution for challenging non-convex optimization problems over multiple blocks of variables.
Our proposed approach achieves effective convergence in cases where other methods typically fail.
arXiv Detail & Related papers (2020-09-09T10:45:00Z)
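The classical template the entry above builds on is plain alternating minimization, sketched below; the paper meta-learns the block update rules rather than using exact block minimizers. The function handles here are hypothetical.

```python
def alternating_minimization(f, x0, y0, argmin_x, argmin_y, iters=50):
    """Alternating minimization for min_{x,y} f(x, y): fix one block of
    variables, minimize over the other, and alternate.

    argmin_x(y) returns argmin_x f(x, y); argmin_y(x) the converse."""
    x, y = x0, y0
    for _ in range(iters):
        x = argmin_x(y)  # update x with y fixed
        y = argmin_y(x)  # update y with x fixed
    return x, y, f(x, y)
```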
- FedSplit: An algorithmic framework for fast federated optimization [40.42352500741025]
We introduce FedSplit, a class of algorithms for solving distributed convex minimization with additive structure.
Our theory shows that these methods are provably robust to inexact computation of intermediate local quantities.
arXiv Detail & Related papers (2020-05-11T16:30:09Z)
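FedSplit applies operator splitting to the consensus reformulation of federated optimization; below is one round of a Peaceman-Rachford-style consensus splitting in that spirit (see the paper for the exact FedSplit recursion and step-size theory). The prox operators stand in for the inexact local solves the theory is robust to.

```python
import numpy as np

def splitting_round(Z, prox_ops, s=0.1):
    """One Peaceman-Rachford consensus round for min_x sum_j f_j(x).

    Z: (m, d) per-client splitting variables; prox_ops[j](v, s) evaluates
    prox_{s f_j}(v), computed inexactly by local solvers in practice."""
    z_bar = Z.mean(axis=0)                       # server aggregation
    for j in range(Z.shape[0]):
        half = prox_ops[j](2 * z_bar - Z[j], s)  # reflected local prox step
        Z[j] = Z[j] + 2.0 * (half - z_bar)       # local reflection update
    return Z, z_bar                              # z_bar approximates the solution
```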