Structuring Value Representations via Geometric Coherence in Markov Decision Processes
- URL: http://arxiv.org/abs/2602.02978v1
- Date: Tue, 03 Feb 2026 01:35:58 GMT
- Title: Structuring Value Representations via Geometric Coherence in Markov Decision Processes
- Authors: Zuyuan Zhang, Zeyu Fang, Tian Lan
- Abstract summary: We propose GCR-RL (Geometric Coherence Regularized Reinforcement Learning), which computes a sequence of super-poset refinements. Two novel algorithms, based on Q-learning and on actor-critic methods, are developed to efficiently realize these super-poset refinements. We empirically evaluate GCR-RL on a range of tasks and demonstrate significant improvements in sample efficiency and stable performance over strong baselines.
- Score: 9.312400001335659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Geometric properties can be leveraged to stabilize and accelerate reinforcement learning. Existing examples include encoding symmetry structure, geometry-aware data augmentation, and enforcing structural restrictions. In this paper, we take a novel view of RL through the lens of order theory and recast value function estimation as learning a desired poset (partially ordered set). We propose \emph{GCR-RL} (Geometric Coherence Regularized Reinforcement Learning), which computes a sequence of super-poset refinements -- by refining the posets from previous steps and learning additional order relationships from temporal-difference signals -- thus ensuring geometric coherence across the sequence of posets underpinning the learned value functions. Two novel algorithms, based on Q-learning and on actor-critic methods, are developed to efficiently realize these super-poset refinements. Their theoretical properties and convergence rates are analyzed. We empirically evaluate GCR-RL on a range of tasks and demonstrate significant improvements in sample efficiency and stable performance over strong baselines.
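The abstract gives no implementation details, but the core idea -- preserving previously learned order relationships among values while performing TD updates -- can be sketched. In the toy code below, the function name, the hinge-style penalty, and the coefficient `lam` are all our assumptions for illustration, not the paper's actual algorithm:

```python
import numpy as np

def coherence_regularized_q_update(Q, s, a, r, s_next, pairs,
                                   alpha=0.1, gamma=0.99, lam=0.5, margin=0.0):
    """One tabular Q-learning step plus a pairwise order-coherence penalty.

    `pairs` holds ((hi_s, hi_a), (lo_s, lo_a)) index pairs: orderings
    Q[hi] >= Q[lo] recorded at an earlier refinement step that the update
    should try to preserve (an illustrative stand-in for a poset).
    """
    Q = Q.copy()
    # Standard TD(0) target and update.
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    # Hinge penalty: nudge values back toward the recorded partial order.
    for (hi_s, hi_a), (lo_s, lo_a) in pairs:
        violation = (Q[lo_s, lo_a] + margin) - Q[hi_s, hi_a]
        if violation > 0:
            Q[hi_s, hi_a] += lam * alpha * violation
            Q[lo_s, lo_a] -= lam * alpha * violation
    return Q
```

The penalty only nudges values toward the recorded order rather than enforcing it exactly, so the TD signal can still override a stale ordering over many steps.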
Related papers
- ODELoRA: Training Low-Rank Adaptation by Solving Ordinary Differential Equations [54.886931928255564]
Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning. We propose a novel continuous-time optimization dynamic for LoRA factor matrices in the form of an ordinary differential equation (ODE). We show that ODELoRA achieves stable feature learning, a property that is crucial for training deep neural networks at different scales of problem dimensionality.
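ODELoRA's actual dynamic is not specified in this summary; as a rough illustration of the idea, a gradient-flow ODE over the LoRA factors can be integrated with forward Euler. The function name, step size, and the toy least-squares loss in the usage note are our assumptions:

```python
import numpy as np

def lora_gradient_flow(A, B, grad_fn, dt=0.01, steps=200):
    """Forward-Euler integration of the gradient-flow ODE
    dA/dt = -dL/dA, dB/dt = -dL/dB over LoRA factor matrices,
    where the adapted weight is W ~= W0 + B @ A.

    `grad_fn(A, B)` must return the gradient pair (dL/dA, dL/dB).
    """
    for _ in range(steps):
        gA, gB = grad_fn(A, B)
        A = A - dt * gA
        B = B - dt * gB
    return A, B
```

For example, with a toy loss L = 0.5 * ||B @ A - T||^2 the gradients are B.T @ R and R @ A.T with residual R = B @ A - T, and the flow drives the rank-constrained product toward the best low-rank fit of T.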
arXiv Detail & Related papers (2026-02-07T10:19:36Z) - FISMO: Fisher-Structured Momentum-Orthogonalized Optimizer [30.184978506988767]
We introduce FISMO, which incorporates anisotropic geometry information through Fisher information geometry. FISMO achieves superior efficiency and final performance compared to established baselines.
arXiv Detail & Related papers (2026-01-29T14:05:04Z) - Continuous-time reinforcement learning for optimal switching over multiple regimes [5.045537244224327]
This paper studies continuous-time reinforcement learning (RL) for optimal switching problems across multiple regimes. We establish the well-posedness of the associated system of Hamilton-Jacobi-Bellman equations and provide a characterization of the optimal policy. A reinforcement learning algorithm is devised and implemented by invoking policy evaluation based on the martingale characterization.
arXiv Detail & Related papers (2025-12-04T11:48:07Z) - Deep Unfolding: Recent Developments, Theory, and Design Guidelines [99.63555420898554]
This article provides a tutorial-style overview of deep unfolding, a framework that transforms optimization algorithms into structured, trainable ML architectures. We review the foundations of optimization for inference and for learning, introduce four representative design paradigms for deep unfolding, and discuss the distinctive training schemes that arise from their iterative nature.
arXiv Detail & Related papers (2025-12-03T13:16:35Z) - Reinforcement Learning Using known Invariances [54.91261509214309]
This paper develops a theoretical framework for incorporating known group symmetries into kernel-based reinforcement learning. We show that symmetry-aware RL algorithms achieve significantly better performance than their standard kernel counterparts.
arXiv Detail & Related papers (2025-11-05T13:56:14Z) - Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks [17.487761710665968]
We present a framework to define a large class of neural networks for which, by construction, training by gradient flow provably reaches arbitrarily low loss when the number of parameters grows.
arXiv Detail & Related papers (2025-01-10T12:52:00Z) - Last-Iterate Convergence of Adaptive Riemannian Gradient Descent for Equilibrium Computation [52.73824786627612]
This paper establishes new convergence results for geodesic strongly monotone games. Our key result shows that RGD attains last-iterate linear convergence in a geometry-agnostic fashion. Overall, this paper presents the first geometry-agnostic last-iterate convergence analysis for games beyond the Euclidean setting.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning [73.80728148866906]
Quasimetric Reinforcement Learning (QRL) is a new RL method that utilizes quasimetric models to learn optimal value functions.
On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance.
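A quasimetric keeps d(x, x) = 0 and the triangle inequality but drops symmetry, which lets it model irreversible dynamics (reaching a goal downhill may be cheaper than returning uphill). A minimal example of such a function -- our illustration only, not QRL's learned neural parametrization:

```python
import numpy as np

def quasimetric(x, y):
    """A simple quasimetric on R^n: d(x, y) = sum_i max(y_i - x_i, 0).

    Satisfies d(x, x) = 0 and the triangle inequality, but is
    deliberately asymmetric: increasing a coordinate costs its increment,
    while decreasing it is free.
    """
    return float(np.maximum(np.asarray(y) - np.asarray(x), 0.0).sum())
```

The triangle inequality holds coordinate-wise, since max(z - x, 0) <= max(y - x, 0) + max(z - y, 0) for any reals x, y, z.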
arXiv Detail & Related papers (2023-04-03T17:59:58Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
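The exploration protocol itself is not reproduced in this summary, but the underlying update -- Q-learning with a linear architecture Q(s, a) = phi(s, a)^T w -- is standard. A minimal sketch (the feature layout and names are our own):

```python
import numpy as np

def linear_q_step(w, phi_sa, r, phi_next, alpha=0.1, gamma=0.99):
    """One Q-learning update under linear function approximation,
    Q(s, a) = phi(s, a) . w.

    `phi_next` stacks the feature vector of each action available in the
    next state (one row per action), so the greedy backup is the max
    over its rows.
    """
    td_target = r + gamma * (phi_next @ w).max()
    td_error = td_target - phi_sa @ w
    return w + alpha * td_error * phi_sa
```

With tabular one-hot features this reduces exactly to ordinary tabular Q-learning, which is a convenient sanity check for any feature map.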
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Group Equivariant Deep Reinforcement Learning [4.997686360064921]
We propose the use of Equivariant CNNs to train RL agents and study their inductive bias for transformation equivariant Q-value approximation.
We demonstrate that equivariant architectures can dramatically enhance the performance and sample efficiency of RL agents in a highly symmetric environment.
arXiv Detail & Related papers (2020-07-01T02:38:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.