Learning The Minimum Action Distance
- URL: http://arxiv.org/abs/2506.09276v1
- Date: Tue, 10 Jun 2025 22:27:11 GMT
- Title: Learning The Minimum Action Distance
- Authors: Lorenzo Steccanella, Joshua B. Evans, Özgür Şimşek, Anders Jonsson
- Abstract summary: This paper presents a state representation framework for Markov decision processes (MDPs) that can be learned solely from state trajectories. We propose learning the minimum action distance (MAD) as a fundamental metric that captures the underlying structure of an environment.
- Score: 6.232804902200881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a state representation framework for Markov decision processes (MDPs) that can be learned solely from state trajectories, requiring neither reward signals nor the actions executed by the agent. We propose learning the minimum action distance (MAD), defined as the minimum number of actions required to transition between states, as a fundamental metric that captures the underlying structure of an environment. MAD naturally enables critical downstream tasks such as goal-conditioned reinforcement learning and reward shaping by providing a dense, geometrically meaningful measure of progress. Our self-supervised learning approach constructs an embedding space where the distances between embedded state pairs correspond to their MAD, accommodating both symmetric and asymmetric approximations. We evaluate the framework on a comprehensive suite of environments with known MAD values, encompassing both deterministic and stochastic dynamics, as well as discrete and continuous state spaces, and environments with noisy observations. Empirical results demonstrate that the proposed approach not only efficiently learns accurate MAD representations across these diverse settings but also significantly outperforms existing state representation methods in terms of representation quality.
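The core construction described in the abstract, an embedding space in which distances between embedded state pairs approximate MAD, can be sketched as follows. This is a hedged illustration rather than the authors' algorithm: the names `StateEncoder`, `sample_pairs`, and `mad_loss`, the network sizes, and the penalty-based objective are all assumptions; the only idea taken from the abstract is that states observed k steps apart on a trajectory have MAD at most k, which provides self-supervision without rewards or actions.

```python
# Hypothetical sketch of MAD-style representation learning from state-only
# trajectories (names and objective are illustrative, not the paper's exact
# algorithm). Idea: for two states observed k steps apart on a trajectory,
# k upper-bounds their MAD, so the embedding distance is constrained to stay
# below k while otherwise being pushed upward; over many trajectories the
# distance settles near the smallest observed gap, i.e. an MAD estimate.
import torch
import torch.nn as nn


class StateEncoder(nn.Module):
    def __init__(self, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


def sample_pairs(traj: torch.Tensor, batch: int = 256):
    """Sample state pairs (s_i, s_j), i < j, from one trajectory.
    The temporal gap j - i is an upper bound on MAD(s_i, s_j)."""
    T = traj.shape[0]
    i = torch.randint(0, T - 1, (batch,))
    j = torch.minimum(i + torch.randint(1, T, (batch,)),
                      torch.full_like(i, T - 1))
    return traj[i], traj[j], (j - i).float()


def mad_loss(encoder, s_i, s_j, gap, alpha: float = 0.1):
    # Symmetric L1 distance in embedding space; an asymmetric approximation
    # could instead use e.g. relu(z_i - z_j) summed over dimensions.
    d = (encoder(s_i) - encoder(s_j)).abs().sum(dim=-1)
    upper_violation = torch.relu(d - gap).pow(2).mean()  # do not exceed the observed gap
    spread = torch.minimum(d, gap).mean()                # otherwise push d up toward the bound
    return upper_violation - alpha * spread


# Usage sketch on a placeholder trajectory of 4-dimensional states.
encoder = StateEncoder(state_dim=4)
optim = torch.optim.Adam(encoder.parameters(), lr=1e-3)
traj = torch.randn(100, 4)  # stand-in for logged environment states
for _ in range(10):
    s_i, s_j, gap = sample_pairs(traj)
    loss = mad_loss(encoder, s_i, s_j, gap)
    optim.zero_grad()
    loss.backward()
    optim.step()
```

Note that the sketch needs only logged states, matching the abstract's claim that neither rewards nor executed actions are required; the learned distances could then serve as dense progress measures for goal-conditioned RL or reward shaping.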
Related papers
- Policy Gradient Methods in the Presence of Symmetries and State Abstractions [46.66541516203923]
Reinforcement learning (RL) on high-dimensional and complex problems relies on abstraction for improved efficiency and generalization.
We study abstraction in the continuous-control setting, and extend the definition of Markov decision process (MDP) homomorphisms to the setting of continuous state and action spaces.
We propose a family of actor-critic algorithms that are able to learn the policy and the MDP homomorphism map simultaneously.
arXiv Detail & Related papers (2023-05-09T17:59:10Z)
- Distributed Bayesian Learning of Dynamic States [65.7870637855531]
The proposed algorithm performs distributed Bayesian filtering for finite-state hidden Markov models.
It can be used for sequential state estimation, as well as for modeling opinion formation over social networks under dynamic environments.
arXiv Detail & Related papers (2022-12-05T19:40:17Z)
- Using Forwards-Backwards Models to Approximate MDP Homomorphisms [11.020094184644789]
We propose a novel approach to constructing homomorphisms in discrete action spaces.
We use a learnt model of environment dynamics to infer which state-action pairs lead to the same state.
In MinAtar, we report an almost 4x improvement over a value-based off-policy baseline in the low sample limit.
arXiv Detail & Related papers (2022-09-14T00:38:12Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- The Geometry of Robust Value Functions [119.94715309072983]
We introduce a new perspective that enables us to characterize both the non-robust and robust value space.
We show that the robust value space is determined by a set of conic hypersurfaces, each of which contains the robust values of all policies that agree on one state.
arXiv Detail & Related papers (2022-01-30T22:12:17Z)
- Towards Robust Bisimulation Metric Learning [3.42658286826597]
Bisimulation metrics offer one solution to the representation learning problem.
We generalize value function approximation bounds for on-policy bisimulation metrics to non-optimal policies.
We find that these issues stem from an underconstrained dynamics model and an unstable dependence of the embedding norm on the reward signal.
arXiv Detail & Related papers (2021-10-27T00:32:07Z)
- Machine-Learning-Derived Entanglement Witnesses [55.76279816849472]
We show a correspondence between linear support vector machines (SVMs) and entanglement witnesses.
We use this correspondence to generate entanglement witnesses for bipartite and tripartite qubit (and qudit) target entangled states.
arXiv Detail & Related papers (2021-07-05T22:28:02Z)
- A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation [61.740187363451746]
Marginalized importance sampling (MIS) measures the density ratio between the state-action occupancy of a target policy and that of a sampling distribution.
We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy.
We evaluate the empirical performance of our approach on a variety of challenging Atari and MuJoCo environments.
arXiv Detail & Related papers (2021-06-12T20:21:38Z)
- Learning Markov State Abstractions for Deep Reinforcement Learning [17.34529517221924]
We introduce a novel set of conditions and prove that they are sufficient for learning a Markov abstract state representation.
We then describe a practical training procedure that combines inverse model estimation and temporal contrastive learning.
Our approach learns representations that capture the underlying structure of the domain and lead to improved sample efficiency.
arXiv Detail & Related papers (2021-06-08T14:12:36Z)
- GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out-of-distribution samples and the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z)