Related papers: Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets

URL: http://arxiv.org/abs/2503.21018v1
Date: Wed, 26 Mar 2025 22:05:57 GMT
Title: Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets
Authors: Alexander Levine, Peter Stone, Amy Zhang,
Abstract summary: This paper introduces CRAFT, a sample-efficient algorithm leveraging differences in controllable feature dynamics across agents to learn representations. We provide theoretical guarantees for CRAFT's performance and demonstrate its feasibility on a toy example.
Score: 87.62730694973696
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While sequential decision-making environments often involve high-dimensional observations, not all features of these observations are relevant for control. In particular, the observation space may capture factors of the environment which are not controllable by the agent, but which add complexity to the observation space. The need to ignore these "noise" features in order to operate in a tractably-small state space poses a challenge for efficient policy learning. Due to the abundance of video data available in many such environments, task-independent representation learning from action-free offline data offers an attractive solution. However, recent work has highlighted theoretical limitations in action-free learning under the Exogenous Block MDP (Ex-BMDP) model, where temporally-correlated noise features are present in the observations. To address these limitations, we identify a realistic setting where representation learning in Ex-BMDPs becomes tractable: when action-free video data from multiple agents with differing policies are available. Concretely, this paper introduces CRAFT (Comparison-based Representations from Action-Free Trajectories), a sample-efficient algorithm leveraging differences in controllable feature dynamics across agents to learn representations. We provide theoretical guarantees for CRAFT's performance and demonstrate its feasibility on a toy example, offering a foundation for practical methods in similar settings.
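The core intuition can be illustrated with a toy example. In an Ex-BMDP, exogenous features evolve independently of the agent's actions, so their transition statistics should look the same in action-free data collected under any policy, while the statistics of controllable features shift with the policy. The sketch below is a hypothetical illustration of that comparison only, not the CRAFT algorithm: it assumes the observation is already split into independent binary features (CRAFT instead learns a representation from high-dimensional observations), and the functions simulate, transition_stats and policy_dependent_features, the tolerance tol, and the toy dynamics are all made up for this example.

```python
import numpy as np

def simulate(policy_p, n_steps, n_exo=3, flip_p=0.2, seed=0):
    """Hypothetical toy Ex-BMDP (not from the paper).

    Feature 0 is controllable: its next value equals the agent's action.
    Features 1..n_exo are exogenous bits that flip with probability flip_p
    regardless of the action. Only observations are returned (action-free).
    """
    rng = np.random.default_rng(seed)
    obs = np.zeros((n_steps, 1 + n_exo), dtype=int)
    for t in range(1, n_steps):
        action = int(rng.random() < policy_p)   # the action is never recorded
        obs[t, 0] = action                      # controllable feature follows the action
        flips = rng.random(n_exo) < flip_p      # exogenous noise ignores the action
        obs[t, 1:] = obs[t - 1, 1:] ^ flips
    return obs

def transition_stats(obs):
    """Empirical P(x'_i = 1 | x_i = v) for every feature i and value v in {0, 1}."""
    prev, nxt = obs[:-1], obs[1:]
    stats = np.zeros((obs.shape[1], 2))
    for v in (0, 1):
        mask = prev == v                        # (T-1, d) boolean mask
        stats[:, v] = (nxt * mask).sum(axis=0) / np.maximum(mask.sum(axis=0), 1)
    return stats

def policy_dependent_features(obs_a, obs_b, tol=0.1):
    """Indices of features whose transition statistics differ across the two datasets."""
    diff = np.abs(transition_stats(obs_a) - transition_stats(obs_b)).max(axis=1)
    return np.flatnonzero(diff > tol)

# Two agents with different (unobserved) policies, compared on action-free data.
data_a = simulate(policy_p=0.9, n_steps=5000, seed=1)
data_b = simulate(policy_p=0.1, n_steps=5000, seed=2)
print(policy_dependent_features(data_a, data_b))
```

In this toy setup only feature 0 should be flagged, because its dynamics depend on which agent generated the data, while the exogenous bits flip at the same rate in both datasets. The per-feature comparison works only because the features are given and independent; it says nothing about CRAFT's actual procedure, sample complexity, or guarantees.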

Related papers

Stochastic Encodings for Active Feature Acquisition [100.47043816019888]
Active Feature Acquisition is an instance-wise, sequential decision-making problem. The aim is to dynamically select which feature to measure based on current observations, independently for each test instance. Common approaches either use Reinforcement Learning, which suffers from training difficulties, or greedily maximize the conditional mutual information of the label and unobserved features, which leads to myopic acquisitions. We introduce a latent variable model, trained in a supervised manner. Acquisitions are made by reasoning about the features across many possible unobserved realizations in a latent space.
arXiv Detail & Related papers (2025-08-03T23:48:46Z)
Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement [19.883973457999282]
Imitation Learning from Observation (IfO) offers a powerful way to learn behaviors at large scale. This paper investigates idealized scenarios with mostly bimodal-quality data distributions and introduces a method to learn from such data. Our method adapts RL-based imitation learning to action-free demonstrations, using a value function to transfer information between expert and non-expert data.
arXiv Detail & Related papers (2025-07-09T09:55:23Z)
Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory [87.62730694973696]
STEEL is the first provably sample-efficient algorithm for learning the controllable dynamics of an Exogenous Block Markov Decision Process from a single trajectory. We prove that STEEL is correct and sample-efficient, and demonstrate STEEL on two toy problems.
arXiv Detail & Related papers (2024-10-03T21:57:21Z)
Learning Action-based Representations Using Invariance [18.1941237781348]
We introduce action-bisimulation encoding, which learns a multi-step controllability metric that discounts distant state features that are relevant for control. We demonstrate that action-bisimulation pretraining on reward-free, uniformly random data improves sample efficiency in several environments.
arXiv Detail & Related papers (2024-03-25T02:17:54Z)
Sequential Action-Induced Invariant Representation for Reinforcement Learning [1.2046159151610263]
How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a challenging problem in visual reinforcement learning. We propose a Sequential Action-induced invariant Representation (SAR) method, in which the encoder is optimized by an auxiliary learner to only preserve the components that follow the control signals of sequential actions.
arXiv Detail & Related papers (2023-09-22T05:31:55Z)
TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning [73.53576440536682]
We introduce TACO: Temporal Action-driven Contrastive Learning, a powerful temporal contrastive learning approach. TACO simultaneously learns a state and an action representation by optimizing the mutual information between representations of current states paired with action sequences and representations of the corresponding future states. For online RL, TACO achieves a 40% performance boost after one million environment interaction steps.
arXiv Detail & Related papers (2023-06-22T22:21:53Z)
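Contrastive mutual-information objectives like the one described for TACO are commonly trained with an InfoNCE-style loss. In TACO's setting the anchor would encode the current state together with an action sequence and the positive would encode the corresponding future state, with the other future states in the batch acting as negatives. The sketch below is a generic NumPy InfoNCE loss under that reading, not TACO's implementation; the function name info_nce_loss, the dot-product similarity and the temperature parameter are assumptions, and random vectors stand in for the learned encoders.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=1.0):
    """Generic InfoNCE loss (hypothetical sketch, not TACO's code).

    anchors:   (B, D) array, e.g. encodings of (current state, action sequence)
    positives: (B, D) array, e.g. encodings of the matching future states
    Row i of `positives` is the positive for row i of `anchors`; all other
    rows in the batch serve as negatives.
    """
    logits = anchors @ positives.T / temperature          # (B, B) similarity scores
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (matching pairs) as the correct class.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                  # stand-in anchor encodings
future = z + 0.1 * rng.normal(size=z.shape)   # stand-in matching future encodings
aligned = info_nce_loss(z, future)
shuffled = info_nce_loss(z, future[::-1])     # reversing the batch breaks every pairing
print(aligned, shuffled)
```

Correctly paired batches give a loss near zero, while mismatched pairs are penalized heavily; minimizing this loss maximizes a lower bound on the mutual information between anchors and positives. TACO's actual encoders, projection heads and hyperparameters are not described in this summary.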
SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models [22.472167814814448]
We propose a new model-based imitation learning algorithm named Separated Model-based Adversarial Imitation Learning (SeMAIL). Our method achieves near-expert performance on various visual control tasks with complex observations, and on more challenging tasks whose backgrounds differ from those in the expert observations.
arXiv Detail & Related papers (2023-06-19T04:33:44Z)
Imitation from Observation With Bootstrapped Contrastive Learning [12.048166025000976]
Imitation from observation (IfO) is a learning paradigm that consists of training autonomous agents in a Markov Decision Process (MDP) by observing expert demonstrations without access to the expert's actions. We present BootIfOL, an IfO algorithm that aims to learn a reward function that takes an agent trajectory and compares it to expert behavior. We evaluate our approach on a variety of control tasks, showing that we can train effective policies using a limited number of demonstrative trajectories.
arXiv Detail & Related papers (2023-02-13T17:32:17Z)
Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera. Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations. However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z)
DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator. Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms. This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk. We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
Learning Memory-Dependent Continuous Control from Demonstrations [13.063093054280948]
This paper builds on the idea of replaying demonstrations for memory-dependent continuous control. Experiments involving several memory-crucial continuous control tasks show that the method significantly reduces interactions with the environment. The algorithm also shows better sample efficiency and learning capabilities than a baseline reinforcement learning algorithm for memory-based control from demonstrations.
arXiv Detail & Related papers (2021-02-18T08:13:42Z)
Learning Robust State Abstractions for Hidden-Parameter Block MDPs [55.31018404591743]
We leverage ideas of common structure from the HiP-MDP setting to enable robust state abstractions inspired by Block MDPs. We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings.
arXiv Detail & Related papers (2020-07-14T17:25:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.