Provably Sample-Efficient RL with Side Information about Latent Dynamics
- URL: http://arxiv.org/abs/2205.14237v1
- Date: Fri, 27 May 2022 21:07:03 GMT
- Title: Provably Sample-Efficient RL with Side Information about Latent Dynamics
- Authors: Yao Liu, Dipendra Misra, Miro Dudík, Robert E. Schapire
- Abstract summary: We study reinforcement learning in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space.
We present an algorithm, called TASID, that learns a robust policy in the target domain, with sample complexity that is polynomial in the horizon.
- Score: 12.461789905893026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study reinforcement learning (RL) in settings where observations are
high-dimensional, but where an RL agent has access to abstract knowledge about
the structure of the state space, as is the case, for example, when a robot is
tasked to go to a specific room in a building using observations from its own
camera, while having access to the floor plan. We formalize this setting as
transfer reinforcement learning from an abstract simulator, which we assume is
deterministic (such as a simple model of moving around the floor plan), but
which is only required to capture the target domain's latent-state dynamics
approximately up to unknown (bounded) perturbations (to account for environment
stochasticity). Crucially, we assume no prior knowledge about the structure of
observations in the target domain except that they can be used to identify the
latent states (but the decoding map is unknown). Under these assumptions, we
present an algorithm, called TASID, that learns a robust policy in the target
domain, with sample complexity that is polynomial in the horizon, and
independent of the number of states, which is not possible without access to
some prior knowledge. In synthetic experiments, we verify various properties of
our algorithm and show that it empirically outperforms transfer RL algorithms
that require access to "full simulators" (i.e., those that also simulate
observations).
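The setting in the abstract can be made concrete with a toy model. The Python sketch below illustrates the problem setup only, not TASID itself. The 2x3 floor plan, the "stay put with probability eta" perturbation, the secret-coordinate observation model, and every name in it (sim_step, target_step, make_emitter, plan, rollout) are hypothetical choices made for illustration. It plans in the deterministic abstract simulator and then runs the resulting latent-state policy in a perturbed target domain. The true decoder is used as an oracle stand-in for the decoding map, which the abstract stresses is unknown to the learner.

```python
import random

# Hypothetical floor plan: latent states are the rooms (row, col) of a 2x3 grid.
ROWS, COLS = 2, 3
STATES = [(r, c) for r in range(ROWS) for c in range(COLS)]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def sim_step(state, action):
    """Deterministic abstract simulator: move to the adjacent room if it exists."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    return (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state


def target_step(state, action, eta, rng):
    """Target latent dynamics (assumed perturbation model): with probability eta
    the agent stays put, otherwise it follows the simulator."""
    if rng.random() < eta:
        return state
    return sim_step(state, action)


def make_emitter(dim, rng):
    """Rich observations: noise in [0, 0.4] everywhere plus +1.0 on a secret
    coordinate per latent state, so each observation identifies its state."""
    secret = rng.sample(range(dim), len(STATES))  # hidden from the learner

    def emit(state):
        obs = [rng.uniform(0.0, 0.4) for _ in range(dim)]
        obs[secret[STATES.index(state)]] += 1.0
        return obs

    def decode(obs):
        # The true decoding map; in the paper's setting this is unknown.
        top = max(range(dim), key=obs.__getitem__)
        return STATES[secret.index(top)]

    return emit, decode


def plan(goal):
    """Abstract policy from the simulator alone: shortest-path distances by
    Bellman-Ford relaxation, then act greedily on them."""
    dist = {s: 0 if s == goal else float("inf") for s in STATES}
    for _ in STATES:  # |S| - 1 rounds suffice; one extra is harmless
        for s in STATES:
            for a in ACTIONS:
                dist[s] = min(dist[s], 1 + dist[sim_step(s, a)])
    return {s: min(ACTIONS, key=lambda a: dist[sim_step(s, a)]) for s in STATES}


def rollout(start, goal, eta, horizon, seed=0):
    """Run the abstract policy in the target domain; return steps to goal or None."""
    rng = random.Random(seed)
    emit, decode = make_emitter(50, rng)
    policy = plan(goal)
    state = start
    for t in range(horizon):
        latent = decode(emit(state))  # the agent sees only the observation
        if latent == goal:
            return t
        state = target_step(state, policy[latent], eta, rng)
    return None
```

For example, `rollout((0, 0), (1, 2), eta=0.0, horizon=10)` returns 3, the length of the shortest route on this floor plan; with eta > 0 the same abstract plan still heads to the goal but may need extra steps.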
Related papers
- Geospatial Trajectory Generation via Efficient Abduction: Deployment for Independent Testing [1.8877926393541125]
We show that we can abduce movement trajectories efficiently through an informed (i.e., A*) search.
We also report on our own experiments showing that we not only provide exact results but also scale to very large scenarios.
Date: 2024-07-08T23:11:47Z
- The Power of Resets in Online Reinforcement Learning [73.64852266145387]
We explore the power of simulators through online reinforcement learning with local simulator access (or, local planning).
We show that MDPs with low coverability can be learned in a sample-efficient fashion with only $Q^\star$-realizability.
We show that the notorious Exogenous Block MDP problem is tractable under local simulator access.
Date: 2024-04-23T18:09:53Z
- Endogenous Macrodynamics in Algorithmic Recourse [52.87956177581998]
Existing work on Counterfactual Explanations (CE) and Algorithmic Recourse (AR) has largely focused on single individuals in a static environment.
We show that many of the existing methodologies can be collectively described by a generalized framework.
We then argue that the existing framework does not account for a hidden external cost of recourse, which only reveals itself when studying the endogenous dynamics of recourse at the group level.
Date: 2023-08-16T07:36:58Z
- Persistent Homology Meets Object Unity: Object Recognition in Clutter [2.356908851188234]
Recognition of occluded objects in unseen and unstructured indoor environments is a challenging problem for mobile robots.
We propose a new descriptor, TOPS, for point clouds generated from depth images and an accompanying recognition framework, THOR, inspired by human reasoning.
THOR outperforms state-of-the-art methods on both datasets and achieves substantially higher recognition accuracy for all scenarios of the UW-IS Occluded dataset.
Date: 2023-05-05T19:42:39Z
- Quantifying the LiDAR Sim-to-Real Domain Shift: A Detailed Investigation Using Object Detectors and Analyzing Point Clouds at Target-Level [1.1999555634662635]
LiDAR object detection algorithms based on neural networks for autonomous driving require large amounts of data for training, validation, and testing.
We show that using simulated data for the training of neural networks leads to a domain shift of training and testing data due to differences in scenes, scenarios, and distributions.
Date: 2023-03-03T12:52:01Z
- Planning for Learning Object Properties [117.27898922118946]
We formalize the problem of automatically training a neural network to recognize object properties as a symbolic planning problem.
We use planning techniques to produce a strategy for automating the training dataset creation and the learning process.
We provide an experimental evaluation in both a simulated and a real environment.
Date: 2023-01-15T09:37:55Z
- Near-optimal Policy Identification in Active Reinforcement Learning [84.27592560211909]
AE-LSVI is a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration.
We show that AE-LSVI outperforms other algorithms in a variety of environments when robustness to the initial state is required.
Date: 2022-12-19T14:46:57Z
- Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data.
Date: 2022-09-29T15:24:47Z
- Transfer RL across Observation Feature Spaces via Model-Based Regularization [9.660642248872973]
In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations.
We propose a novel algorithm which extracts the latent-space dynamics in the source task, and transfers the dynamics model to the target task.
Our algorithm works for drastic changes of observation space without any inter-task mapping or any prior knowledge of the target task.
Date: 2022-01-01T22:41:19Z
- Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
Date: 2021-10-17T15:21:27Z