Planning from Pixels in Environments with Combinatorially Hard Search Spaces
- URL: http://arxiv.org/abs/2110.06149v1
- Date: Tue, 12 Oct 2021 16:38:08 GMT
- Title: Planning from Pixels in Environments with Combinatorially Hard Search Spaces
- Authors: Marco Bagatella, Mirek Olšák, Michal Rolínek, Georg Martius
- Abstract summary: A recent surge of interest in this field brought advances that yield good performance in tasks ranging from arcade games to continuous control.
We present a method that learns to represent its environment as a latent graph.
We show that our method achieves strong generalization, even in an offline RL paradigm that provides only low-quality trajectories.
- Score: 14.897437359519456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to form complex plans based on raw visual input is a litmus test
for current capabilities of artificial intelligence, as it requires a seamless
combination of visual processing and abstract algorithmic execution, two
traditionally separate areas of computer science. A recent surge of interest in
this field brought advances that yield good performance in tasks ranging from
arcade games to continuous control; these methods however do not come without
significant issues, such as limited generalization capabilities and
difficulties when dealing with combinatorially hard planning instances. Our
contribution is two-fold: (i) we present a method that learns to represent its
environment as a latent graph and leverages state reidentification to reduce
the complexity of finding a good policy from exponential to linear; (ii) we
introduce a set of lightweight environments with an underlying discrete
combinatorial structure in which planning is challenging even for humans.
Moreover, we show that our method achieves strong empirical generalization to
variations in the environment, even across highly disadvantaged regimes, such
as "one-shot" planning, or in an offline RL paradigm which only provides
low-quality trajectories.
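As a minimal illustration of contribution (i), the sketch below builds a latent graph via state re-identification and plans over it with breadth-first search; the `encode` network and the rounding-based `reidentify` are hypothetical stand-ins for the learned components, not the authors' implementation.

```python
from collections import defaultdict, deque

def reidentify(latent_code):
    # Hypothetical stand-in for learned state re-identification: map a latent
    # vector to a canonical, hashable node id (here, a coarsely rounded tuple).
    return tuple(round(x, 2) for x in latent_code)

def build_latent_graph(transitions, encode):
    # transitions: iterable of (obs, action, next_obs) triples from experience.
    # encode: hypothetical learned encoder mapping raw pixels to a latent vector.
    graph = defaultdict(dict)  # node -> {action: successor node}
    for obs, action, next_obs in transitions:
        u, v = reidentify(encode(obs)), reidentify(encode(next_obs))
        graph[u][action] = v
    return graph

def plan(graph, start, goal):
    # Breadth-first search over the latent graph. Re-identification merges
    # revisited states into single nodes, so search cost is linear in the
    # number of distinct states rather than exponential in the horizon.
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            actions = []
            while parents[node] is not None:
                node, action = parents[node]
                actions.append(action)
            return actions[::-1]
        for action, succ in graph.get(node, {}).items():
            if succ not in parents:
                parents[succ] = (node, action)
                queue.append(succ)
    return None  # goal not reachable in the observed graph
```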
Related papers
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, at lower computational cost.
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
- The Complexity of Optimizing Atomic Congestion [14.845310803203724]
Atomic congestion games are a classic topic in network design, routing, and algorithmic game theory.
We show that the problem remains highly intractable even on extremely simple networks.
We conclude by extending our analysis towards the (even more challenging) min-max variant of the problem.
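As a minimal worked example of the objective this hardness result concerns, the sketch below computes the social cost of a routing profile in an atomic congestion game; the network and the affine latencies are illustrative placeholders.

```python
from collections import Counter

def social_cost(paths, latency):
    # paths: one edge-list per player; latency: edge -> (load -> cost).
    # In an atomic congestion game, each player on edge e pays latency_e(load),
    # where load is the number of players whose path uses e.
    load = Counter(e for path in paths for e in path)
    return sum(latency[e](load[e]) for path in paths for e in path)

# Two players on a toy network; edge 'ab' has affine latency 2x + 1.
latencies = {'ab': lambda x: 2 * x + 1, 'ac': lambda x: x, 'cb': lambda x: x}
print(social_cost([['ab'], ['ac', 'cb']], latencies))  # 3 + 1 + 1 = 5
```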
arXiv Detail & Related papers (2023-12-15T21:31:30Z)
- Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performance in various embodied AI tasks.
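A schematic of the in-painting idea, under the assumption that a learned `denoiser` refines a noisy state trajectory while the known endpoints stay clamped, analogous to image in-painting; all names and the schedule are hypothetical, not the paper's implementation.

```python
import numpy as np

def plan_by_inpainting(denoiser, start, goal, horizon, steps=50, rng=None):
    # 'Planning as in-painting': treat the unknown middle of a trajectory as
    # the masked region of an image. Start from noise, repeatedly denoise,
    # and clamp the two known endpoints after every step.
    rng = rng or np.random.default_rng(0)
    traj = rng.normal(size=(horizon, start.shape[0]))
    for t in reversed(range(steps)):
        traj = denoiser(traj, t)         # hypothetical learned denoising step
        traj[0], traj[-1] = start, goal  # in-paint: keep endpoints fixed
    return traj
```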
arXiv Detail & Related papers (2023-12-02T10:07:17Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Learning Coverage Paths in Unknown Environments with Deep Reinforcement Learning [17.69984142788365]
Coverage path planning (CPP) is the problem of finding a path that covers the entire free space of a confined area.
We investigate how suitable reinforcement learning is for this challenging problem.
We propose a computationally feasible egocentric map representation based on frontiers, and a novel reward term based on total variation.
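The total-variation term can be illustrated as follows: a standard anisotropic TV of the binary coverage map, used here as a shaping penalty; the exact weighting and map representation in the paper may differ.

```python
import numpy as np

def total_variation(coverage):
    # Anisotropic total variation of a 2D coverage map: summed absolute
    # differences between neighboring cells. Penalizing TV discourages
    # fragmented, hole-riddled coverage patterns.
    dx = np.abs(np.diff(coverage, axis=0)).sum()
    dy = np.abs(np.diff(coverage, axis=1)).sum()
    return dx + dy

def reward(prev_map, new_map, tv_weight=0.1):
    # Shaping sketch: reward newly covered area, penalize added raggedness.
    gained = new_map.sum() - prev_map.sum()
    return gained - tv_weight * (total_variation(new_map) - total_variation(prev_map))
```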
arXiv Detail & Related papers (2023-06-29T14:32:06Z)
- Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search [63.3745291252038]
We propose DiffSES, a novel symbolic learning approach that discovers discrete symbolic policies.
By using object-level abstractions instead of raw pixel-level inputs, DiffSES is able to leverage the simplicity and scalability advantages of symbolic expressions.
Our experiments demonstrate that DiffSES is able to generate symbolic policies that are simpler and more scalable than state-of-the-art symbolic RL methods.
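To show why object-level symbolic policies are simple and inspectable, here is a toy expression of the kind such a search might discover; the rule, feature names, and threshold are invented for illustration, not DiffSES output.

```python
def symbolic_policy(objects):
    # Toy symbolic policy over object-level abstractions: a compact,
    # human-readable rule rather than a pixel-level neural network.
    # objects: dict of named object features, e.g. x-positions.
    dx = objects["ball_x"] - objects["paddle_x"]
    return "LEFT" if dx < -1.0 else "RIGHT" if dx > 1.0 else "STAY"

print(symbolic_policy({"ball_x": 3.0, "paddle_x": 7.5}))  # LEFT
```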
arXiv Detail & Related papers (2022-12-30T17:50:54Z)
- Compositional Reinforcement Learning from Logical Specifications [21.193231846438895]
Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy.
We develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning.
Our approach incorporates reinforcement learning to learn neural network policies for each edge (sub-task), and uses a Dijkstra-style planning algorithm to compute a high-level plan in the graph.
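A minimal sketch of the Dijkstra-style high-level step, where each edge weight stands in for the expected cost of a learned sub-task policy; the graph and cost values are placeholders, not DiRL's learned quantities.

```python
import heapq

def high_level_plan(graph, start, goal):
    # Dijkstra over the abstract task graph. Each edge is a sub-task; its
    # weight stands in for the (learned) expected cost of that edge's policy.
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            path = [goal]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist[u]:
            continue
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return None

# Placeholder sub-task costs on an abstract graph.
g = {"s0": [("s1", 2.0), ("s2", 5.0)], "s1": [("goal", 3.0)], "s2": [("goal", 1.0)]}
print(high_level_plan(g, "s0", "goal"))  # ['s0', 's1', 'goal']
```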
arXiv Detail & Related papers (2021-06-25T22:54:28Z)
- Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing a structure on the state-action value function via a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
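The normalized advantage function structure referenced above is commonly written as Q(s, a) = V(s) - ½(a - μ(s))ᵀP(s)(a - μ(s)); a minimal numeric sketch with placeholder network outputs (the kernel representation from the paper is not modeled here).

```python
import numpy as np

def naf_q(a, v_s, mu_s, p_s):
    # Normalized advantage function: Q(s, a) = V(s) - 0.5 (a - mu)^T P (a - mu).
    # With P positive definite, a = mu(s) is the unique greedy action.
    d = a - mu_s
    return v_s - 0.5 * d @ p_s @ d

# Placeholder outputs of the (learned) V, mu, and P heads for one state.
v_s, mu_s = 1.2, np.array([0.3, -0.1])
p_s = np.array([[2.0, 0.0], [0.0, 1.0]])
print(naf_q(np.array([0.3, -0.1]), v_s, mu_s, p_s))  # maximal at a = mu: 1.2
```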
arXiv Detail & Related papers (2021-03-26T13:58:23Z)