Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image
- URL: http://arxiv.org/abs/2006.05398v1
- Date: Tue, 9 Jun 2020 16:52:02 GMT
- Title: Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image
- Authors: Danny Driess, Jung-Su Ha, Marc Toussaint
- Abstract summary: We propose a deep convolutional recurrent neural network that predicts action sequences for task and motion planning (TAMP) from an initial scene image.
A key aspect is that our method generalizes to scenes with many objects and a varying number of them, despite being trained on only two objects at a time.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a deep convolutional recurrent neural network that
predicts action sequences for task and motion planning (TAMP) from an initial
scene image. Typical TAMP problems are formalized by combining reasoning on a
symbolic, discrete level (e.g. first-order logic) with continuous motion
planning such as nonlinear trajectory optimization. Due to the great
combinatorial complexity of possible discrete action sequences, a large number
of optimization/motion planning problems have to be solved to find a solution,
which limits the scalability of these approaches.
To circumvent this combinatorial complexity, we develop a neural network
which, based on an initial image of the scene, directly predicts promising
discrete action sequences such that ideally only one motion planning problem
has to be solved to find a solution to the overall TAMP problem. A key aspect
is that our method generalizes to scenes with many objects and a varying number
of them, despite being trained on only two objects at a time. This is possible
by encoding the objects of the scene in images as input to the neural network,
instead of a fixed feature vector. Results show runtime improvements of several
orders of magnitude. Video: https://youtu.be/i8yyEbbvoEk
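To make this concrete, below is a minimal PyTorch sketch of a conv-recurrent
predictor in the spirit of the abstract: a CNN embeds the initial scene image
into a hidden state, and a recurrent head scores discrete actions step by step.
All layer sizes, the action vocabulary size, and the greedy feedback scheme are
illustrative assumptions, not the authors' architecture.

    # Hypothetical sketch of a conv-recurrent action-sequence predictor.
    import torch
    import torch.nn as nn

    class ActionSequencePredictor(nn.Module):
        def __init__(self, n_actions: int, hidden: int = 128):
            super().__init__()
            # CNN encoder: consumes an image-based encoding of the scene, which
            # is what lets the model handle a varying number of objects.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, hidden),
            )
            # Recurrent head: one step per discrete action in the sequence.
            self.rnn = nn.GRUCell(n_actions, hidden)
            self.head = nn.Linear(hidden, n_actions)

        def forward(self, image: torch.Tensor, max_len: int = 4):
            h = self.encoder(image)           # initial hidden state from the scene
            prev = torch.zeros(image.size(0), self.head.out_features)
            steps = []
            for _ in range(max_len):
                h = self.rnn(prev, h)
                logits = self.head(h)
                steps.append(logits)
                # feed the greedy choice back in as a one-hot vector
                prev = nn.functional.one_hot(
                    logits.argmax(-1), logits.size(-1)).float()
            return torch.stack(steps, dim=1)  # (batch, max_len, n_actions)

    model = ActionSequencePredictor(n_actions=10)
    scores = model(torch.randn(1, 3, 64, 64))
    # Rank candidate sequences by these scores, then hand only the most
    # promising one(s) to the expensive motion-planning stage.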
Related papers
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.
By simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation, previously used only for static scenes, to dynamic scenes.
We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
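A toy sketch of the pointmap-per-timestep idea; PointmapNet is a hypothetical
stand-in, since the actual method fine-tunes DUSt3R rather than training a
model from scratch.

    # Toy illustration: one pointmap (per-pixel 3D point) per timestep, so the
    # geometry of moving objects is re-estimated each frame. Hypothetical model.
    import torch
    import torch.nn as nn

    class PointmapNet(nn.Module):
        def __init__(self):
            super().__init__()
            # maps an RGB frame to an (x, y, z) point per pixel
            self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

        def forward(self, frame):            # (B, 3, H, W) -> (B, 3, H, W)
            return self.net(frame)

    video = torch.randn(8, 3, 64, 64)        # 8 frames of a dynamic scene
    model = PointmapNet()
    pointmaps = torch.stack([model(f.unsqueeze(0)).squeeze(0) for f in video])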
arXiv Detail & Related papers (2024-10-04T18:00:07Z)
- Neural MP: A Generalist Neural Motion Planner [75.82675575009077]
We apply data-driven learning at scale to the problem of motion planning.
Our approach builds a large number of complex scenes in simulation, collects expert data from a motion planner, then distills it into a reactive generalist policy.
We perform a thorough evaluation of our method on 64 motion planning tasks across four diverse environments.
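The distillation step described above amounts to behavior cloning on expert
planner data. A minimal sketch under that reading; the observation/action
sizes and the synthetic "expert" data are placeholders.

    # Minimal behavior-cloning sketch of the distillation step.
    import torch
    import torch.nn as nn

    obs_dim, act_dim = 32, 7                 # e.g. scene features -> joint targets
    policy = nn.Sequential(
        nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    # Stand-in for trajectories collected from a classical planner in simulation.
    expert_obs = torch.randn(1024, obs_dim)
    expert_act = torch.randn(1024, act_dim)

    for step in range(100):
        loss = nn.functional.mse_loss(policy(expert_obs), expert_act)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # The result is a reactive policy: one forward pass replaces a planner call.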
arXiv Detail & Related papers (2024-09-09T17:59:45Z)
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-Guided Masked Video Modeling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
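Distributing features evenly across clusters is a balanced-assignment problem,
which Sinkhorn iterations solve. A small sketch of Sinkhorn normalization over
a feature-to-prototype similarity matrix; the temperature and iteration count
are illustrative guesses, not the paper's settings.

    # Sinkhorn-Knopp sketch: turn similarities into a near-balanced soft
    # assignment so that no cluster collapses.
    import torch
    import torch.nn.functional as F

    def sinkhorn(scores, eps=0.05, iters=3):
        # scores: (n_features, n_clusters) similarity logits
        q = torch.exp(scores / eps)
        for _ in range(iters):
            q = q / q.sum(dim=0, keepdim=True)   # each cluster gets equal mass
            q = q / q.sum(dim=1, keepdim=True)   # each feature sums to one
        return q

    # unit-norm inputs keep scores / eps in a numerically safe range for exp
    feats = F.normalize(torch.randn(512, 64), dim=1)   # space-time tube features
    protos = F.normalize(torch.randn(64, 16), dim=0)   # 16 learnable prototypes
    assignments = sinkhorn(feats @ protos)             # (512, 16) soft targets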
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
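Two ingredients named in the summary are an SDF surface representation and
HyperNetworks, i.e. one network that emits the weights of another. A toy
sketch of that pattern, with all sizes invented for illustration.

    # Toy hypernetwork: an image embedding generates the weights of a tiny
    # SDF MLP f(x) -> signed distance. Sizes are illustrative only.
    import torch
    import torch.nn as nn

    emb_dim, hidden = 64, 32
    n_params = (3 * hidden + hidden) + (hidden + 1)   # layer1 W,b + layer2 W,b
    hyper = nn.Linear(emb_dim, n_params)

    def sdf(points, params):
        # unpack the generated parameters and run the two-layer SDF MLP
        i = 0
        w1 = params[i:i + 3 * hidden].view(hidden, 3); i += 3 * hidden
        b1 = params[i:i + hidden];                     i += hidden
        w2 = params[i:i + hidden].view(1, hidden);     i += hidden
        b2 = params[i:i + 1]
        h = torch.relu(points @ w1.T + b1)
        return h @ w2.T + b2                           # distance per query point

    embedding = torch.randn(emb_dim)          # from some image encoder (assumed)
    params = hyper(embedding)                 # one shot: weights for this object
    dists = sdf(torch.randn(100, 3), params)  # query 100 3D points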
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- Learning to Search in Task and Motion Planning with Streams [20.003445874753233]
Task and motion planning problems in robotics combine symbolic planning over discrete task variables with motion optimization over continuous state and action variables.
We propose a geometrically informed symbolic planner that expands the set of objects and facts in a best-first manner.
We apply our algorithm to a 7-DoF robotic arm in block-stacking manipulation tasks.
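Best-first expansion is a priority-queue search ordered by a score, here a
geometric one. A skeletal sketch under that reading; the scoring function is a
stub standing in for whatever the planner actually learns.

    # Skeletal best-first expansion over symbolic facts; the geometric scorer
    # is a stub (the paper's point is that this score is learned).
    import heapq

    def geometric_score(fact):
        return len(fact)                 # stand-in for a learned relevance score

    def best_first_expand(initial_facts, successors, goal_test, budget=1000):
        frontier = [(geometric_score(f), i, f) for i, f in enumerate(initial_facts)]
        heapq.heapify(frontier)
        seen, tie = set(initial_facts), len(frontier)
        while frontier and budget > 0:
            _, _, fact = heapq.heappop(frontier)   # most promising fact first
            if goal_test(fact):
                return fact
            for new in successors(fact):           # e.g. facts sampled from streams
                if new not in seen:
                    seen.add(new)
                    heapq.heappush(frontier, (geometric_score(new), tie, new))
                    tie += 1
            budget -= 1
        return None

    # Toy usage: grow a fact string until it exceeds a target length.
    print(best_first_expand(["on(a,b)"], lambda f: [f + "'"], lambda f: len(f) > 8))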
arXiv Detail & Related papers (2021-11-25T15:58:31Z)
- Neural Scene Flow Prior [30.878829330230797]
Before the deep learning revolution, many perception algorithms were based on runtime optimization in conjunction with a strong prior/regularization penalty.
This paper revisits the scene flow problem with an approach that relies predominantly on runtime optimization and strong regularization.
A central innovation here is the inclusion of a neural scene flow prior, which uses the architecture of neural networks as a new type of implicit regularizer.
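One way to read the "implicit regularizer": instead of a hand-designed
smoothness penalty, fit a small coordinate MLP per scene at runtime and let
its inductive bias regularize the flow. A minimal sketch with a toy one-sided
Chamfer data term.

    # Per-scene runtime optimization: an MLP maps a 3D point to its flow
    # vector; the network's smoothness bias is the implicit regularizer.
    import torch
    import torch.nn as nn

    flow_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))
    opt = torch.optim.Adam(flow_mlp.parameters(), lr=1e-3)

    pc_t = torch.randn(256, 3)               # point cloud at time t
    pc_t1 = torch.randn(256, 3)              # point cloud at time t + 1

    for step in range(200):                  # the optimization is the inference
        warped = pc_t + flow_mlp(pc_t)
        # toy data term: one-sided Chamfer distance to the next frame
        loss = torch.cdist(warped, pc_t1).min(dim=1).values.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    flow = flow_mlp(pc_t)                    # scene flow for this scene only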
arXiv Detail & Related papers (2021-11-01T20:44:12Z)
- Neural Manipulation Planning on Constraint Manifolds [13.774614900994342]
We present Constrained Motion Planning Networks (CoMPNet), the first neural planner for multimodal kinematic constraints.
We show that CoMPNet solves practical motion planning tasks involving both unconstrained and constrained problems.
It generalizes with high success rates to object locations not seen during training in the given environments.
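A basic building block for planning with kinematic constraints is projecting a
sampled configuration onto the constraint manifold F(q) = 0. A sketch of a
simple gradient-based projection with a toy unit-norm constraint; CoMPNet
itself additionally learns where to sample, which this does not show.

    # Gradient-based projection onto a constraint manifold F(q) = 0. The
    # unit-norm constraint is a toy stand-in for, e.g., an end-effector
    # orientation constraint.
    import torch

    def constraint(q):
        return (q.norm() - 1.0) ** 2         # zero exactly on the unit sphere

    def project(q, steps=50, lr=0.1):
        q = q.clone().requires_grad_(True)
        for _ in range(steps):
            (grad,) = torch.autograd.grad(constraint(q), q)
            with torch.no_grad():
                q -= lr * grad               # descend the constraint violation
        return q.detach()

    sample = torch.randn(7)                  # random 7-DoF configuration
    on_manifold = project(sample)
    print(constraint(on_manifold).item())    # ~0: constraint satisfied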
arXiv Detail & Related papers (2020-08-09T18:58:10Z)
- Action sequencing using visual permutations [19.583283039057505]
This work considers the task of neural action sequencing conditioned on a single reference visual state.
This paper takes a permutation perspective and argues that action sequencing benefits from the ability to reason about both permutations and ordering concepts.
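One concrete way to act on the permutation perspective is to recover a hard
ordering of actions from a learned score matrix with the Hungarian algorithm.
A tiny sketch with a random stand-in score matrix; the paper's actual model
and training are not reproduced here.

    # Recover a hard permutation from an action-to-position score matrix.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    scores = np.random.randn(5, 5)                # affinity of action i to slot j
    rows, cols = linear_sum_assignment(-scores)   # maximize total score
    order = cols                                  # slot assigned to each action
    print(order)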
arXiv Detail & Related papers (2020-08-03T19:49:06Z)
- A Flexible Framework for Designing Trainable Priors with Adaptive Smoothing and Game Encoding [57.1077544780653]
We introduce a general framework for designing and training neural network layers whose forward passes can be interpreted as solving non-smooth convex optimization problems.
We focus on convex games, solved by local agents represented by the nodes of a graph and interacting through regularization functions.
This approach is appealing for solving imaging problems, as it allows the use of classical image priors within deep models that are trainable end to end.
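A forward pass that solves a non-smooth convex problem is commonly realized as
unrolled proximal-gradient steps. A compact sketch of an unrolled ISTA layer
with a trainable dictionary and sparsity weight; this is the single-agent
sparse-coding special case, not the paper's multi-agent game formulation.

    # Unrolled proximal gradient (ISTA) as a layer: the forward pass solves
    # min_z 0.5 * ||x - D z||^2 + lam * ||z||_1, with D and lam trainable.
    import torch
    import torch.nn as nn

    class ISTALayer(nn.Module):
        def __init__(self, dim, code, steps=10):
            super().__init__()
            self.D = nn.Parameter(torch.randn(dim, code) * 0.1)  # dictionary
            self.lam = nn.Parameter(torch.tensor(0.1))           # sparsity prior
            self.steps = steps

        def forward(self, x):
            z = torch.zeros(x.size(0), self.D.size(1))
            step = 1.0 / (self.D.norm() ** 2 + 1e-6)    # crude Lipschitz bound
            for _ in range(self.steps):                 # unrolled optimization
                grad = (z @ self.D.T - x) @ self.D      # gradient of data term
                z = z - step * grad
                z = torch.sign(z) * torch.clamp(z.abs() - step * self.lam, min=0.0)
            return z                                    # differentiable sparse code

    layer = ISTALayer(dim=16, code=32)
    codes = layer(torch.randn(4, 16))    # forward pass = solving a convex problem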
arXiv Detail & Related papers (2020-06-26T08:34:54Z)