Learning visual policies for building 3D shape categories
- URL: http://arxiv.org/abs/2004.07950v2
- Date: Wed, 30 Sep 2020 22:24:32 GMT
- Title: Learning visual policies for building 3D shape categories
- Authors: Alexander Pashevich, Igor Kalevatykh, Ivan Laptev, Cordelia Schmid
- Abstract summary: Previous work in this domain often assembles particular instances of objects from known sets of primitives.
We learn a visual policy to assemble other instances of the same category.
Our visual assembly policies are trained with no real images and reach up to 95% success rate when evaluated on a real robot.
- Score: 130.7718618259183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Manipulation and assembly tasks require non-trivial planning of actions
depending on the environment and the final goal. Previous work in this domain
often assembles particular instances of objects from known sets of primitives.
In contrast, we aim to handle varying sets of primitives and to construct
different objects of a shape category. Given a single object instance of a
category, e.g. an arch, and a binary shape classifier, we learn a visual policy
to assemble other instances of the same category. In particular, we propose a
disassembly procedure and learn a state policy that discovers new object
instances and their assembly plans in state space. We then render simulated
states in the observation space and learn a heatmap representation to predict
alternative actions from a given input image. To validate our approach, we
first demonstrate its efficiency for building object categories in state space.
We then show the success of our visual policies for building arches from
different primitives. Moreover, we demonstrate (i) the reactive ability of our
method to re-assemble objects using additional primitives and (ii) the robust
performance of our policy for unseen primitives resembling building blocks used
during training. Our visual assembly policies are trained with no real images
and reach up to 95% success rate when evaluated on a real robot.
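
Below is a minimal sketch (not the authors' implementation) of the heatmap-style visual policy described in the abstract: a fully convolutional network maps a rendered workspace image to a per-pixel score map, and the policy selects the highest-scoring pixel as the next pick-or-place position. The architecture and the names HeatmapPolicy and select_action are illustrative assumptions.

```python
# Minimal sketch of a heatmap-based visual policy (illustrative, not the paper's code).
import torch
import torch.nn as nn


class HeatmapPolicy(nn.Module):
    """Fully convolutional network: RGB image -> per-pixel action heatmap."""

    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        # 1x1 convolution produces a single-channel score map over pixels.
        self.head = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) -> heatmap logits: (B, H, W)
        return self.head(self.encoder(image)).squeeze(1)


def select_action(heatmap: torch.Tensor) -> tuple[int, int]:
    """Return the (row, col) pixel with the highest score as the next action."""
    flat_index = torch.argmax(heatmap.flatten()).item()
    _, width = heatmap.shape[-2:]
    return divmod(flat_index, width)


if __name__ == "__main__":
    policy = HeatmapPolicy()
    image = torch.rand(1, 3, 128, 128)  # stand-in for a rendered simulated state
    heatmap = policy(image)[0]
    print("next place position (pixel):", select_action(heatmap))
```

In the paper, such heatmaps are trained from rendered simulated states rather than real images, which is what allows the policy to transfer to a real robot without real training data.
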
Related papers
- ICGNet: A Unified Approach for Instance-Centric Grasping [42.92991092305974]
We introduce an end-to-end architecture for object-centric grasping.
We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets.
arXiv Detail & Related papers (2024-01-18T12:41:41Z)
- Learning Generalizable Manipulation Policies with Object-Centric 3D Representations [65.55352131167213]
GROOT is an imitation learning method for learning robust policies with object-centric and 3D priors.
It builds policies that generalize beyond their initial training conditions for vision-based manipulation.
GROOT's performance excels in generalization over background changes, camera viewpoint shifts, and the presence of new object instances.
arXiv Detail & Related papers (2023-10-22T18:51:45Z)
- PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations [12.552149411655355]
We build the first large-scale, part-based cross-category object manipulation benchmark, PartManip.
We train a state-based expert with our proposed part-based canonicalization and part-aware rewards, and then distill the knowledge to a vision-based student.
For cross-category generalization, we introduce domain adversarial learning for domain-invariant feature extraction.
arXiv Detail & Related papers (2023-03-29T18:29:30Z)
- Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z)
- Hyperbolic Contrastive Learning for Visual Representations beyond Objects [30.618032825306187]
We focus on learning representations for objects and scenes that preserve the structure among them.
Motivated by the observation that visually similar objects are close in the representation space, we argue that scenes and objects should instead follow a hierarchical structure.
arXiv Detail & Related papers (2022-12-01T16:58:57Z)
- Efficient Representations of Object Geometry for Reinforcement Learning of Interactive Grasping Policies [29.998917158604694]
We present a reinforcement learning framework that learns the interactive grasping of various geometrically distinct real-world objects.
Videos of learned interactive policies are available at https://maltemosbach.org/io/geometry_aware_grasping_policies.
arXiv Detail & Related papers (2022-11-20T11:47:33Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
- Few-shot Object Grounding and Mapping for Natural Language Robot Instruction Following [15.896892723068932]
We study the problem of learning a robot policy to follow natural language instructions that can be easily extended to reason about new objects.
We introduce a few-shot language-conditioned object grounding method trained from augmented reality data.
We present a learned map representation that encodes object locations and their instructed use, and construct it from our few-shot grounding output.
arXiv Detail & Related papers (2020-11-14T20:35:20Z)
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically modeling the object structure) by incorporating self-supervision.
We show that the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.