Efficient and Robust Training of Dense Object Nets for Multi-Object Robot Manipulation
- URL: http://arxiv.org/abs/2206.12145v1
- Date: Fri, 24 Jun 2022 08:24:42 GMT
- Title: Efficient and Robust Training of Dense Object Nets for Multi-Object Robot Manipulation
- Authors: David B. Adrian, Andras Gabor Kupcsik, Markus Spies and Heiko Neumann
- Abstract summary: We propose a framework for robust and efficient training of Dense Object Nets (DON).
We focus on training with multi-object data instead of singulated objects, combined with a well-chosen augmentation scheme.
We demonstrate the robustness and accuracy of our proposed framework on a real-world robotic grasping task.
- Score: 8.321536457963655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a framework for robust and efficient training of Dense Object Nets (DON) with a focus on multi-object robot manipulation scenarios. DON is a popular approach to obtain dense, view-invariant object descriptors, which can be used for a multitude of downstream tasks in robot manipulation, such as pose estimation and state representation for control. However, the original work focused training on singulated objects, with limited results on instance-specific, multi-object applications. Additionally, a complex data collection pipeline, including 3D reconstruction and mask annotation of each object, is required for training. In this paper, we further improve the efficacy of DON with a simplified data collection and training regime that consistently yields higher precision and enables robust tracking of keypoints with lower data requirements. In particular, we focus on training with multi-object data instead of singulated objects, combined with a well-chosen augmentation scheme. We additionally propose an alternative loss formulation to the original pixelwise formulation that offers better results and is less sensitive to hyperparameters. Finally, we demonstrate the robustness and accuracy of our proposed framework on a real-world robotic grasping task.
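For context, the original DON work (Florence et al., 2018) trains descriptors with a pixelwise contrastive loss, which this paper proposes to replace with a less hyperparameter-sensitive alternative. Below is a minimal PyTorch sketch of that original formulation, plus the nearest-neighbor descriptor lookup that underlies keypoint tracking; function names, tensor layouts, and the margin value are illustrative assumptions, not code from the paper.

```python
import torch

def pixelwise_contrastive_loss(desc_a, desc_b, matches_a, matches_b,
                               non_matches_a, non_matches_b, margin=0.5):
    # desc_a, desc_b: (H*W, D) per-pixel descriptors of two views, flattened
    # row-major. matches_* / non_matches_*: 1-D LongTensors of flat pixel
    # indices forming correspondence / non-correspondence pairs.
    # Pull matched pixel descriptors together (squared L2 distance).
    d_match = (desc_a[matches_a] - desc_b[matches_b]).norm(dim=1)
    match_loss = (d_match ** 2).mean()
    # Push non-matching pixels apart until they exceed the margin (hinge).
    d_non = (desc_a[non_matches_a] - desc_b[non_matches_b]).norm(dim=1)
    non_match_loss = (torch.clamp(margin - d_non, min=0.0) ** 2).mean()
    return match_loss + non_match_loss

def track_keypoint(ref_descriptor, desc_image):
    # ref_descriptor: (D,) descriptor of a keypoint picked in a reference view.
    # desc_image: (H, W, D) descriptor image of the current frame.
    h, w, d = desc_image.shape
    dists = (desc_image.reshape(-1, d) - ref_descriptor).norm(dim=1)
    idx = torch.argmin(dists).item()
    return divmod(idx, w)  # (row, col) of the best-matching pixel
```

At test time, a keypoint selected in a reference image can be re-localized in new views by this nearest-neighbor lookup, which is what makes robust, low-data descriptor training directly useful for grasping.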
Related papers
- Attribute-Based Robotic Grasping with Data-Efficient Adaptation [19.683833436076313]
We present an end-to-end encoder-decoder network to learn attribute-based robotic grasping.
Our approach achieves over 81% instance grasping success rate on unknown objects.
arXiv Detail & Related papers (2025-01-04T00:37:17Z)
- EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation [25.12999060040265]
Learning to manipulate objects from high-dimensional observations presents significant challenges.
Recent approaches have utilized large-scale offline data to train models from pixel observations.
We propose a novel behavioral cloning (BC) approach that leverages object-centric representations and an entity-centric Transformer.
arXiv Detail & Related papers (2024-12-25T13:50:15Z)
- Semi-Supervised Neural Processes for Articulated Object Interactions [10.847409934374205]
This paper introduces the Semi-Supervised Neural Process (SSNP), an adaptive reward-prediction model designed for scenarios in which only a small subset of objects have labeled interaction data.
Jointly training with both types of data allows the model to focus more effectively on generalizable features.
The efficacy of SSNP is demonstrated on a door-opening task, where it outperforms other semi-supervised methods while using only a fraction of the data required by other adaptive models.
arXiv Detail & Related papers (2024-11-28T21:20:06Z)
- Keypoint Abstraction using Large Models for Object-Relative Imitation Learning [78.92043196054071]
Generalization to novel object configurations and instances across diverse tasks and environments is a critical challenge in robotics.
Keypoint-based representations have proven effective as a succinct way of capturing essential object features.
We propose KALM, a framework that leverages large pre-trained vision-language models to automatically generate task-relevant and cross-instance consistent keypoints.
arXiv Detail & Related papers (2024-10-30T17:37:31Z)
- SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation [62.58480650443393]
SAM-E leverages Segment Anything (SAM), a vision foundation model for generalizable scene understanding, combined with sequence imitation for embodied manipulation.
We develop a novel multi-channel heatmap that enables the prediction of the action sequence in a single pass.
arXiv Detail & Related papers (2024-05-30T00:32:51Z)
- Proposal-Contrastive Pretraining for Object Detection from Fewer Data [11.416621957617334]
We present Proposal Selection Contrast (ProSeCo), a novel unsupervised overall pretraining approach.
ProSeCo uses the large number of object proposals generated by the detector for contrastive learning.
We show that our method outperforms the state of the art in unsupervised pretraining for object detection on standard and novel benchmarks.
arXiv Detail & Related papers (2023-10-25T17:59:26Z)
- DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools [96.38972082580294]
DiffSkill is a novel framework that uses a differentiable physics simulator for skill abstraction to solve deformable object manipulation tasks.
In particular, we first obtain short-horizon skills using individual tools from a gradient-based simulator.
We then learn a neural skill abstractor from the demonstration trajectories which takes RGBD images as input.
arXiv Detail & Related papers (2022-03-31T17:59:38Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
- Supervised Training of Dense Object Nets using Optimal Descriptors for Industrial Robotic Applications [57.87136703404356]
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake introduced dense object descriptors as a novel visual object representation for the robotics community.
In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs.
We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
arXiv Detail & Related papers (2021-02-16T11:40:12Z)
- Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators [4.317864702902075]
We present the first RL-based system for a mobile manipulator that can (a) achieve targeted grasping generalizing to unseen target objects, (b) learn complex grasping strategies for cluttered scenes with occluded objects, and (c) perform active vision through its movable wrist camera to better locate objects.
We train and evaluate our system in a simulated environment, identify key components for improving performance, analyze its behaviors, and transfer to a real-world setup.
arXiv Detail & Related papers (2020-07-16T02:47:48Z)
- A Unified Object Motion and Affinity Model for Online Multi-Object Tracking [127.5229859255719]
We propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
arXiv Detail & Related papers (2020-03-25T09:36:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.