Efficient and Robust Training of Dense Object Nets for Multi-Object
Robot Manipulation
- URL: http://arxiv.org/abs/2206.12145v1
- Date: Fri, 24 Jun 2022 08:24:42 GMT
- Title: Efficient and Robust Training of Dense Object Nets for Multi-Object
Robot Manipulation
- Authors: David B. Adrian, Andras Gabor Kupcsik, Markus Spies and Heiko Neumann
- Abstract summary: We propose a framework for robust and efficient training of Dense Object Nets (DON).
We focus on training with multi-object data instead of singulated objects, combined with a well-chosen augmentation scheme.
We demonstrate the robustness and accuracy of our proposed framework on a real-world robotic grasping task.
- Score: 8.321536457963655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a framework for robust and efficient training of Dense Object Nets
(DON) with a focus on multi-object robot manipulation scenarios. DON is a
popular approach to obtain dense, view-invariant object descriptors, which can
be used for a multitude of downstream tasks in robot manipulation, such as
pose estimation and state representation for control. However, the original
work focused on training with singulated objects, with limited results on
instance-specific, multi-object applications. Additionally, a complex data
collection pipeline, including 3D reconstruction and mask annotation of each
object, is required for training. In this paper, we further improve the
efficacy of DON with a simplified data collection and training regime that
consistently yields higher precision and enables robust tracking of keypoints
with lower data requirements. In particular, we focus on training with
multi-object data instead of singulated objects, combined with a well-chosen
augmentation scheme. We additionally propose an alternative loss formulation to
the original pixelwise formulation that offers better results and is less
sensitive to hyperparameters. Finally, we demonstrate the robustness and
accuracy of our proposed framework on a real-world robotic grasping task.
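For context, the pixelwise formulation the abstract refers to is the contrastive loss of the original DON work: matched pixel pairs across two views are pulled together in descriptor space, while non-matching pairs are pushed apart by a hinge margin. The following is a minimal NumPy sketch of that baseline loss (function name, array layout, and margin value are illustrative, not taken from the paper):

```python
import numpy as np

def pixelwise_contrastive_loss(desc_a, desc_b, matches, non_matches, margin=0.5):
    """Sketch of the pixelwise contrastive loss used by the original DON work.

    desc_a, desc_b: (H, W, D) dense descriptor maps for two views of a scene.
    matches: (N, 4) int rows (ya, xa, yb, xb) of corresponding pixels.
    non_matches: (M, 4) int rows of pixels known NOT to correspond.
    margin: hinge margin for non-matches (a sensitive hyperparameter).
    """
    # Pull matched descriptors together: mean squared distance.
    da = desc_a[matches[:, 0], matches[:, 1]]            # (N, D)
    db = desc_b[matches[:, 2], matches[:, 3]]            # (N, D)
    l_match = np.sum((da - db) ** 2, axis=1).mean()

    # Push non-matching descriptors apart: squared hinge on the distance.
    na = desc_a[non_matches[:, 0], non_matches[:, 1]]    # (M, D)
    nb = desc_b[non_matches[:, 2], non_matches[:, 3]]    # (M, D)
    dist = np.linalg.norm(na - nb, axis=1)
    l_non_match = (np.maximum(0.0, margin - dist) ** 2).mean()

    return l_match + l_non_match
```

The hinge margin is one of the hyperparameters this formulation is sensitive to, which motivates the alternative loss the paper proposes.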
Related papers
- Keypoint Abstraction using Large Models for Object-Relative Imitation Learning [78.92043196054071]
Generalization to novel object configurations and instances across diverse tasks and environments is a critical challenge in robotics.
Keypoint-based representations have proven effective as a succinct representation for capturing essential object features.
We propose KALM, a framework that leverages large pre-trained vision-language models to automatically generate task-relevant and cross-instance consistent keypoints.
arXiv Detail & Related papers (2024-10-30T17:37:31Z)
- SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation [62.58480650443393]
SAM-E builds on Segment Anything (SAM), a vision foundation model, for generalizable scene understanding and sequence imitation.
We develop a novel multi-channel heatmap that enables the prediction of the action sequence in a single pass.
arXiv Detail & Related papers (2024-05-30T00:32:51Z)
- Counting Objects in a Robotic Hand [6.057565013011719]
A robot performing multi-object grasping needs to sense the number of objects in the hand after grasping.
This paper presents a data-driven contrastive learning-based counting classifier with a modified loss function.
The proposed contrastive learning-based counting approach achieved above 96% accuracy for all three objects in the real setup.
arXiv Detail & Related papers (2024-04-09T21:46:14Z)
- Proposal-Contrastive Pretraining for Object Detection from Fewer Data [11.416621957617334]
We present Proposal Selection Contrast (ProSeCo), a novel unsupervised overall pretraining approach.
ProSeCo uses the large number of object proposals generated by the detector for contrastive learning.
We show that our method outperforms the state of the art in unsupervised pretraining for object detection on standard and novel benchmarks.
arXiv Detail & Related papers (2023-10-25T17:59:26Z)
- Primitive3D: 3D Object Dataset Synthesis from Randomly Assembled Primitives [44.03149443379618]
We propose a cost-effective method for automatically generating a large amount of 3D objects with annotations.
These objects are auto-annotated with part labels originating from primitives.
Considering the large overhead of learning on the generated dataset, we propose a dataset distillation strategy.
arXiv Detail & Related papers (2022-05-25T10:07:07Z)
- DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools [96.38972082580294]
DiffSkill is a novel framework that uses a differentiable physics simulator for skill abstraction to solve deformable object manipulation tasks.
In particular, we first obtain short-horizon skills using individual tools from a gradient-based simulator.
We then learn a neural skill abstractor from the demonstration trajectories which takes RGBD images as input.
arXiv Detail & Related papers (2022-03-31T17:59:38Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
- Supervised Training of Dense Object Nets using Optimal Descriptors for Industrial Robotic Applications [57.87136703404356]
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake introduced dense object descriptors as a novel visual object representation for the robotics community.
In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs.
We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
arXiv Detail & Related papers (2021-02-16T11:40:12Z)
- Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators [4.317864702902075]
We present the first RL-based system for a mobile manipulator that can (a) achieve targeted grasping generalizing to unseen target objects, (b) learn complex grasping strategies for cluttered scenes with occluded objects, and (c) perform active vision through its movable wrist camera to better locate objects.
We train and evaluate our system in a simulated environment, identify key components for improving performance, analyze its behaviors, and transfer to a real-world setup.
arXiv Detail & Related papers (2020-07-16T02:47:48Z)
- A Unified Object Motion and Affinity Model for Online Multi-Object Tracking [127.5229859255719]
We propose a novel MOT framework that unifies object motion and affinity modeling into a single network, named UMA.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
arXiv Detail & Related papers (2020-03-25T09:36:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.