Rapid Pose Label Generation through Sparse Representation of Unknown
Objects
- URL: http://arxiv.org/abs/2011.03790v1
- Date: Sat, 7 Nov 2020 15:14:03 GMT
- Title: Rapid Pose Label Generation through Sparse Representation of Unknown
Objects
- Authors: Rohan Pratap Singh, Mehdi Benallegue, Yusuke Yoshiyasu, Fumio Kanehiro
- Abstract summary: This work presents an approach for rapidly generating real-world, pose-annotated RGB-D data for unknown objects.
We first source minimalistic labelings of an ordered set of arbitrarily chosen keypoints over a set of RGB-D videos.
By solving an optimization problem, we combine these labels under a world frame to recover a sparse, keypoint-based representation of the object.
- Score: 7.32172860877574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Convolutional Neural Networks (CNNs) have been successfully deployed on
robots for 6-DoF object pose estimation through visual perception. However,
obtaining labeled data on a scale required for the supervised training of CNNs
is a difficult task - exacerbated if the object is novel and a 3D model is
unavailable. To this end, this work presents an approach for rapidly generating
real-world, pose-annotated RGB-D data for unknown objects. Our method not only
circumvents the need for a prior 3D object model (textured or otherwise) but
also bypasses complicated setups of fiducial markers, turntables, and sensors.
With the help of a human user, we first source minimalistic labelings of an
ordered set of arbitrarily chosen keypoints over a set of RGB-D videos. Then,
by solving an optimization problem, we combine these labels under a world frame
to recover a sparse, keypoint-based representation of the object. The sparse
representation leads to the development of a dense model and the pose labels
for each image frame in the set of scenes. We show that the sparse model can
also be efficiently used for scaling to a large number of new scenes. We
demonstrate the practicality of the generated labeled dataset by training a
pipeline for 6-DoF object pose estimation and a pixel-wise segmentation
network.
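The abstract does not spell out the optimization it solves, so the following is only a minimal sketch of one building block that a keypoint-based labeling pipeline could plausibly rely on: given a sparse keypoint model in the object frame and the same keypoints observed in a camera frame, a least-squares rigid pose can be recovered with the standard Kabsch (SVD) alignment. This is a swapped-in textbook technique, not the authors' method, and the names `fit_rigid_pose`, `keypoints_obj`, and `keypoints_cam`, as well as the synthetic values, are hypothetical.

```python
import numpy as np


def fit_rigid_pose(model_pts, observed_pts):
    """Least-squares rigid transform (R, t) with observed ~= R @ model + t.

    Standard Kabsch alignment via SVD; both inputs are (N, 3) arrays of
    corresponding points, with N >= 3 non-collinear points.
    """
    model_pts = np.asarray(model_pts, dtype=float)
    observed_pts = np.asarray(observed_pts, dtype=float)

    # Centre both point sets on their centroids.
    model_centroid = model_pts.mean(axis=0)
    observed_centroid = observed_pts.mean(axis=0)

    # 3x3 cross-covariance between the centred sets.
    H = (model_pts - model_centroid).T @ (observed_pts - observed_centroid)
    U, _, Vt = np.linalg.svd(H)

    # Correct for a possible reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = observed_centroid - R @ model_centroid
    return R, t


# Hypothetical sparse model: four non-coplanar keypoints in the object frame.
keypoints_obj = np.array([
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [0.0, 0.2, 0.0],
    [0.0, 0.0, 0.05],
])

# Synthetic ground truth (assumed for illustration): 90 degrees about z, plus a shift.
R_true = np.array([
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.3, -0.1, 0.8])

# The same keypoints as they would appear in the camera frame.
keypoints_cam = keypoints_obj @ R_true.T + t_true

R_est, t_est = fit_rigid_pose(keypoints_obj, keypoints_cam)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

With noisy manual labels the recovered pose would only be a least-squares fit, which is one reason a joint optimization over many frames and scenes, as the abstract describes, may be preferable to per-frame alignment.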
Related papers
- Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding.
An adaptive-octree structure is developed that stores semantics and depicts the occupancy of an object adjustably according to its shape.
arXiv Detail & Related papers (2024-11-25T10:14:10Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z)
- Category-Agnostic 6D Pose Estimation with Conditional Neural Processes [19.387280883044482]
We present a novel meta-learning approach for 6D pose estimation on unknown objects.
Our algorithm learns object representation in a category-agnostic way, which endows it with strong generalization capabilities across object categories.
arXiv Detail & Related papers (2022-06-14T20:46:09Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
- ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation [76.31125154523056]
We present a discrete descriptor, which can represent the object surface densely.
We also propose a coarse to fine training strategy, which enables fine-grained correspondence prediction.
arXiv Detail & Related papers (2022-03-17T16:16:24Z)
- Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images [44.223070672713455]
In particular, man-made environments commonly consist of volumetric primitives such as cuboids or cylinders.
Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only able to reproduce simple objects.
We propose a robust estimator for primitive fitting, which can meaningfully abstract real-world environments using cuboids.
arXiv Detail & Related papers (2021-05-05T13:36:00Z)
- Self-supervised Learning of 3D Object Understanding by Data Association and Landmark Estimation for Image Sequence [15.815583594196488]
3D object understanding from 2D image is a challenging task that infers additional dimension from reduced-dimensional information.
It is challenging to obtain large amount of 3D dataset since achieving 3D annotation is expensive and time-consuming.
We propose a strategy to exploit multiple observations of the object in the image sequence in order to surpass the self-performance.
arXiv Detail & Related papers (2021-04-14T18:59:08Z)
- Supervised Training of Dense Object Nets using Optimal Descriptors for Industrial Robotic Applications [57.87136703404356]
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake introduced dense object descriptors as a novel visual object representation for the robotics community.
In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs.
We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
arXiv Detail & Related papers (2021-02-16T11:40:12Z)
- Self-Supervised Object-in-Gripper Segmentation from Robotic Motions [27.915309216800125]
We propose a robust solution for learning to segment unknown objects grasped by a robot.
We exploit motion and temporal cues in RGB video sequences.
Our approach is fully self-supervised and independent of precise camera calibration, 3D models or potentially imperfect depth data.
arXiv Detail & Related papers (2020-02-11T15:44:46Z)
- L6DNet: Light 6 DoF Network for Robust and Precise Object Pose Estimation with Small Datasets [0.0]
We propose a novel approach to perform 6 DoF object pose estimation from a single RGB-D image.
We adopt a hybrid pipeline in two stages: data-driven and geometric.
Our approach is more robust and accurate than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-03T17:41:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.