Learning Precise 3D Manipulation from Multiple Uncalibrated Cameras
- URL: http://arxiv.org/abs/2002.09107v2
- Date: Wed, 31 Mar 2021 18:48:24 GMT
- Title: Learning Precise 3D Manipulation from Multiple Uncalibrated Cameras
- Authors: Iretiayo Akinola, Jacob Varley and Dmitry Kalashnikov
- Abstract summary: We present an effective multi-view approach to end-to-end learning of precise manipulation tasks that are 3D in nature.
Our method learns to accomplish these tasks using multiple statically placed but uncalibrated RGB camera views without building an explicit 3D representation such as a pointcloud or voxel grid.
- Score: 13.24490469380487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present an effective multi-view approach to closed-loop
end-to-end learning of precise manipulation tasks that are 3D in nature. Our
method learns to accomplish these tasks using multiple statically placed but
uncalibrated RGB camera views without building an explicit 3D representation
such as a pointcloud or voxel grid. This multi-camera approach achieves
superior task performance on difficult stacking and insertion tasks compared to
single-view baselines. Single-view robotic agents struggle with occlusion and
with estimating relative poses between points of interest. While full
3D scene representations (voxels or pointclouds) are obtainable from registered
output of multiple depth sensors, several challenges complicate operating on
such explicit 3D representations. These challenges include imperfect camera
calibration, poor depth maps due to object properties such as reflective
surfaces, and slower inference speeds over 3D representations compared to 2D
images. Our use of static but uncalibrated cameras does not require
camera-robot or camera-camera calibration, making the proposed approach easy to
set up, and our use of sensor dropout during training makes it resilient to the
loss of camera views after deployment.
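The abstract names the key ingredients, per-view image encoders fused without any explicit 3D reconstruction, plus sensor dropout that randomly removes whole camera views during training, but not the exact architecture. The sketch below is a minimal PyTorch illustration of that idea, not the authors' implementation; the encoder layout, embedding size, action dimension, and 30% view-drop probability are assumptions chosen for illustration.

```python
# Minimal sketch (not the paper's code): encode each uncalibrated RGB view with a
# shared CNN, apply sensor dropout by zeroing entire view embeddings during
# training, then fuse the surviving views to predict an action.
import torch
import torch.nn as nn


class MultiViewPolicy(nn.Module):
    def __init__(self, num_views=3, embed_dim=128, action_dim=7, view_drop_p=0.3):
        super().__init__()
        self.view_drop_p = view_drop_p  # assumed probability of dropping a view
        # One CNN encoder shared across all camera views; no calibration is used.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim), nn.ReLU(),
        )
        # Fuse the concatenated per-view embeddings into an action.
        self.head = nn.Sequential(
            nn.Linear(num_views * embed_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, views):
        # views: list of (B, 3, H, W) tensors, one per static camera.
        feats = [self.encoder(v) for v in views]
        if self.training:
            # Sensor dropout: randomly zero whole view embeddings so the policy
            # learns to act even if a camera is lost after deployment.
            feats = [
                f * (torch.rand(f.shape[0], 1, device=f.device) > self.view_drop_p).float()
                for f in feats
            ]
        return self.head(torch.cat(feats, dim=1))


if __name__ == "__main__":
    policy = MultiViewPolicy()
    imgs = [torch.randn(2, 3, 96, 96) for _ in range(3)]
    print(policy(imgs).shape)  # torch.Size([2, 7])
```

In this sketch a dropped view is simply zeroed, so at deployment a lost camera can be handled by feeding a zero image (or zero embedding) for that view; how the paper itself masks or rescales the remaining views is not specified in this listing.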
Related papers
- SpatialTracker: Tracking Any 2D Pixels in 3D Space [71.58016288648447]
We propose to estimate point trajectories in 3D space to mitigate the issues caused by image projection.
Our method, named SpatialTracker, lifts 2D pixels to 3D using monocular depth estimators.
Tracking in 3D allows us to leverage as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts.
arXiv Detail & Related papers (2024-04-05T17:59:25Z) - Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras [36.59439020480503]
We tackle the task of multi-view, multi-person 3D human pose estimation from a limited number of uncalibrated depth cameras.
We propose to leverage sparse, uncalibrated depth cameras providing RGBD video streams for 3D human pose estimation.
arXiv Detail & Related papers (2024-01-28T10:06:17Z) - CAPE: Camera View Position Embedding for Multi-View 3D Object Detection [100.02565745233247]
Current query-based methods rely on global 3D position embeddings to learn the geometric correspondence between images and 3D space.
We propose a novel method based on CAmera view Position Embedding, called CAPE.
CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on nuScenes dataset.
arXiv Detail & Related papers (2023-03-17T18:59:54Z) - Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data [80.14669385741202]
We propose a self-supervised pre-training method for 3D perception models tailored to autonomous driving data.
We leverage the availability of synchronized and calibrated image and Lidar sensors in autonomous driving setups.
Our method does not require any point cloud nor image annotations.
arXiv Detail & Related papers (2022-03-30T12:40:30Z) - Learning to Predict 3D Lane Shape and Camera Pose from a Single Image via Geometry Constraints [25.7441545608721]
We propose to predict 3D lanes by estimating camera pose from a single image with a two-stage framework.
The first stage aims at the camera pose task from perspective-view images.
The second stage targets the 3D lane task. It uses previously estimated pose to generate top-view images containing distance-invariant lane appearances.
arXiv Detail & Related papers (2021-12-31T08:59:27Z) - MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation [55.96577490779591]
We show that more data does not automatically guarantee better performance; rather, methods need a degree of 'camera independence' to benefit from large and heterogeneous training data.
arXiv Detail & Related papers (2021-10-01T14:56:37Z) - MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z) - CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild [31.334715988245748]
We propose a self-supervised approach that learns a single image 3D pose estimator from unlabeled multi-view data.
In contrast to most existing methods, we do not require calibrated cameras and can therefore learn from moving cameras.
Key to the success are new, unbiased reconstruction objectives that mix information across views and training samples.
arXiv Detail & Related papers (2020-11-30T10:42:27Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.