ImageManip: Image-based Robotic Manipulation with Affordance-guided Next View Selection
- URL: http://arxiv.org/abs/2310.09069v1
- Date: Fri, 13 Oct 2023 12:42:54 GMT
- Title: ImageManip: Image-based Robotic Manipulation with Affordance-guided Next View Selection
- Authors: Xiaoqi Li, Yanzi Wang, Yan Shen, Ponomarenko Iaroslav, Haoran Lu,
Qianxu Wang, Boshi An, Jiaming Liu, Hao Dong
- Abstract summary: 3D articulated object manipulation is essential for enabling robots to interact with their environment.
Many existing studies make use of 3D point clouds as the primary input for manipulation policies.
RGB images offer high-resolution observations using cost-effective devices but lack spatial 3D geometric information.
This framework is designed to capture multiple perspectives of the target object and infer depth information to complement its geometry.
- Score: 10.162882793554191
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of future home-assistant robots, 3D articulated object
manipulation is essential for enabling robots to interact with their
environment. Many existing studies make use of 3D point clouds as the primary
input for manipulation policies. However, this approach encounters challenges
due to data sparsity and the significant cost associated with acquiring point
cloud data, which can limit its practicality. In contrast, RGB images offer
high-resolution observations using cost-effective devices but lack spatial 3D
geometric information. To overcome these limitations, we present a novel
image-based robotic manipulation framework. This framework is designed to
capture multiple perspectives of the target object and infer depth information
to complement its geometry. Initially, the system employs an eye-on-hand RGB
camera to capture an overall view of the target object. It predicts an initial
depth map and a coarse affordance map. The affordance map indicates actionable
areas on the object and serves as a constraint for selecting subsequent
viewpoints. Based on the global visual prior, we adaptively identify the
optimal next viewpoint for a detailed observation of the potential manipulation
success area. We leverage geometric consistency to fuse the views, resulting in
a refined depth map and a more precise affordance map for robot manipulation
decisions. By comparing with prior works that adopt point clouds or RGB images
as inputs, we demonstrate the effectiveness and practicality of our method. In
the project webpage (https://sites.google.com/view/imagemanip), real world
experiments further highlight the potential of our method for practical
deployment.
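The pipeline described above (global view, coarse depth and affordance prediction, affordance-constrained next-view selection, geometric fusion) can be summarized in a short sketch. The Python below is a minimal illustration only, not the authors' implementation: the networks, the viewpoint-scoring heuristic, and the fusion rule are placeholder assumptions standing in for the learned components described in the abstract.

```python
import numpy as np

# Hypothetical stand-ins for the learned components in the abstract: a
# depth-prediction network and an affordance-prediction network. They are
# random placeholders here so the sketch runs end to end.
def predict_depth(rgb):
    return np.random.rand(*rgb.shape[:2])      # H x W depth map

def predict_affordance(rgb, depth):
    return np.random.rand(*rgb.shape[:2])      # H x W actionable-area scores in [0, 1]

def render_view(viewpoint):
    """Placeholder for moving the eye-on-hand camera and grabbing an RGB frame."""
    return np.random.rand(128, 128, 3)

def expected_affordance_coverage(viewpoint, affordance, depth):
    """Toy score for a candidate viewpoint: how much high-affordance area it is
    expected to observe. A real system would reproject the coarse maps into the
    candidate view using camera geometry."""
    weight = np.exp(-np.linalg.norm(viewpoint))
    return weight * affordance[affordance > 0.5].sum()

# 1) Global view: coarse depth and affordance from a single RGB observation.
rgb0 = render_view(np.zeros(3))
depth0 = predict_depth(rgb0)
aff0 = predict_affordance(rgb0, depth0)

# 2) Affordance-guided next view: pick the candidate viewpoint that best covers
#    the actionable region predicted in the global view.
candidates = [np.random.uniform(-1.0, 1.0, size=3) for _ in range(16)]
best_view = max(candidates, key=lambda v: expected_affordance_coverage(v, aff0, depth0))

# 3) Detailed view from the selected viewpoint, then fuse the two views
#    (a simple average / max stands in for geometry-consistent fusion).
rgb1 = render_view(best_view)
depth1 = predict_depth(rgb1)
aff1 = predict_affordance(rgb1, depth1)
depth_fused = 0.5 * (depth0 + depth1)
aff_fused = np.maximum(aff0, aff1)

print("selected viewpoint:", best_view, "refined affordance peak:", aff_fused.max())
```

The key design point the sketch tries to convey is that the coarse affordance map acts as a constraint on view selection: candidate viewpoints are ranked by how well they observe the predicted actionable region, rather than by generic coverage of the object.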
Related papers
- FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction [17.367277970910813]
Humans effortlessly integrate common-sense knowledge with sensory input from vision and touch to understand their surroundings.
We introduce FusionSense, a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors.
arXiv Detail & Related papers (2024-10-10T18:07:07Z)
- 3D Foundation Models Enable Simultaneous Geometry and Pose Estimation of Grasped Objects [13.58353565350936]
We contribute methodology to jointly estimate the geometry and pose of objects grasped by a robot.
Our method transforms the estimated geometry into the robot's coordinate frame.
We empirically evaluate our approach on a robot manipulator holding a diverse set of real-world objects.
arXiv Detail & Related papers (2024-07-14T21:02:55Z)
- 3D Feature Distillation with Object-Centric Priors [9.626027459292926]
2D vision-language models such as CLIP have become widely popular due to their impressive open-vocabulary grounding capabilities in 2D images.
Recent works aim to elevate 2D CLIP features to 3D via feature distillation, but either learn neural fields that are scene-specific or focus on indoor room scan data.
We show that our method reconstructs 3D CLIP features with improved grounding capacity and spatial consistency.
arXiv Detail & Related papers (2024-06-26T20:16:49Z)
- Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing [28.874014617259935]
Multi-Camera 3D Object Detection (MC3D-Det) has gained prominence with the advent of bird's-eye view (BEV) approaches.
We propose a novel method that aligns 3D detection with 2D camera plane results, ensuring consistent and accurate detections.
arXiv Detail & Related papers (2023-10-17T15:31:28Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- Leveraging Single-View Images for Unsupervised 3D Point Cloud Completion [53.93172686610741]
Cross-PCC is an unsupervised point cloud completion method without requiring any 3D complete point clouds.
To take advantage of the complementary information from 2D images, we use a single-view RGB image to extract 2D features.
Our method even achieves comparable performance to some supervised methods.
arXiv Detail & Related papers (2022-12-01T15:11:21Z)
- SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations [85.38562724999898]
We propose a 2D Image and 3D Point cloud Unsupervised pre-training strategy, called SimIPU.
Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module and an inter-modal feature interaction module.
To the best of our knowledge, this is the first study to explore contrastive learning pre-training strategies for outdoor multi-modal datasets.
arXiv Detail & Related papers (2021-12-09T03:27:00Z)
- Supervised Training of Dense Object Nets using Optimal Descriptors for Industrial Robotic Applications [57.87136703404356]
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake introduced dense object descriptors as a novel visual object representation for the robotics community.
In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs.
We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
arXiv Detail & Related papers (2021-02-16T11:40:12Z)
- Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori.
We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
arXiv Detail & Related papers (2020-11-30T20:46:48Z)
- A Long Horizon Planning Framework for Manipulating Rigid Pointcloud Objects [25.428781562909606]
We present a framework for solving long-horizon planning problems involving manipulation of rigid objects.
Our method plans in the space of object subgoals and frees the planner from reasoning about robot-object interaction dynamics.
arXiv Detail & Related papers (2020-11-16T18:59:33Z)
- Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground.
Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights.
We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion.
arXiv Detail & Related papers (2020-07-18T22:31:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.