3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding
- URL: http://arxiv.org/abs/2103.16397v2
- Date: Wed, 31 Mar 2021 09:59:28 GMT
- Title: 3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding
- Authors: Shengheng Deng, Xun Xu, Chaozheng Wu, Ke Chen, Kui Jia
- Abstract summary: We present a 3D AffordanceNet dataset, a benchmark of 23k shapes from 23 semantic object categories, annotated with 18 visual affordance categories.
Three state-of-the-art point cloud deep learning networks are evaluated on all tasks.
- Score: 33.68455617113953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to understand how to interact with objects from visual cues,
a.k.a. visual affordance, is essential to vision-guided robotic research. This
involves categorizing, segmenting, and reasoning about visual affordances. Relevant
studies have previously been made in the 2D and 2.5D image domains; however, a
truly functional understanding of object affordance requires learning and
prediction in the 3D physical domain, which is still absent in the community.
In this work, we present a 3D AffordanceNet dataset, a benchmark of 23k shapes
from 23 semantic object categories, annotated with 18 visual affordance
categories. Based on this dataset, we provide three benchmarking tasks for
evaluating visual affordance understanding, including full-shape, partial-view
and rotation-invariant affordance estimations. Three state-of-the-art point
cloud deep learning networks are evaluated on all tasks. In addition, we
investigate a semi-supervised learning setup to explore the possibility of
benefiting from unlabeled data. Comprehensive results on our contributed dataset
show the promise of visual affordance understanding as a valuable yet
challenging benchmark.
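For concreteness, the sketch below illustrates the kind of per-point data layout and evaluation such a benchmark implies: each shape is a point cloud whose points carry scores for the 18 affordance categories, and a prediction can be scored with a per-affordance IoU. The function names, array shapes, random stand-in data, and the IoU-style metric are all assumptions for illustration, not the dataset's actual format or API.

```python
import numpy as np

# Minimal, self-contained sketch (not the dataset's real loader or metric):
# each annotated shape is modeled as an (N, 3) point cloud plus an
# (N, 18) array of per-point affordance scores, one column per affordance
# category. All names and shapes here are illustrative assumptions.
NUM_POINTS = 2048
NUM_AFFORDANCES = 18


def load_shape(rng: np.random.Generator):
    """Stand-in for loading one annotated shape (random data for illustration)."""
    points = rng.normal(size=(NUM_POINTS, 3)).astype(np.float32)  # xyz coordinates
    labels = rng.random(size=(NUM_POINTS, NUM_AFFORDANCES))       # per-point scores in [0, 1]
    return points, labels


def per_affordance_iou(pred, gt, threshold=0.5):
    """Per-point IoU for each affordance category after thresholding the scores."""
    p, g = pred >= threshold, gt >= threshold
    intersection = np.logical_and(p, g).sum(axis=0)
    union = np.logical_or(p, g).sum(axis=0)
    # Categories absent from both prediction and ground truth count as perfect.
    return np.where(union > 0, intersection / np.maximum(union, 1), 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points, gt = load_shape(rng)
    pred = rng.random(size=gt.shape)            # placeholder network output
    print(per_affordance_iou(pred, gt).mean())  # mean IoU over the 18 affordances
```

In a real pipeline, the random arrays would be replaced by the dataset's annotated point clouds and by the per-point predictions of a point cloud network.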
Related papers
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- Grounding 3D Object Affordance from 2D Interactions in Images [128.6316708679246]
Grounding 3D object affordance seeks to locate the "action possibilities" regions of objects in 3D space.
Humans possess the ability to perceive object affordances in the physical world through demonstration images or videos.
We devise an Interaction-driven 3D Affordance Grounding Network (IAG), which aligns the region features of objects from different sources.
arXiv Detail & Related papers (2023-03-18T15:37:35Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- A large scale multi-view RGBD visual affordance learning dataset [4.3773754388936625]
We introduce a large-scale multi-view RGBD visual affordance learning dataset.
This is the first and largest multi-view RGBD visual affordance learning dataset.
Several state-of-the-art deep learning networks are each evaluated on affordance recognition and segmentation tasks.
arXiv Detail & Related papers (2022-03-26T14:31:35Z)
- PartAfford: Part-level Affordance Discovery from 3D Objects [113.91774531972855]
We present a new task of part-level affordance discovery (PartAfford).
Given only the affordance labels per object, the machine is tasked to (i) decompose 3D shapes into parts and (ii) discover how each part corresponds to a certain affordance category.
We propose a novel learning framework for PartAfford, which discovers part-level representations by leveraging only the affordance set supervision and geometric primitive regularization.
arXiv Detail & Related papers (2022-02-28T02:58:36Z)
- VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects [19.296344218177534]
The space of 3D articulated objects is exceptionally rich, with myriad semantic categories, diverse shape geometry, and complicated part functionality.
Previous works mostly abstract the kinematic structure with estimated joint parameters and part poses as the visual representation for manipulating 3D articulated objects.
We propose object-centric actionable visual priors as a novel perception-interaction handshaking point, where the perception system outputs more actionable guidance than kinematic structure estimation.
arXiv Detail & Related papers (2021-06-28T07:47:31Z)
- Learning to Reconstruct and Segment 3D Objects [4.709764624933227]
We aim to understand scenes and the objects within them by learning general and robust representations using deep neural networks.
This thesis makes three core contributions, ranging from object-level 3D shape estimation from single or multiple views to scene-level semantic understanding.
arXiv Detail & Related papers (2020-10-19T15:09:04Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)