AKB-48: A Real-World Articulated Object Knowledge Base
- URL: http://arxiv.org/abs/2202.08432v1
- Date: Thu, 17 Feb 2022 03:24:07 GMT
- Title: AKB-48: A Real-World Articulated Object Knowledge Base
- Authors: Liu Liu, Wenqiang Xu, Haoyuan Fu, Sucheng Qian, Yang Han, Cewu Lu
- Abstract summary: We present AKB-48: a large-scale Articulated object Knowledge Base which consists of 2,037 real-world 3D articulated object models of 48 categories.
To build AKB-48, we present a fast articulation knowledge modeling (FArM) pipeline, which can populate the ArtiKG for an articulated object within 10-15 minutes.
Using our dataset, we propose AKBNet, a novel integral pipeline for the Category-level Visual Articulation Manipulation (C-VAM) task.
- Score: 38.4899076076656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human life is populated with articulated objects. A comprehensive
understanding of articulated objects, namely appearance, structure, physics
property, and semantics, will benefit many research communities. However, current
articulated object understanding solutions are usually built on synthetic
datasets of CAD models that lack physics properties, which prevents
satisfactory generalization from simulation to real-world applications in visual
and robotics tasks. To bridge this gap, we present AKB-48: a large-scale
Articulated object Knowledge Base which consists of 2,037 real-world 3D
articulated object models of 48 categories. Each object is described by a
knowledge graph, ArtiKG. To build AKB-48, we present a fast articulation
knowledge modeling (FArM) pipeline, which can populate the ArtiKG for an
articulated object within 10-15 minutes, largely reducing the cost of modeling
real-world objects. Using our dataset, we propose AKBNet, a novel
integral pipeline for the Category-level Visual Articulation Manipulation (C-VAM)
task, in which we benchmark three sub-tasks, namely pose estimation, object
reconstruction, and manipulation. The dataset, code, and models will be publicly
available at https://liuliu66.github.io/articulationobjects/.
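To make the ArtiKG idea concrete, below is a minimal sketch of how a per-object knowledge-graph record covering appearance, structure, physics, and semantics might be organized. All class and field names (`Part`, `Joint`, `ArtiKGEntry`, `mesh_path`, etc.) are illustrative assumptions, not the released AKB-48 schema.

```python
# A minimal sketch of a per-object, ArtiKG-style record. Names and fields are
# illustrative assumptions, not the released AKB-48 schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Joint:
    """An articulation joint connecting two rigid parts (structure)."""
    joint_type: str          # e.g. "revolute" or "prismatic"
    parent: str              # name of the parent part
    child: str               # name of the child part
    axis: List[float] = field(default_factory=lambda: [0.0, 0.0, 1.0])
    limit: List[float] = field(default_factory=lambda: [0.0, 1.57])  # motion range

@dataclass
class Part:
    """A rigid part with appearance and physics attributes."""
    name: str
    mesh_path: str           # textured 3D scan of the real part (appearance)
    mass: float              # physics property, e.g. measured in kg
    friction: float          # surface friction coefficient

@dataclass
class ArtiKGEntry:
    """One articulated object: appearance, structure, physics, semantics."""
    category: str            # one of the 48 categories, e.g. "box"
    parts: List[Part]
    joints: List[Joint]
    semantics: dict          # semantic labels, e.g. part affordances

# Example: a two-part box whose lid rotates about a revolute joint.
box = ArtiKGEntry(
    category="box",
    parts=[Part("body", "body.obj", mass=0.3, friction=0.5),
           Part("lid", "lid.obj", mass=0.1, friction=0.5)],
    joints=[Joint("revolute", parent="body", child="lid")],
    semantics={"lid": "openable"},
)
```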
Related papers
- Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions [8.059133373836913]
This paper presents an approach that enables a robot to rapidly learn the complete 3D model of a given object for manipulation in unfamiliar orientations.
We use an ensemble of partially constructed NeRF models to quantify model uncertainty to determine the next action.
Our approach determines when and how to grasp and re-orient an object given its partial NeRF model and re-estimates the object pose to rectify misalignments introduced during the interaction.
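As a rough illustration of the ensemble-disagreement idea described above, the sketch below scores candidate viewpoints by the variance across renders from partially trained NeRF models; `render(model, pose)` is a hypothetical stand-in for a NeRF forward pass, not the authors' API.

```python
import numpy as np

def view_uncertainty(ensemble, pose, render):
    # Render the same candidate viewpoint with every partially trained NeRF
    # in the ensemble and measure their disagreement.
    renders = np.stack([render(model, pose) for model in ensemble])  # (N, H, W, 3)
    return renders.var(axis=0).mean()  # mean per-pixel variance across models

def next_best_view(ensemble, candidate_poses, render):
    # Choose the viewpoint where the ensemble disagrees most: observing it
    # (by moving the camera or re-orienting the object) should be the most
    # informative next action.
    scores = [view_uncertainty(ensemble, pose, render) for pose in candidate_poses]
    return candidate_poses[int(np.argmax(scores))]
```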
arXiv Detail & Related papers (2024-04-02T10:15:06Z)
- AffordanceLLM: Grounding Affordance from Vision Language Models [36.97072698640563]
Affordance grounding refers to the task of finding the area of an object with which one can interact.
Much of the required knowledge is hidden, lying beyond the image content and the supervised labels of a limited training set.
We attempt to improve the generalization capability of current affordance grounding by taking advantage of rich world, abstract, and human-object-interaction knowledge.
arXiv Detail & Related papers (2024-01-12T03:21:02Z)
- GAMMA: Generalizable Articulation Modeling and Manipulation for Articulated Objects [53.965581080954905]
We propose a novel framework of Generalizable Articulation Modeling and Manipulation for Articulated Objects (GAMMA).
GAMMA learns both articulation modeling and grasp pose affordance from diverse articulated objects with different categories.
Results show that GAMMA significantly outperforms SOTA articulation modeling and manipulation algorithms on unseen and cross-category articulated objects.
arXiv Detail & Related papers (2023-09-28T08:57:14Z)
- HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions [17.9178233068395]
We present the HANDAL dataset for category-level object pose estimation and affordance prediction.
The dataset consists of 308k annotated image frames from 2.2k videos of 212 real-world objects in 17 categories.
We outline the usefulness of our dataset for 6-DoF category-level pose+scale estimation and related tasks.
arXiv Detail & Related papers (2023-08-02T23:59:59Z)
- Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI (full-body articulated human-object interaction) dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
We present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
We also introduce a novel approach for coarse pose estimation, which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
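A rough sketch of this render-and-compare coarse stage appears below; `renderer` and `classifier` are hypothetical placeholders standing in for the trained networks, not the released MegaPose interface.

```python
def coarse_pose_estimate(observed, candidate_poses, renderer, classifier):
    """Pick the candidate pose whose synthetic rendering the classifier judges
    most likely to be within the refiner's basin of correction.

    `renderer(pose)` returns a synthetic image of the object at that pose, and
    `classifier(observed, synthetic)` returns a "correctable" score; both are
    hypothetical placeholders, not the released MegaPose API.
    """
    best_pose, best_score = None, float("-inf")
    for pose in candidate_poses:
        synthetic = renderer(pose)               # render the pose hypothesis
        score = classifier(observed, synthetic)  # can the refiner fix this error?
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose  # then handed to the render&compare refiner
```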
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- ABO: Dataset and Benchmarks for Real-World 3D Object Understanding [43.42504014918771]
Amazon-Berkeley Objects (ABO) is a large-scale dataset of product images and 3D models corresponding to real household objects.
We use ABO to measure the domain gap for single-view 3D reconstruction networks trained on synthetic objects.
We also use multi-view images from ABO to measure the robustness of state-of-the-art metric learning approaches to different camera viewpoints.
arXiv Detail & Related papers (2021-10-12T17:52:42Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
- RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces [77.07767833443256]
We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects.
In contrast to state-of-the-art methods in object-centric generative modeling, RELATE also extends naturally to dynamic scenes and generates videos of high visual fidelity.
arXiv Detail & Related papers (2020-07-02T17:27:27Z)