Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations
- URL: http://arxiv.org/abs/2412.14974v1
- Date: Thu, 19 Dec 2024 15:48:51 GMT
- Title: Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations
- Authors: Jianhua Sun, Yuxuan Li, Jiude Wei, Longfei Xu, Nange Wang, Yining Zhang, Cewu Lu
- Abstract summary: We propose the Articulated Object Procedural Generation toolbox, a.k.a. the Arti-PG toolbox.
Arti-PG supports the procedural generation of 26 categories of articulated objects and provides annotations across a wide range of both vision and manipulation tasks.
We will make the Arti-PG toolbox publicly available for the community to use.
- Abstract: The acquisition of substantial volumes of 3D articulated object data is expensive and time-consuming, and consequently the scarcity of 3D articulated object data becomes an obstacle for deep learning methods to achieve remarkable performance in various articulated object understanding tasks. Meanwhile, pairing these object data with detailed annotations to enable training for various tasks is also difficult and labor-intensive. In order to expeditiously gather a significant number of 3D articulated objects with comprehensive and detailed annotations for training, we propose the Articulated Object Procedural Generation toolbox, a.k.a. the Arti-PG toolbox. The Arti-PG toolbox consists of i) descriptions of articulated objects by means of a generalized structure program along with their analytic correspondence to the objects' point clouds, ii) procedural rules for manipulating the structure program to synthesize large-scale and diverse new articulated objects, and iii) mathematical descriptions of knowledge (e.g. affordance, semantics, etc.) to provide annotations for the synthesized objects. Arti-PG has two appealing properties for providing training data for articulated object understanding tasks: i) objects are created with unlimited variations in shape through program-oriented structure manipulation, and ii) Arti-PG is widely applicable to diverse tasks by easily providing comprehensive and detailed annotations. Arti-PG now supports the procedural generation of 26 categories of articulated objects and provides annotations across a wide range of both vision and manipulation tasks, and we provide exhaustive experiments which fully demonstrate its advantages. We will make the Arti-PG toolbox publicly available for the community to use.
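To make the abstract's three components concrete, the following is a minimal, hypothetical Python sketch of a structure program with one procedural manipulation rule and one annotation rule. It is not the authors' released code; the names (`StructureProgram`, `Part`, `scale_part`, `annotate_affordance`) and the cabinet example are illustrative assumptions.

```python
from dataclasses import dataclass, field
import random

@dataclass
class Part:
    """One named part of an articulated object, with a joint type."""
    name: str
    size: tuple           # (width, height, depth), e.g. in metres
    joint: str = "fixed"  # "fixed", "revolute", or "prismatic"

@dataclass
class StructureProgram:
    """A hypothetical program-style description of an articulated object."""
    category: str
    parts: list = field(default_factory=list)

    def scale_part(self, name: str, factor: float) -> None:
        # Procedural rule: rescale one part to synthesize a shape variant.
        for p in self.parts:
            if p.name == name:
                p.size = tuple(s * factor for s in p.size)

    def annotate_affordance(self) -> dict:
        # Toy knowledge rule: parts on movable joints afford manipulation.
        return {p.name: ("manipulable" if p.joint != "fixed" else "static")
                for p in self.parts}

# Build a toy cabinet, synthesize a random variant, and annotate it.
cabinet = StructureProgram("cabinet", [
    Part("body", (0.8, 1.0, 0.4)),
    Part("door", (0.4, 1.0, 0.02), joint="revolute"),
    Part("drawer", (0.4, 0.2, 0.35), joint="prismatic"),
])
cabinet.scale_part("drawer", random.uniform(0.8, 1.2))
print(cabinet.annotate_affordance())
# e.g. {'body': 'static', 'door': 'manipulable', 'drawer': 'manipulable'}
```

The appeal of a program-oriented representation, as the abstract describes it, is that a single rule such as `scale_part` can generate unlimited shape variants, and each variant inherits its annotations from knowledge rules like `annotate_affordance` rather than from manual labeling.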
Related papers
- Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
Articulate AnyMesh is an automated framework that can convert a rigid 3D mesh into its articulated counterpart in an open-vocabulary manner.
Our experiments show that Articulate Anymesh can generate large-scale, high-quality 3D articulated objects, including tools, toys, mechanical devices, and vehicles.
arXiv Detail & Related papers (2025-02-04T18:59:55Z)
- Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics
We propose to leverage superquadrics as an alternative 3D object representation to bounding boxes (see the inside-outside-function sketch after this list).
We demonstrate their effectiveness on both template-free object reconstruction and action recognition tasks.
We also study the compositionality of actions by considering a more challenging task where the training combinations of verbs and nouns do not overlap with the testing split.
arXiv Detail & Related papers (2025-01-13T07:26:05Z)
- Holistic Understanding of 3D Scenes as Universal Scene Description
3D scene understanding is a long-standing challenge in computer vision and a key component in enabling mixed reality, wearable computing, and embodied AI.
We introduce an expertly curated dataset in the Universal Scene Description (USD) format featuring high-quality manual annotations.
With its broad and high-quality annotations, the data provides the basis for holistic 3D scene understanding models.
arXiv Detail & Related papers (2024-12-02T11:33:55Z)
- GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
Open-vocabulary 3D object affordance grounding aims to anticipate "action possibilities" regions on 3D objects given arbitrary instructions.
We propose GREAT (GeometRy-intEntion collAboraTive inference) for Open-Vocabulary 3D Object Affordance Grounding.
arXiv Detail & Related papers (2024-11-29T11:23:15Z)
- TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
TACO is an extensive dataset spanning a large variety of tool-action-object compositions for daily human activities.
TACO contains 2.5K motion sequences paired with third-person and egocentric views, precise hand-object 3D meshes, and action labels.
We benchmark three generalizable hand-object-interaction tasks: compositional action recognition, generalizable hand-object motion forecasting, and cooperative grasp synthesis.
arXiv Detail & Related papers (2024-01-16T14:41:42Z)
- Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs
Generalizable articulated object manipulation is essential for home-assistant robots.
We propose a kinematic-aware prompting framework that prompts Large Language Models with kinematic knowledge of objects to generate low-level motion waypoints.
Our framework outperforms traditional methods on 8 seen articulated object categories and shows powerful zero-shot capability on 8 unseen categories.
arXiv Detail & Related papers (2023-11-06T03:26:41Z)
- HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions
We present the HANDAL dataset for category-level object pose estimation and affordance prediction.
The dataset consists of 308k annotated image frames from 2.2k videos of 212 real-world objects in 17 categories.
We outline the usefulness of our dataset for 6-DoF category-level pose+scale estimation and related tasks.
arXiv Detail & Related papers (2023-08-02T23:59:59Z)
- VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects
The space of 3D articulated objects is exceptionally rich in their myriad semantic categories, diverse shape geometry, and complicated part functionality.
Previous works mostly abstract kinematic structure with estimated joint parameters and part poses as the visual representations for manipulating 3D articulated objects.
We propose object-centric actionable visual priors as a novel perception-interaction handshaking point, in which the perception system outputs more actionable guidance than kinematic structure estimation.
arXiv Detail & Related papers (2021-06-28T07:47:31Z)
- The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose
IKEA ASM is a three million frame, multi-view, furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
We benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset.
The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
arXiv Detail & Related papers (2020-07-01T11:34:46Z)
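For readers unfamiliar with the superquadric representation mentioned in the collaborative-learning entry above, below is a minimal sketch of the standard inside-outside function for superquadrics (after Barr). The parameter names `a` and `e` are conventional; this illustrates the general representation, not that paper's exact formulation.

```python
def superquadric_inside_outside(x, y, z, a=(1.0, 1.0, 1.0), e=(1.0, 1.0)):
    """Standard superquadric inside-outside function.

    a = (a1, a2, a3): scale along each axis.
    e = (e1, e2): shape exponents; (1, 1) gives an ellipsoid,
    exponents near 0 approach a box, large exponents pinch the shape.
    Returns F with F < 1 inside the surface, F = 1 on it, F > 1 outside.
    """
    a1, a2, a3 = a
    e1, e2 = e
    f_xy = (abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return f_xy + abs(z / a3) ** (2.0 / e1)

# Sanity check: with unit scales and exponents (a sphere),
# the point (1, 0, 0) lies exactly on the surface.
assert abs(superquadric_inside_outside(1.0, 0.0, 0.0) - 1.0) < 1e-9
```

A handful of such parameters (three scales, two exponents, plus a pose) can describe a whole family of shapes, which is why superquadrics are attractive as a compact alternative to bounding boxes.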