ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via
Online Exploration and Synthesis
- URL: http://arxiv.org/abs/2109.05488v1
- Date: Sun, 12 Sep 2021 11:15:42 GMT
- Title: ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis
- Authors: Kailin Li, Lixin Yang, Xinyu Zhan, Jun Lv, Wenqiang Xu, Jiefeng Li, Cewu Lu
- Abstract summary: ArtiBoost is a lightweight online data enrichment method that boosts articulated hand-object pose estimation from the data perspective.
We apply ArtiBoost on a simple learning baseline network and demonstrate the performance boost on several hand-object benchmarks.
- Score: 38.54763542838848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating the articulated 3D hand-object pose from a single RGB image is a
highly ambiguous and challenging problem requiring large-scale datasets that
contain diverse hand poses, object poses, and camera viewpoints. Most
real-world datasets lack this diversity. In contrast, synthetic datasets can
easily ensure vast diversity, but learning from them is inefficient and suffers
from heavy training consumption. To address the above issues, we propose
ArtiBoost, a lightweight online data enrichment method that boosts articulated
hand-object pose estimation from the data perspective. ArtiBoost is employed
along with a real-world source dataset. During training, ArtiBoost
alternately performs data exploration and synthesis. ArtiBoost can cover
various hand-object poses and camera viewpoints based on a Compositional
hand-object Configuration and Viewpoint space (CCV-space) and can adaptively
enrich the currently hard-to-discern samples with a mining strategy. We apply
ArtiBoost on a simple learning baseline network and demonstrate the performance
boost on several hand-object benchmarks. As an illustrative example, with
ArtiBoost, even a simple baseline network can outperform the previous
state-of-the-art Transformer-based method on the HO3D dataset. Our code is
available at https://github.com/MVIG-SJTU/ArtiBoost.
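
The abstract outlines an alternating explore-and-synthesize loop: sample a hand-object configuration and viewpoint from the discretized CCV-space, render it into a training sample, and reweight the sampling toward configurations the network still gets wrong. The Python sketch below illustrates one plausible reading of that feedback structure; it is not the authors' implementation, and every name in it (ccv_weights, render_sample, sample_error, the 0.9/0.1 update) is a hypothetical stand-in.

```python
import random

# Toy discretization of the CCV-space: each cell pairs a hand-pose id, an
# object-pose id, and a viewpoint id with a sampling weight (higher = mined
# as harder for the current model). Purely illustrative, not the paper's code.
ccv_weights = {(h, o, v): 1.0
               for h in range(4) for o in range(4) for v in range(4)}

def sample_ccv():
    """Explore CCV-space, favoring cells the model currently finds hard."""
    cells = list(ccv_weights)
    return random.choices(cells, weights=[ccv_weights[c] for c in cells])[0]

def render_sample(cell):
    """Stand-in for synthesizing a training image from a CCV cell."""
    return {"cell": cell}

def sample_error(sample):
    """Stand-in for the pose network's per-sample error on a synthetic item."""
    return random.random()

def training_step(real_batch, synth_ratio=0.5):
    """One iteration: mix real data with a freshly synthesized sample, then
    feed the observed error back into the mining weights."""
    batch = list(real_batch)
    if random.random() < synth_ratio:
        cell = sample_ccv()
        synth = render_sample(cell)
        batch.append(synth)
        # Adaptive mining: cells with larger error get sampled more often.
        ccv_weights[cell] = 0.9 * ccv_weights[cell] + 0.1 * sample_error(synth)
    # ... the network's forward/backward pass on `batch` would go here ...
    return batch

if __name__ == "__main__":
    for _ in range(100):
        training_step([{"real": True}])
```

In the paper's actual pipeline, render_sample would correspond to rendering grasp-valid hand-object configurations and sample_error to the pose network's loss; the sketch only shows how exploration, synthesis, and mining could interleave within ordinary training.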
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments demonstrates our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance.
First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds.
Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z)
- Objaverse: A Universe of Annotated 3D Objects [53.2537614157313]
We present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive tags, captions and animations.
We demonstrate the large potential of Objaverse 3D models via four applications: training generative 3D models, improving tail category segmentation on the LVIS benchmark, training open-vocabulary object-navigation models for Embodied AI, and creating a new benchmark for robustness analysis of vision models.
arXiv Detail & Related papers (2022-12-15T18:56:53Z)
- PoseScript: Linking 3D Human Poses and Natural Language [38.85620213438554]
We introduce the PoseScript dataset, which pairs more than six thousand 3D human poses with rich human-annotated descriptions.
To increase the size of the dataset to a scale that is compatible with data-hungry learning algorithms, we propose an elaborate captioning process.
This process extracts low-level pose information, known as "posecodes", using a set of simple but generic rules on the 3D keypoints.
With automatic annotations, the amount of available data significantly scales up (100k), making it possible to effectively pretrain deep models for finetuning on human captions.
arXiv Detail & Related papers (2022-10-21T08:18:49Z)
- Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms [31.2529724533643]
This work presents the first comprehensive benchmarking study from three under-explored perspectives beyond algorithms.
An analysis of 31 datasets reveals the distinct impacts of data samples.
We achieve a PA-MPJPE of 47.3 mm on the 3DPW test set with a relatively simple model.
arXiv Detail & Related papers (2022-09-21T17:39:53Z)
- Playing for 3D Human Recovery [88.91567909861442]
In this work, we obtain massive human sequences by playing the video game with automatically annotated 3D ground truths.
Specifically, we contribute GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine.
A simple frame-based baseline trained on GTA-Human outperforms more sophisticated methods by a large margin.
arXiv Detail & Related papers (2021-10-14T17:49:42Z)
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)