ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via
Online Exploration and Synthesis
- URL: http://arxiv.org/abs/2109.05488v1
- Date: Sun, 12 Sep 2021 11:15:42 GMT
- Title: ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via
Online Exploration and Synthesis
- Authors: Kailin Li, Lixin Yang, Xinyu Zhan, Jun Lv, Wenqiang Xu, Jiefeng Li,
Cewu Lu
- Abstract summary: ArtiBoost is a lightweight online data enrichment method that boosts articulated hand-object pose estimation from the data perspective.
We apply ArtiBoost on a simple learning baseline network and demonstrate the performance boost on several hand-object benchmarks.
- Score: 38.54763542838848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating the articulated 3D hand-object pose from a single RGB image is a
highly ambiguous and challenging problem requiring large-scale datasets that
contain diverse hand poses, object poses, and camera viewpoints. Most
real-world datasets lack this diversity. In contrast, synthetic datasets can
easily ensure vast diversity, but learning from them is inefficient and suffers
from heavy training consumption. To address the above issues, we propose
ArtiBoost, a lightweight online data enrichment method that boosts articulated
hand-object pose estimation from the data perspective. ArtiBoost is employed
along with a real-world source dataset. During training, ArtiBoost
alternately performs data exploration and synthesis. ArtiBoost can cover
various hand-object poses and camera viewpoints based on a Compositional
hand-object Configuration and Viewpoint space (CCV-space) and can adaptively
enrich the samples that are currently hard to discern via a mining strategy. We apply
ArtiBoost on a simple learning baseline network and demonstrate the performance
boost on several hand-object benchmarks. As an illustrative example, with
ArtiBoost, even a simple baseline network can outperform the previous
state-of-the-art Transformer-based method on the HO3D dataset. Our code is
available at https://github.com/MVIG-SJTU/ArtiBoost.
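The exploration-and-mining loop described in the abstract can be sketched as follows. This is an illustrative Python snippet under stated assumptions, not ArtiBoost's actual implementation: we assume the CCV-space is discretized into cells, each cell keeps a sampling weight, synthetic samples are drawn in proportion to those weights, and a cell's weight is raised when its samples produce high training loss. The class and method names (`CCVSpaceSampler`, `sample`, `update`) are hypothetical.

```python
import random

class CCVSpaceSampler:
    """Hedged sketch of loss-driven sampling over a discretized
    hand-object configuration and viewpoint (CCV) space."""

    def __init__(self, num_cells, init_weight=1.0):
        # One weight per CCV-space cell; all cells start equally likely.
        self.weights = [init_weight] * num_cells

    def sample(self, k):
        """Draw k cell indices, favoring currently hard cells."""
        total = sum(self.weights)
        probs = [w / total for w in self.weights]
        return random.choices(range(len(self.weights)), weights=probs, k=k)

    def update(self, cell, loss, ema=0.9):
        """Blend the observed training loss into the cell's weight
        via an exponential moving average, so hard cells are
        synthesized more often in later epochs (the mining strategy)."""
        self.weights[cell] = ema * self.weights[cell] + (1 - ema) * loss
```

In each training iteration, sampled cells would be rendered into synthetic hand-object images and mixed with the real-world source batch; the per-sample losses are then fed back through `update` to steer the next round of exploration.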
Related papers
- HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation [29.766317710266765]
We propose a new 3D Gaussian Splatting based data augmentation framework for bimanual hand-object interaction.
We use mesh-based 3DGS to model objects and hands, and address the rendering blur caused by multi-resolution input images.
We extend the single hand grasping pose optimization module for the bimanual hand object to generate various poses of bimanual hand-object interaction.
arXiv Detail & Related papers (2025-01-06T08:48:17Z)
- HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance.
First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds.
Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z)
- Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training [54.074020740827855]
We find that SSHPE can be boosted by two core components: advanced data augmentations and concise consistency training.
This simple and compact design is interpretable, and easily benefits from newly found augmentations.
We extensively validate the superiority and versatility of our approach on conventional human body images, overhead fisheye images, and human hand images.
arXiv Detail & Related papers (2024-02-18T12:27:59Z)
- Objaverse: A Universe of Annotated 3D Objects [53.2537614157313]
We present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive tags, captions and animations.
We demonstrate the large potential of Objaverse 3D models via four applications: training diverse 3D models, improving tail category segmentation on the LVIS benchmark, training open-vocabulary object-navigation models for embodied AI, and creating a new benchmark for robustness analysis of vision models.
arXiv Detail & Related papers (2022-12-15T18:56:53Z)
- Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms [31.2529724533643]
This work presents the first comprehensive benchmarking study from three under-explored perspectives beyond algorithms.
An analysis on 31 datasets reveals the distinct impacts of data samples.
We achieve a PA-MPJPE of 47.3 mm on the 3DPW test set with a relatively simple model.
arXiv Detail & Related papers (2022-09-21T17:39:53Z)
- Playing for 3D Human Recovery [88.91567909861442]
In this work, we obtain massive human motion sequences by playing a video game with automatically annotated 3D ground truths.
Specifically, we contribute GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine.
A simple frame-based baseline trained on GTA-Human outperforms more sophisticated methods by a large margin.
arXiv Detail & Related papers (2021-10-14T17:49:42Z)
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.