ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via
Online Exploration and Synthesis
- URL: http://arxiv.org/abs/2109.05488v1
- Date: Sun, 12 Sep 2021 11:15:42 GMT
- Title: ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via
Online Exploration and Synthesis
- Authors: Kailin Li, Lixin Yang, Xinyu Zhan, Jun Lv, Wenqiang Xu, Jiefeng Li,
Cewu Lu
- Abstract summary: ArtiBoost is a lightweight online data enrichment method that boosts articulated hand-object pose estimation from the data perspective.
We apply ArtiBoost on a simple learning baseline network and demonstrate the performance boost on several hand-object benchmarks.
- Score: 38.54763542838848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating the articulated 3D hand-object pose from a single RGB image is a
highly ambiguous and challenging problem requiring large-scale datasets that
contain diverse hand poses, object poses, and camera viewpoints. Most
real-world datasets lack this diversity. In contrast, synthetic datasets can
easily ensure vast diversity, but learning from them is inefficient and suffers
from heavy training consumption. To address the above issues, we propose
ArtiBoost, a lightweight online data enrichment method that boosts articulated
hand-object pose estimation from the data perspective. ArtiBoost is employed
along with a real-world source dataset. During training, ArtiBoost
alternately performs data exploration and synthesis. ArtiBoost can cover
various hand-object poses and camera viewpoints based on a Compositional
hand-object Configuration and Viewpoint space (CCV-space) and can adaptively
enrich the samples that are currently hard to discern via a mining strategy. We apply
ArtiBoost on a simple learning baseline network and demonstrate the performance
boost on several hand-object benchmarks. As an illustrative example, with
ArtiBoost, even a simple baseline network can outperform the previous
state-of-the-art Transformer-based method on the HO3D dataset. Our code is
available at https://github.com/MVIG-SJTU/ArtiBoost.
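The exploration-and-mining loop described in the abstract can be sketched as follows. This is an illustrative Python snippet under stated assumptions, not ArtiBoost's actual implementation: we assume the CCV-space is discretized into cells, each cell keeps a sampling weight, synthetic samples are drawn in proportion to those weights, and a cell's weight is raised when its samples produce high training loss. The class and method names (`CCVSpaceSampler`, `sample`, `update`) are hypothetical.

```python
import random

class CCVSpaceSampler:
    """Hedged sketch of loss-driven sampling over a discretized
    hand-object configuration and viewpoint (CCV) space."""

    def __init__(self, num_cells, init_weight=1.0):
        # One weight per CCV-space cell; all cells start equally likely.
        self.weights = [init_weight] * num_cells

    def sample(self, k):
        """Draw k cell indices, favoring currently hard cells."""
        total = sum(self.weights)
        probs = [w / total for w in self.weights]
        return random.choices(range(len(self.weights)), weights=probs, k=k)

    def update(self, cell, loss, ema=0.9):
        """Blend the observed training loss into the cell's weight
        via an exponential moving average, so hard cells are
        synthesized more often in later epochs (the mining strategy)."""
        self.weights[cell] = ema * self.weights[cell] + (1 - ema) * loss
```

In each training iteration, sampled cells would be rendered into synthetic hand-object images and mixed with the real-world source batch; the per-sample losses are then fed back through `update` to steer the next round of exploration.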
Related papers
- HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation [29.766317710266765]
We propose a new 3D Gaussian Splatting based data augmentation framework for bimanual hand-object interaction.
We use mesh-based 3DGS to model objects and hands, and address the rendering blur caused by multi-resolution input images.
We extend the single hand grasping pose optimization module for the bimanual hand object to generate various poses of bimanual hand-object interaction.
arXiv Detail & Related papers (2025-01-06T08:48:17Z)
- HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance.
First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds.
Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z)
- Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training [54.074020740827855]
We find that SSHPE can be boosted by two core components: advanced data augmentations and concise consistency training.
This simple and compact design is interpretable, and easily benefits from newly found augmentations.
We extensively validate the superiority and versatility of our approach on conventional human body images, overhead fisheye images, and human hand images.
arXiv Detail & Related papers (2024-02-18T12:27:59Z)
- Objaverse: A Universe of Annotated 3D Objects [53.2537614157313]
We present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive tags, captions and animations.
We demonstrate the large potential of Objaverse 3D models via four applications: training diverse 3D models, improving tail category segmentation on the LVIS benchmark, training open-vocabulary object-navigation models for embodied AI, and creating a new benchmark for robustness analysis of vision models.
arXiv Detail & Related papers (2022-12-15T18:56:53Z)
- Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms [31.2529724533643]
This work presents the first comprehensive benchmarking study from three under-explored perspectives beyond algorithms.
An analysis on 31 datasets reveals the distinct impacts of data samples.
We achieve a PA-MPJPE of 47.3 mm on the 3DPW test set with a relatively simple model.
arXiv Detail & Related papers (2022-09-21T17:39:53Z)
- Playing for 3D Human Recovery [88.91567909861442]
In this work, we obtain massive human motion sequences by playing a video game with automatically annotated 3D ground truths.
Specifically, we contribute GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine.
A simple frame-based baseline trained on GTA-Human outperforms more sophisticated methods by a large margin.
arXiv Detail & Related papers (2021-10-14T17:49:42Z)
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.