Active Learning with Pseudo-Labels for Multi-View 3D Pose Estimation
- URL: http://arxiv.org/abs/2112.13709v1
- Date: Mon, 27 Dec 2021 14:34:25 GMT
- Title: Active Learning with Pseudo-Labels for Multi-View 3D Pose Estimation
- Authors: Qi Feng, Kun He, He Wen, Cem Keskin, Yuting Ye
- Abstract summary: We improve Active Learning for the problem of 3D pose estimation in a multi-view setting.
We develop a framework that allows us to efficiently extend existing single-view AL strategies.
We demonstrate additional performance gains by incorporating predicted pseudo-labels, which is a form of self-training.
- Score: 18.768030475943213
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pose estimation of the human body/hand is a fundamental problem in computer
vision, and learning-based solutions require a large amount of annotated data.
Given limited annotation budgets, a common approach to increasing label
efficiency is Active Learning (AL), which selects examples with the highest
value to annotate, but choosing the selection strategy is often nontrivial.
In this work, we improve Active Learning for the problem of 3D pose
estimation in a multi-view setting, which is of increasing importance in many
application scenarios. We develop a framework that allows us to efficiently
extend existing single-view AL strategies, and then propose two novel AL
strategies that make full use of multi-view geometry. Moreover, we demonstrate
additional performance gains by incorporating predicted pseudo-labels, which is
a form of self-training. Our system significantly outperforms baselines in 3D
body and hand pose estimation on two large-scale benchmarks: CMU Panoptic
Studio and InterHand2.6M. Notably, on CMU Panoptic Studio, we are able to match
the performance of a fully-supervised model using only 20% of labeled training
data.
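The two ingredients named in the abstract, selecting high-value examples for annotation and self-training on predicted pseudo-labels, can be illustrated with a generic sketch. This is not the authors' method: the abstract does not specify their selection strategies, so the cross-view disagreement score, the function names (`mean_pose`, `view_disagreement`, `split_pool`), and the `budget` and `pseudo_threshold` parameters below are all assumptions made for illustration.

```python
import math

def mean_pose(views):
    """Average the per-view 3D predictions joint by joint."""
    n_views = len(views)
    n_joints = len(views[0])
    return [
        tuple(sum(v[j][k] for v in views) / n_views for k in range(3))
        for j in range(n_joints)
    ]

def view_disagreement(views):
    """Mean Euclidean distance from each view's joints to the cross-view mean.

    Hypothetical acquisition score: high disagreement between views is
    treated as a sign that the sample is worth annotating.
    """
    center = mean_pose(views)
    dists = [
        math.dist(v[j], center[j])
        for v in views
        for j in range(len(center))
    ]
    return sum(dists) / len(dists)

def split_pool(pool, budget, pseudo_threshold):
    """Pick samples to annotate and samples to pseudo-label.

    pool: dict mapping sample id -> list of views, each view a list of
          (x, y, z) joint predictions already in a shared world frame.
    """
    scores = {sid: view_disagreement(views) for sid, views in pool.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Spend the annotation budget on the most ambiguous samples.
    to_annotate = ranked[:budget]
    # Self-training: keep the cross-view mean as a pseudo-label only
    # where the views already agree closely.
    pseudo_labels = {
        sid: mean_pose(pool[sid])
        for sid in ranked[budget:]
        if scores[sid] <= pseudo_threshold
    }
    return to_annotate, pseudo_labels

# Toy pool: one joint per pose, two views per sample.
pool = {
    "a": [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]],  # views agree exactly
    "b": [[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]],  # views disagree strongly
    "c": [[(0.0, 0.0, 0.0)], [(0.2, 0.0, 0.0)]],  # mild disagreement
}
to_annotate, pseudo_labels = split_pool(pool, budget=1, pseudo_threshold=0.05)
```

In this toy pool, sample "b" is sent for annotation, "a" receives a pseudo-label, and "c" is left unlabeled because its views disagree too much for the assumed threshold.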
Related papers
- Leveraging Large-Scale Pretrained Vision Foundation Models for
Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training [44.790636524264]
Point Prompt Training is a novel framework for multi-dataset synergistic learning in the context of 3D representation learning.
It can overcome the negative transfer associated with synergistic learning and produce generalizable representations.
It achieves state-of-the-art performance on each dataset using a single weight-shared model with supervised multi-dataset training.
arXiv Detail & Related papers (2023-08-18T17:59:57Z)
- Simultaneous Multiple Object Detection and Pose Estimation using 3D
Model Infusion with Monocular Vision [21.710141497071373]
Multiple object detection and pose estimation are vital computer vision tasks.
We propose simultaneous neural modeling of both using monocular vision and 3D model infusion.
Our Simultaneous Multiple Object detection and Pose Estimation network (SMOPE-Net) is an end-to-end trainable multitasking network.
arXiv Detail & Related papers (2022-11-21T05:18:56Z)
- Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
- Unsupervised Learning on 3D Point Clouds by Clustering and Contrasting [11.64827192421785]
Unsupervised representation learning is a promising direction to auto-extract features without human intervention.
This paper proposes a general unsupervised approach, named ConClu, to perform the learning of point-wise and global features.
arXiv Detail & Related papers (2022-02-05T12:54:17Z)
- Activation to Saliency: Forming High-Quality Labels for Unsupervised
Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework reports significant performance gains over existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
- Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images [79.70127290464514]
We decompose the task into two stages, i.e. person localization and pose estimation.
We propose three task-specific graph neural networks for effective message passing.
Our approach achieves state-of-the-art performance on CMU Panoptic and Shelf datasets.
arXiv Detail & Related papers (2021-09-13T11:44:07Z)
- Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation [52.94078950641959]
We present a deployment-friendly, fast bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
arXiv Detail & Related papers (2020-08-04T07:54:25Z)
- Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the
Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large-scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.