Rethinking Self-Supervised Visual Representation Learning in
Pre-training for 3D Human Pose and Shape Estimation
- URL: http://arxiv.org/abs/2303.05370v1
- Date: Thu, 9 Mar 2023 16:17:52 GMT
- Title: Rethinking Self-Supervised Visual Representation Learning in
Pre-training for 3D Human Pose and Shape Estimation
- Authors: Hongsuk Choi, Hyeongjin Nam, Taeryung Lee, Gyeongsik Moon, Kyoung Mu
Lee
- Abstract summary: Self-supervised representation learning (SSL) methods have outperformed ImageNet classification pre-training for vision tasks such as object detection.
We empirically study and analyze the effects of SSL and compare it with other pre-training alternatives for 3DHPSE.
Our observations challenge the naive application of current SSL pre-training to 3DHPSE and renew attention to the value of other data types for pre-training.
- Score: 57.206129938611454
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, a few self-supervised representation learning (SSL) methods have
outperformed ImageNet classification pre-training for vision tasks such as
object detection. However, their effect on 3D human body pose and shape
estimation (3DHPSE) is open to question: the target is fixed to a single
class, the human, and the task has an inherent gap with SSL objectives. We
empirically study and analyze the effects of SSL and further compare it with
other pre-training alternatives for 3DHPSE. The alternatives are 2D
annotation-based pre-training and synthetic data pre-training, which share
SSL's motivation of reducing labeling cost. They have been widely utilized as
a source of weak supervision or fine-tuning, but have received little
attention as a pre-training source. SSL methods underperform conventional
ImageNet classification pre-training on multiple 3DHPSE benchmarks by 7.7% on
average. In contrast, despite a much smaller amount of pre-training data, 2D
annotation-based pre-training improves accuracy on all benchmarks and shows
faster convergence during fine-tuning. Our observations challenge the naive
application of current SSL pre-training to 3DHPSE and renew attention to the
value of other data types for pre-training.
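As a rough illustration of the comparison protocol, the sketch below (PyTorch/torchvision) fine-tunes the same ResNet-50 backbone from different initializations; the checkpoint paths and the `build_backbone` helper are hypothetical, not the authors' code.

```python
import torch
import torchvision

def build_backbone(init: str) -> torch.nn.Module:
    """Return the same ResNet-50 backbone under different initializations."""
    if init == "imagenet_cls":
        # conventional ImageNet classification pre-training
        return torchvision.models.resnet50(
            weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)
    backbone = torchvision.models.resnet50(weights=None)
    if init != "scratch":
        # hypothetical checkpoints from SSL, 2D-annotation, or synthetic pre-training
        state = torch.load(f"checkpoints/{init}_pretrain.pth", map_location="cpu")
        backbone.load_state_dict(state, strict=False)  # head keys may differ
    return backbone

for init in ("imagenet_cls", "ssl", "2d_annotation", "synthetic", "scratch"):
    backbone = build_backbone(init)
    # ...attach a 3DHPSE head (e.g., an SMPL parameter regressor) and fine-tune,
    # tracking accuracy and convergence speed for each initialization.
```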
Related papers
- Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud
Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
The existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data.
We introduce a new decoupling optimization framework, which disentangles feature representation learning and the classifier in an alternating optimization scheme to shift the biased decision boundary effectively.
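A minimal sketch of what such an alternating update could look like in PyTorch; the module and optimizer names are illustrative assumptions, not the paper's implementation.

```python
import torch

def alternating_step(backbone, classifier, opt_feat, opt_cls, points, labels):
    """One decoupled update: classifier first, then the feature extractor."""
    criterion = torch.nn.CrossEntropyLoss()

    # Step A: update the classifier on frozen features.
    with torch.no_grad():
        feats = backbone(points)
    opt_cls.zero_grad()
    criterion(classifier(feats), labels).backward()
    opt_cls.step()

    # Step B: update the backbone; the classifier is not stepped, so its
    # decision boundary stays fixed during this half of the alternation.
    opt_feat.zero_grad()
    criterion(classifier(backbone(points)), labels).backward()
    opt_feat.step()
```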
arXiv Detail & Related papers (2024-01-13T04:16:40Z)
- Self-supervised learning for skin cancer diagnosis with limited training data [0.196629787330046]
Self-supervised learning (SSL) is an alternative to the standard supervised pre-training on ImageNet for scenarios with limited training data.
We consider further SSL pre-training on task-specific datasets, where our implementation is motivated by supervised transfer learning.
We find that minimal further SSL pre-training on task-specific data can be as effective as large-scale SSL pre-training on ImageNet for medical image classification tasks with limited labelled data.
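A self-contained sketch of the recipe, using a simple rotation-prediction pretext task as a stand-in for the unspecified SSL objective; everything below is an illustrative assumption, not the paper's code.

```python
import torch
import torchvision

encoder = torchvision.models.resnet18(weights=None)  # or start from SSL weights
encoder.fc = torch.nn.Linear(encoder.fc.in_features, 4)  # 4 rotation classes
opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

def further_pretrain_step(images: torch.Tensor) -> torch.Tensor:
    """images: (B, 3, H, W) unlabeled task-specific images."""
    k = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                           for img, r in zip(images, k)])
    loss = torch.nn.functional.cross_entropy(encoder(rotated), k)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss

# Afterwards, replace `encoder.fc` with the diagnosis head and fine-tune on
# the small labelled set.
```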
arXiv Detail & Related papers (2024-01-01T08:11:38Z)
- FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained
Vision-Language Models [62.663113296987085]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC).
Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
- Visual Self-supervised Learning Scheme for Dense Prediction Tasks on X-ray Images [3.782392436834913]
Self-supervised learning (SSL) has led to considerable progress in natural language processing (NLP).
In computer vision, the incorporation of contrastive learning into visual SSL models has more recently led to comparable progress, often surpassing supervised counterparts.
Here, we focus on dense prediction tasks, using security inspection x-ray images to evaluate our proposed model, Segment localization (SegLoc).
Based upon the Instance localization (InsLoc) model, SegLoc addresses one of the key challenges of contrastive learning: false negative pairs of query embeddings.
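The false-negative issue can be sketched generically: candidate "negatives" that look too similar to the query are likely to depict the same instance and are dropped from the contrastive denominator. The thresholding rule below is an illustrative assumption, not SegLoc's actual mechanism.

```python
import torch
import torch.nn.functional as F

def contrastive_loss_filtered(q, pos, negs, tau=0.2, fn_thresh=0.9):
    """q, pos: (D,) query/positive embeddings; negs: (N, D). All L2-normalized."""
    sim_neg = negs @ q                      # (N,) cosine similarities to the query
    keep = sim_neg < fn_thresh              # drop suspiciously similar "negatives"
    logits = torch.cat([(pos @ q).view(1), sim_neg[keep]]) / tau
    target = torch.zeros(1, dtype=torch.long)  # the positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)
```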
arXiv Detail & Related papers (2023-10-12T15:42:17Z)
- Understanding and Improving the Role of Projection Head in Self-Supervised
Learning [77.59320917894043]
Self-supervised learning (SSL) aims to produce useful feature representations without access to human-labeled data annotations.
Current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective.
This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training?
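A minimal sketch of that setup: a backbone, a small MLP projection head, and the InfoNCE loss, with the head discarded after pre-training. Dimensions and architecture choices are illustrative.

```python
import torch
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.resnet50(weights=None)
backbone.fc = torch.nn.Identity()                  # expose 2048-d features
projector = torch.nn.Sequential(                   # parametrized projection head
    torch.nn.Linear(2048, 2048), torch.nn.ReLU(), torch.nn.Linear(2048, 128),
)

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1, z2: (B, D) projections of two views; positives sit on the diagonal."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                     # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)

# Pre-training: loss = info_nce(projector(backbone(v1)), projector(backbone(v2)))
# Transfer: keep `backbone`, throw `projector` away.
```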
arXiv Detail & Related papers (2022-12-22T05:42:54Z)
- Class-Level Confidence Based 3D Semi-Supervised Learning [18.95161296147023]
We show that the class-level confidence of unlabeled data can represent the learning status of each class in an imbalanced 3D dataset.
Our method significantly outperforms state-of-the-art counterparts for both 3D SSL classification and detection tasks.
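A hedged sketch of the core idea: estimate per-class confidence from predictions on unlabeled data, then derive class-wise pseudo-label thresholds so rare classes are not starved. The exact thresholding rule below is an assumption.

```python
import torch

def class_confidence(probs: torch.Tensor, num_classes: int) -> torch.Tensor:
    """probs: (N, C) softmax outputs on unlabeled samples."""
    conf, pred = probs.max(dim=1)
    per_class = torch.zeros(num_classes)
    for c in range(num_classes):
        hit = pred == c
        if hit.any():
            per_class[c] = conf[hit].mean()  # low value -> class still learning
    return per_class

def select_pseudo_labels(probs, per_class_conf, base_thresh=0.95):
    """Relax the threshold for classes the model is less confident about."""
    conf, pred = probs.max(dim=1)
    thresh = base_thresh * per_class_conf[pred] / per_class_conf.max().clamp_min(1e-8)
    return conf >= thresh  # boolean mask of unlabeled samples to pseudo-label
```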
arXiv Detail & Related papers (2022-10-18T20:13:28Z)
- A Closer Look at Invariances in Self-supervised Pre-training for 3D Vision [0.0]
Self-supervised pre-training for 3D vision has drawn increasing research interest in recent years.
We present a unified framework under which various pre-training methods can be investigated.
We propose a simple but effective method that jointly pre-trains a 3D encoder and a depth map encoder using contrastive learning.
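A minimal sketch of such joint cross-modal pre-training: a point-cloud encoder and a depth-map encoder are trained so that embeddings of the same scene agree under an InfoNCE-style loss. Both toy encoders below are stand-ins for the paper's architectures.

```python
import torch
import torch.nn.functional as F

point_encoder = torch.nn.Sequential(               # toy PointNet-style encoder
    torch.nn.Linear(3, 128), torch.nn.ReLU(), torch.nn.Linear(128, 128),
)
depth_encoder = torch.nn.Sequential(               # toy CNN for (1, H, W) depth maps
    torch.nn.Conv2d(1, 32, 3, stride=2), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(32, 128),
)

def cross_modal_loss(points, depths, tau=0.1):
    """points: (B, N, 3); depths: (B, 1, H, W) rendered from the same B scenes."""
    z_p = F.normalize(point_encoder(points).max(dim=1).values, dim=1)  # (B, 128)
    z_d = F.normalize(depth_encoder(depths), dim=1)                    # (B, 128)
    logits = z_p @ z_d.t() / tau       # matching scene pairs lie on the diagonal
    targets = torch.arange(points.size(0))
    return F.cross_entropy(logits, targets)
```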
arXiv Detail & Related papers (2022-07-11T16:44:15Z)
- Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data are prioritized.
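A hedged sketch of the weighting idea: scale each unlabeled sample's loss by a score reflecting how likely it is to be in-distribution. Using the maximum softmax probability as that score is an illustrative choice, not the paper's exact criterion.

```python
import torch

def weighted_unlabeled_loss(per_sample_loss: torch.Tensor,
                            probs: torch.Tensor) -> torch.Tensor:
    """per_sample_loss: (N,) unsupervised losses; probs: (N, C) predictions."""
    w = probs.max(dim=1).values          # confident -> likely in-distribution
    # out-of-distribution samples get small weights and contribute little
    return (w * per_sample_loss).sum() / w.sum().clamp_min(1e-8)
```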
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
- Advancing 3D Medical Image Analysis with Variable Dimension Transform based
Supervised 3D Pre-training [45.90045513731704]
This paper revisits an innovative yet simple fully-supervised 3D network pre-training framework.
With a redesigned 3D network architecture, reformulated natural images are used to address the problem of data scarcity.
Comprehensive experiments on four benchmark datasets demonstrate that the proposed pre-trained models can effectively accelerate convergence.
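One way to picture the dimension transform: stack 2D images along a new depth axis so a 3D network can consume them. The sketch below only illustrates the reshaping; the paper's actual transform may differ.

```python
import torch

def images_to_volume(images: torch.Tensor, depth: int = 16) -> torch.Tensor:
    """images: (B*depth, 1, H, W) grayscale crops -> (B, 1, depth, H, W) volumes."""
    b = images.size(0) // depth
    vol = images[: b * depth].view(b, depth, 1, *images.shape[-2:])
    return vol.permute(0, 2, 1, 3, 4).contiguous()  # (B, C=1, D, H, W)

conv3d = torch.nn.Conv3d(1, 32, kernel_size=3, padding=1)  # 3D network stem
# features = conv3d(images_to_volume(gray_images))
```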
arXiv Detail & Related papers (2022-01-05T03:11:21Z)
- Cascaded deep monocular 3D human pose estimation with evolutionary training
data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable to a massive amount of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge.
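A hedged sketch of evolutionary skeleton synthesis: recombine body parts of existing skeletons (crossover) and perturb joints (mutation). The 17-joint part grouping and noise scale below are hypothetical, not the paper's hierarchical representation.

```python
import torch

# hypothetical joint-index groups for a 17-joint skeleton
PARTS = {"torso": [0, 7, 8, 9, 10], "left_arm": [11, 12, 13],
         "right_arm": [14, 15, 16], "left_leg": [4, 5, 6], "right_leg": [1, 2, 3]}

def synthesize(skel_a: torch.Tensor, skel_b: torch.Tensor, noise=0.01):
    """skel_a, skel_b: (17, 3) joint coordinates -> one new skeleton."""
    child = skel_a.clone()
    for part, idx in PARTS.items():
        if torch.rand(1) < 0.5:                       # crossover per body part
            child[idx] = skel_b[idx]
    return child + noise * torch.randn_like(child)    # mutation
```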
arXiv Detail & Related papers (2020-06-14T03:09:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.