How Self-Supervised Learning Can be Used for Fine-Grained Head Pose
Estimation?
- URL: http://arxiv.org/abs/2108.04893v2
- Date: Thu, 12 Aug 2021 17:32:46 GMT
- Title: How Self-Supervised Learning Can be Used for Fine-Grained Head Pose
Estimation?
- Authors: Mahdi Pourmirzaei and Gholam Ali Montazer and Farzaneh Esmaili
- Abstract summary: We try to answer the question: how can SSL be used for head
pose estimation?
Modified versions of jigsaw puzzling and rotation are used as SSL pretext tasks.
The HMTL method reduces the error rate by up to 11% compared to SL.
- Score: 2.0625936401496237
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent progress in Self-Supervised Learning (SSL) demonstrates the
capability of these methods in the computer vision field. However, this
progress has not yet shown promise for fine-grained tasks such as head pose
estimation. In this article, we try to answer the question: how can SSL be
used for head pose estimation? In general, there are two main approaches to
using SSL: 1. using pre-trained weights, obtained either by pre-training on
ImageNet or via SSL tasks; 2. leveraging SSL as an auxiliary co-training task
alongside Supervised Learning (SL) tasks. In this study, modified versions of
jigsaw puzzling and rotation are used as SSL pretext tasks, and the best
architecture for our proposed Hybrid Multi-Task Learning (HMTL) is found.
Finally, the HopeNet method is selected as a baseline, and the impact of SSL
pre-training and ImageNet pre-training is compared on both HMTL and SL. The
HMTL method reduces the error rate by up to 11% compared to SL. Moreover, the
HMTL method performs well with all kinds of initial weights: random, ImageNet,
and SSL pre-trained weights. It was also observed that, when puzzled images
are used for SL alone, the average error rate falls between those of SL and
HMTL, which shows the importance of local spatial features compared to global
spatial features.
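
The core HMTL idea, co-training a supervised head-pose objective with an SSL pretext objective on a shared backbone, can be sketched roughly as follows. This is a minimal, illustrative sketch only: the ResNet-50 backbone, the plain regression pose loss, the rotation-prediction auxiliary head, the module names, and the loss weight are all assumptions for illustration; the paper's actual setup builds on HopeNet and uses modified jigsaw and rotation pretext tasks.

```python
# Illustrative sketch of hybrid multi-task (supervised + SSL) co-training.
# Architecture, loss formulation, and the 0.5 weight are assumptions, not the
# paper's exact HMTL/HopeNet configuration.
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class HybridHeadPoseNet(nn.Module):
    """Shared backbone with a supervised pose head and an auxiliary SSL head."""

    def __init__(self, num_rotation_classes: int = 4):
        super().__init__()
        resnet = models.resnet50(weights=None)
        feat_dim = resnet.fc.in_features                             # 2048 for ResNet-50
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop the fc layer
        self.pose_head = nn.Linear(feat_dim, 3)                      # yaw, pitch, roll (regression)
        self.ssl_head = nn.Linear(feat_dim, num_rotation_classes)    # e.g. 0/90/180/270 degrees

    def forward(self, x):
        feats = self.backbone(x).flatten(1)
        return self.pose_head(feats), self.ssl_head(feats)


def hybrid_loss(pose_pred, pose_gt, rot_pred, rot_gt, ssl_weight=0.5):
    """Supervised pose loss plus a weighted SSL auxiliary loss (weight is an assumption)."""
    pose_loss = F.mse_loss(pose_pred, pose_gt)    # supervised head-pose objective
    ssl_loss = F.cross_entropy(rot_pred, rot_gt)  # rotation-prediction pretext objective
    return pose_loss + ssl_weight * ssl_loss
```

In such a setup, the auxiliary labels (here, the applied rotation class) are generated on the fly from the input images themselves, which is what makes the co-training task self-supervised and keeps the annotation cost unchanged.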
Related papers
- Erasing the Bias: Fine-Tuning Foundation Models for Semi-Supervised Learning [4.137391543972184]
Semi-supervised learning (SSL) has witnessed remarkable progress, resulting in numerous method variations.
In this paper, we present a novel SSL approach named FineSSL that significantly addresses this limitation by adapting pre-trained foundation models.
We demonstrate that FineSSL sets a new state of the art for SSL on multiple benchmark datasets, reduces the training cost by over six times, and can seamlessly integrate various fine-tuning and modern SSL algorithms.
arXiv Detail & Related papers (2024-05-20T03:33:12Z)
- Self-supervised visual learning in the low-data regime: a comparative evaluation [40.27083924454058]
Self-Supervised Learning (SSL) is a robust training methodology for contemporary Deep Neural Networks (DNNs).
This work introduces a taxonomy of modern visual SSL methods, accompanied by detailed explanations and insights regarding the main categories of approaches.
For domain-specific downstream tasks, in-domain low-data SSL pretraining outperforms the common approach of large-scale pretraining.
arXiv Detail & Related papers (2024-04-26T07:23:14Z)
- DailyMAE: Towards Pretraining Masked Autoencoders in One Day [37.206816999538496]
Masked image modeling (MIM) has drawn attention for its effectiveness in learning data representation from unlabeled data.
In this study, we propose efficient training recipes for MIM-based SSL that focus on mitigating data loading bottlenecks.
Our library enables the training of a MAE-Base/16 model on the ImageNet 1K dataset for 800 epochs within just 18 hours.
arXiv Detail & Related papers (2024-03-31T00:59:10Z)
- Rethinking Self-Supervised Visual Representation Learning in Pre-training for 3D Human Pose and Shape Estimation [57.206129938611454]
Self-supervised representation learning (SSL) methods have outperformed the ImageNet classification pre-training for vision tasks such as object detection.
We empirically study and analyze the effects of SSL and compare it with other pre-training alternatives for 3DHPSE.
Our observations challenge the naive application of the current SSL pre-training to 3DHPSE and relight the value of other data types in the pre-training aspect.
arXiv Detail & Related papers (2023-03-09T16:17:52Z)
- Understanding and Improving the Role of Projection Head in Self-Supervised Learning [77.59320917894043]
Self-supervised learning (SSL) aims to produce useful feature representations without access to human-labeled data annotations.
Current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective (a minimal sketch of this pattern appears after this list).
This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training?
arXiv Detail & Related papers (2022-12-22T05:42:54Z)
- DATA: Domain-Aware and Task-Aware Pre-training [94.62676913928831]
We present DATA, a simple yet effective NAS approach specialized for self-supervised learning (SSL).
Our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2022-03-17T02:38:49Z)
- Sound and Visual Representation Learning with Multiple Pretraining Tasks [104.11800812671953]
Self-supervised learning (SSL) tasks reveal different features from the data.
This work aims to combine multiple SSL tasks (Multi-SSL) into a representation that generalizes well across downstream tasks.
Experiments on sound representations demonstrate that Multi-SSL via incremental learning (IL) of SSL tasks outperforms single SSL task models.
arXiv Detail & Related papers (2022-01-04T09:09:38Z)
- Interventional Few-Shot Learning [88.31112565383457]
We propose a novel Few-Shot Learning paradigm: Interventional Few-Shot Learning.
Code is released at https://github.com/yue-zhongqi/ifsl.
arXiv Detail & Related papers (2020-09-28T01:16:54Z)
- TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification [50.358839666165764]
We show that the Task-Adaptive Feature Sub-Space Learning (TAFSSL) can significantly boost the performance in Few-Shot Learning scenarios.
Specifically, we show that on the challenging miniImageNet and tieredImageNet benchmarks, TAFSSL can improve the current state-of-the-art in both transductive and semi-supervised FSL settings by more than 5%.
arXiv Detail & Related papers (2020-03-14T16:59:17Z)
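
As referenced in the projection-head entry above, contrastive SSL methods typically attach a small projection MLP to the backbone and optimize an InfoNCE objective over augmented views, with the head usually discarded after pre-training. The sketch below illustrates that pattern under assumed dimensions, an assumed temperature, and a simplified symmetric loss in which only cross-view pairs within the batch act as negatives.

```python
# Hedged sketch of the "backbone + projection head + InfoNCE" pattern.
# Layer sizes, temperature, and the simplified symmetric loss are assumptions
# for illustration, not any specific paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionHead(nn.Module):
    """Small MLP appended to the backbone; commonly discarded after pre-training."""

    def __init__(self, in_dim=2048, hidden_dim=2048, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, h):
        return self.net(h)


def info_nce_loss(z1, z2, temperature=0.1):
    """Simplified InfoNCE over two augmented views: matching rows are positives,
    every other row in the batch serves as a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                   # [N, N] cosine-similarity logits
    targets = torch.arange(z1.size(0), device=z1.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Here z1 and z2 would be the projection-head outputs for two augmentations of the same image batch; row i of one view is the positive for row i of the other.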
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.