Mask Usage Recognition using Vision Transformer with Transfer Learning
and Data Augmentation
- URL: http://arxiv.org/abs/2203.11542v1
- Date: Tue, 22 Mar 2022 08:50:41 GMT
- Title: Mask Usage Recognition using Vision Transformer with Transfer Learning
and Data Augmentation
- Authors: Hensel Donato Jahja, Novanto Yudistira, Sutrisno
- Abstract summary: MaskedFace-Net is a suitable dataset consisting of 137,016 digital images with four class labels, namely Mask, Mask Chin, Mask Mouth Chin, and Mask Nose Mouth.
This study found that the best classification is achieved by ViT Huge-14 with transfer learning and augmentation.
This research shows that training the ViT model with data augmentation and transfer learning improves mask usage classification, outperforming the convolution-based Residual Network (ResNet).
- Score: 2.191505742658975
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The COVID-19 pandemic has disrupted society at every level. Wearing a
mask is essential to preventing the spread of COVID-19, and identifying from an
image whether a person is wearing one correctly can support this effort.
Since only 23.1% of people wear masks correctly, Artificial Neural Networks
(ANNs) can help classify correct mask usage and thereby help slow the spread of
the COVID-19 virus. However, training an ANN to classify mask usage correctly
requires a large dataset. MaskedFace-Net is a suitable dataset consisting of
137,016 digital images with four class labels, namely Mask, Mask Chin, Mask
Mouth Chin, and Mask Nose Mouth. Mask classification training utilizes the
Vision Transformer (ViT) architecture with transfer learning from weights
pre-trained on ImageNet-21k, together with random augmentation. The training
hyper-parameters are 20 epochs, a Stochastic Gradient Descent (SGD) optimizer
with a learning rate of 0.03, a batch size of 64, a Gaussian Error Linear Unit
(GELU) activation function, and a Cross-Entropy loss function, applied to three
ViT architectures: Base-16, Large-16, and Huge-14. Furthermore, comparisons
with and without augmentation and transfer learning are conducted. This study
found that the best classification is achieved by ViT Huge-14 with transfer
learning and augmentation. Using this method on the MaskedFace-Net dataset, the
research reaches an accuracy of 0.9601 on training data, 0.9412 on validation
data, and 0.9534 on test data. This research shows that training the ViT model
with data augmentation and transfer learning improves mask usage
classification, outperforming the convolution-based Residual Network (ResNet).
Related papers
- Facial Emotion Recognition Under Mask Coverage Using a Data Augmentation Technique [0.0]
We propose a facial emotion recognition system capable of recognizing emotions from individuals wearing different face masks.
We evaluated the effectiveness of four convolutional neural networks that were trained using transfer learning.
The Resnet50 has demonstrated superior performance, with accuracies of 73.68% for the person-dependent mode and 59.57% for the person-independent mode.
arXiv Detail & Related papers (2023-12-03T09:50:46Z)
- A transfer learning approach with convolutional neural network for Face Mask Detection [0.30693357740321775]
We propose a mask recognition system based on transfer learning and Inception v3 architecture.
In addition to masked and unmasked faces, it can also detect cases of incorrect use of mask.
arXiv Detail & Related papers (2023-10-29T07:38:33Z)
- Fast Training of Diffusion Models with Masked Transformers [107.77340216247516]
We propose an efficient approach to train large diffusion models with masked transformers.
Specifically, we randomly mask out a high proportion of patches in diffused input images during training.
Experiments on ImageNet-256x256 and ImageNet-512x512 show that our approach achieves competitive and even better generative performance than the state-of-the-art Diffusion Transformer (DiT) model.
arXiv Detail & Related papers (2023-06-15T17:38:48Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- A Unified View of Masked Image Modeling [117.79456335844439]
Masked image modeling has demonstrated great potential to eliminate the label-hungry problem of training large-scale vision Transformers.
We introduce a simple yet effective method, termed as MaskDistill, which reconstructs normalized semantic features from teacher models at the masked positions.
Experimental results on image classification and semantic segmentation show that MaskDistill achieves comparable or superior performance than state-of-the-art methods.
arXiv Detail & Related papers (2022-10-19T14:59:18Z)
- Adversarial Masking for Self-Supervised Learning [81.25999058340997]
ADIOS, a masked image modeling (MIM) framework for self-supervised learning, is proposed.
It simultaneously learns a masking function and an image encoder using an adversarial objective.
It consistently improves on state-of-the-art self-supervised learning (SSL) methods on a variety of tasks and datasets.
arXiv Detail & Related papers (2022-01-31T10:23:23Z)
- A Comparative Analysis of Machine Learning Approaches for Automated Face Mask Detection During COVID-19 [0.0]
WHO recommends wearing face masks as one of the most effective measures to prevent COVID-19 transmission.
We explore a number of deep learning models for face-mask detection and evaluate them on two benchmark datasets.
We find that while the performances of all the models are quite good, transfer learning models achieve the best performance.
arXiv Detail & Related papers (2021-12-15T06:30:50Z)
- COVID-19 Face Mask Recognition with Advanced Face Cut Algorithm for Human Safety Measures [0.0]
COVID-19 is a highly contagious disease that mainly affects the respiratory organs of the human body.
Our proposal deploys a computer vision and deep learning framework to recognize face masks from images or videos.
The experimental result shows an improvement of 3.4 percent over the YOLOv3 mask recognition architecture in just 10 epochs.
arXiv Detail & Related papers (2021-10-08T18:03:36Z)
- Boosting Masked Face Recognition with Multi-Task ArcFace [0.973681576519524]
Given the global health crisis caused by COVID-19, mouth and nose-covering masks have become an essential everyday-clothing-accessory.
This measure has put the state-of-the-art face recognition models on the ropes since they have not been designed to work with masked faces.
A full training pipeline is presented based on the ArcFace work, with several modifications for the backbone and the loss function.
arXiv Detail & Related papers (2021-04-20T10:12:04Z)
- Contrastive Context-Aware Learning for 3D High-Fidelity Mask Face Presentation Attack Detection [103.7264459186552]
Face presentation attack detection (PAD) is essential to secure face recognition systems.
Most existing 3D mask PAD benchmarks suffer from several drawbacks.
We introduce a large-scale High-Fidelity Mask dataset to bridge the gap to real-world applications.
arXiv Detail & Related papers (2021-04-13T12:48:38Z)
- BinaryCoP: Binary Neural Network-based COVID-19 Face-Mask Wear and Positioning Predictor on Edge Devices [63.56630165340053]
Face masks offer an effective solution in healthcare for bi-directional protection against air-borne diseases.
CNNs offer an excellent solution for face recognition and classification of correct mask wearing and positioning.
CNNs can be used at entrances to corporate buildings, airports, shopping areas, and other indoor locations, to mitigate the spread of the virus.
arXiv Detail & Related papers (2021-02-06T00:14:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.