Liveness Detection in Computer Vision: Transformer-based Self-Supervised Learning for Face Anti-Spoofing
- URL: http://arxiv.org/abs/2406.13860v1
- Date: Wed, 19 Jun 2024 21:44:43 GMT
- Title: Liveness Detection in Computer Vision: Transformer-based Self-Supervised Learning for Face Anti-Spoofing
- Authors: Arman Keresh, Pakizar Shamoi,
- Abstract summary: Face recognition systems are vulnerable to spoofing attacks, where attackers use photos, videos, or masks to impersonate legitimate users.
This research addresses these vulnerabilities by exploring the Vision Transformer (ViT) architecture, fine-tuned with the DINO framework.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face recognition systems are increasingly used in biometric security for convenience and effectiveness. However, they remain vulnerable to spoofing attacks, where attackers use photos, videos, or masks to impersonate legitimate users. This research addresses these vulnerabilities by exploring the Vision Transformer (ViT) architecture, fine-tuned with the DINO framework. The DINO framework facilitates self-supervised learning, enabling the model to learn distinguishing features from unlabeled data. We compared the performance of the proposed fine-tuned ViT model using the DINO framework against a traditional CNN model, EfficientNet b2, on the face anti-spoofing task. Numerous tests on standard datasets show that the ViT model performs better than the CNN model in terms of accuracy and resistance to different spoofing methods. Additionally, we collected our own dataset from a biometric application to validate our findings further. This study highlights the superior performance of transformer-based architecture in identifying complex spoofing cues, leading to significant advancements in biometric security.
Related papers
- Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition [49.14350399025926]
We apply pre-trained architectures, originally developed for the ImageNet Large Scale Visual Recognition Challenge, for periocular recognition.
Middle-layer features from CNNs and ViTs are a suitable way to recognize individuals based on periocular images.
arXiv Detail & Related papers (2024-07-28T11:52:36Z) - Fiducial Focus Augmentation for Facial Landmark Detection [4.433764381081446]
We propose a novel image augmentation technique to enhance the model's understanding of facial structures.
We employ a Siamese architecture-based training mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss.
Our approach outperforms multiple state-of-the-art approaches across various benchmark datasets.
arXiv Detail & Related papers (2024-02-23T01:34:00Z) - AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection [20.821562115822182]
AttackNet is a bespoke Convolutional Neural Network architecture designed to combat spoofing threats in biometric systems.
It offers a layered defense mechanism, seamlessly transitioning from low-level feature extraction to high-level pattern discernment.
Benchmarking our model across diverse datasets affirms its prowess, showcasing superior performance metrics in comparison to contemporary models.
arXiv Detail & Related papers (2024-02-06T07:22:50Z) - Embedding Non-Distortive Cancelable Face Template Generation [22.80706131626207]
We introduce an innovative image distortion technique that makes facial images unrecognizable to the eye but still identifiable by any custom embedding neural network model.
We test the reliability of biometric recognition networks by determining the maximum image distortion that does not change the predicted identity.
arXiv Detail & Related papers (2024-02-04T15:39:18Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - Benchmarking Detection Transfer Learning with Vision Transformers [60.97703494764904]
complexity of object detection methods can make benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.
We present training techniques that overcome these challenges, enabling the use of standard ViT models as the backbone of Mask R-CNN.
Our results show that recent masking-based unsupervised learning methods may, for the first time, provide convincing transfer learning improvements on COCO.
arXiv Detail & Related papers (2021-11-22T18:59:15Z) - Federated Test-Time Adaptive Face Presentation Attack Detection with
Dual-Phase Privacy Preservation [100.69458267888962]
Face presentation attack detection (fPAD) plays a critical role in the modern face recognition pipeline.
Due to legal and privacy issues, training data (real face images and spoof images) are not allowed to be directly shared between different data sources.
We propose a Federated Test-Time Adaptive Face Presentation Attack Detection with Dual-Phase Privacy Preservation framework.
arXiv Detail & Related papers (2021-10-25T02:51:05Z) - Shuffled Patch-Wise Supervision for Presentation Attack Detection [12.031796234206135]
Face anti-spoofing is essential to prevent false facial verification by using a photo, video, mask, or a different substitute for an authorized person's face.
Most presentation attack detection systems suffer from overfitting, where they achieve near-perfect scores on a single dataset but fail on a different dataset with more realistic data.
We propose a new PAD approach, which combines pixel-wise binary supervision with patch-based CNN.
arXiv Detail & Related papers (2021-09-08T08:14:13Z) - Towards a Safety Case for Hardware Fault Tolerance in Convolutional
Neural Networks Using Activation Range Supervision [1.7968112116887602]
Convolutional neural networks (CNNs) have become an established part of numerous safety-critical computer vision applications.
We build a prototypical safety case for CNNs by demonstrating that range supervision represents a highly reliable fault detector.
We explore novel, non-uniform range restriction methods that effectively suppress the probability of silent data corruptions and uncorrectable errors.
arXiv Detail & Related papers (2021-08-16T11:13:55Z) - Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z) - Aurora Guard: Reliable Face Anti-Spoofing via Mobile Lighting System [103.5604680001633]
Anti-spoofing against high-resolution rendering replay of paper photos or digital videos remains an open problem.
We propose a simple yet effective face anti-spoofing system, termed Aurora Guard (AG)
arXiv Detail & Related papers (2021-02-01T09:17:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.