Related papers: Liveness Detection in Computer Vision: Transformer-based Self-Supervised Learning for Face Anti-Spoofing

URL: http://arxiv.org/abs/2406.13860v1
Date: Wed, 19 Jun 2024 21:44:43 GMT
Title: Liveness Detection in Computer Vision: Transformer-based Self-Supervised Learning for Face Anti-Spoofing
Authors: Arman Keresh, Pakizar Shamoi
Abstract summary: Face recognition systems are vulnerable to spoofing attacks, where attackers use photos, videos, or masks to impersonate legitimate users. This research addresses these vulnerabilities by exploring the Vision Transformer (ViT) architecture, fine-tuned with the DINO framework.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Face recognition systems are increasingly used in biometric security for convenience and effectiveness. However, they remain vulnerable to spoofing attacks, where attackers use photos, videos, or masks to impersonate legitimate users. This research addresses these vulnerabilities by exploring the Vision Transformer (ViT) architecture, fine-tuned with the DINO framework. The DINO framework facilitates self-supervised learning, enabling the model to learn distinguishing features from unlabeled data. We compared the performance of the proposed fine-tuned ViT model using the DINO framework against a traditional CNN model, EfficientNet b2, on the face anti-spoofing task. Numerous tests on standard datasets show that the ViT model performs better than the CNN model in terms of accuracy and resistance to different spoofing methods. Additionally, we collected our own dataset from a biometric application to validate our findings further. This study highlights the superior performance of transformer-based architecture in identifying complex spoofing cues, leading to significant advancements in biometric security.
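The abstract names DINO as the self-supervised framework but does not spell out its objective. The following is a minimal sketch of a DINO-style self-distillation loss, not the authors' implementation: a student network is trained to match the centered, sharpened output distribution of a teacher, while the center (and, in practice, the teacher weights) is maintained as an exponential moving average. The function names, the temperature values, and the toy logits are assumptions chosen for illustration only.

```python
import math

def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits scaled by 1/temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum for x in scaled]

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between teacher and student distributions.

    The teacher logits are centered (to discourage collapse onto one
    dimension) and sharpened with a lower temperature. Temperatures here
    are illustrative assumptions, not values reported in the abstract.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(lp) for lp in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))

def ema_update(old, new, momentum):
    """Exponential moving average, used for the center and teacher weights."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]

# Toy example: two augmented views of the same (unlabeled) face image.
center = [0.0, 0.0, 0.0]
teacher_view = [2.0, 0.0, 0.0]
loss_agree = dino_loss([2.0, 0.0, 0.0], teacher_view, center)
loss_disagree = dino_loss([0.0, 2.0, 0.0], teacher_view, center)

# After each step the center drifts toward the latest teacher outputs.
new_center = ema_update(center, teacher_view, momentum=0.9)
```

In this sketch the loss is small when the student agrees with the teacher across views and large when it does not, which is what lets the model learn from unlabeled data. A real pipeline would compute the logits with a ViT backbone and backpropagate only through the student.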

Related papers

LoRA-Enhanced Vision Transformer for Single Image based Morphing Attack Detection via Knowledge Distillation from EfficientNet [7.409512128477373]
We propose a novel Single-Image Morphing Attack Detection (S-MAD) approach using a teacher-student framework. To improve efficiency, we integrate Low-Rank Adaptation (LoRA) for fine-tuning, reducing computational costs while maintaining high detection accuracy. The proposed method is benchmarked against six state-of-the-art S-MAD techniques, demonstrating superior detection performance and computational efficiency.
arXiv Detail & Related papers (2025-11-16T13:58:11Z)
Deep Learning Models for Robust Facial Liveness Detection [56.08694048252482]
This study introduces a robust solution through novel deep learning models addressing the deficiencies in contemporary anti-spoofing techniques. By innovatively integrating texture analysis and reflective properties associated with genuine human traits, our models distinguish authentic presence from replicas with remarkable precision.
arXiv Detail & Related papers (2025-08-12T17:19:20Z)
Leveraging Intermediate Features of Vision Transformer for Face Anti-Spoofing [0.11184789007828977]
We propose a spoofing attack detection method based on Vision Transformer (ViT) to detect minute differences between live and spoofed face images. The proposed method also introduces two data augmentation methods: face anti-spoofing data augmentation and patch-wise data augmentation. We demonstrate the effectiveness of the proposed method through experiments using the OULU-NPU and SiW datasets.
arXiv Detail & Related papers (2025-05-30T09:33:01Z)
Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition [49.14350399025926]
We apply pre-trained architectures, originally developed for the ImageNet Large Scale Visual Recognition Challenge, for periocular recognition. Middle-layer features from CNNs and ViTs are a suitable way to recognize individuals based on periocular images.
arXiv Detail & Related papers (2024-07-28T11:52:36Z)
Fiducial Focus Augmentation for Facial Landmark Detection [4.433764381081446]
We propose a novel image augmentation technique to enhance the model's understanding of facial structures. We employ a Siamese architecture-based training mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss. Our approach outperforms multiple state-of-the-art approaches across various benchmark datasets.
arXiv Detail & Related papers (2024-02-23T01:34:00Z)
AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection [20.821562115822182]
AttackNet is a bespoke Convolutional Neural Network architecture designed to combat spoofing threats in biometric systems. It offers a layered defense mechanism, seamlessly transitioning from low-level feature extraction to high-level pattern discernment. Benchmarking our model across diverse datasets affirms its prowess, showcasing superior performance metrics in comparison to contemporary models.
arXiv Detail & Related papers (2024-02-06T07:22:50Z)
Embedding Non-Distortive Cancelable Face Template Generation [22.80706131626207]
We introduce an innovative image distortion technique that makes facial images unrecognizable to the eye but still identifiable by any custom embedding neural network model. We test the reliability of biometric recognition networks by determining the maximum image distortion that does not change the predicted identity.
arXiv Detail & Related papers (2024-02-04T15:39:18Z)
Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
Benchmarking Detection Transfer Learning with Vision Transformers [60.97703494764904]
The complexity of object detection methods can make benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive. We present training techniques that overcome these challenges, enabling the use of standard ViT models as the backbone of Mask R-CNN. Our results show that recent masking-based unsupervised learning methods may, for the first time, provide convincing transfer learning improvements on COCO.
arXiv Detail & Related papers (2021-11-22T18:59:15Z)
Federated Test-Time Adaptive Face Presentation Attack Detection with Dual-Phase Privacy Preservation [100.69458267888962]
Face presentation attack detection (fPAD) plays a critical role in the modern face recognition pipeline. Due to legal and privacy issues, training data (real face images and spoof images) are not allowed to be directly shared between different data sources. We propose a Federated Test-Time Adaptive Face Presentation Attack Detection with Dual-Phase Privacy Preservation framework.
arXiv Detail & Related papers (2021-10-25T02:51:05Z)
Shuffled Patch-Wise Supervision for Presentation Attack Detection [12.031796234206135]
Face anti-spoofing is essential to prevent false facial verification by using a photo, video, mask, or a different substitute for an authorized person's face. Most presentation attack detection systems suffer from overfitting, where they achieve near-perfect scores on a single dataset but fail on a different dataset with more realistic data. We propose a new PAD approach, which combines pixel-wise binary supervision with patch-based CNN.
arXiv Detail & Related papers (2021-09-08T08:14:13Z)
Towards a Safety Case for Hardware Fault Tolerance in Convolutional Neural Networks Using Activation Range Supervision [1.7968112116887602]
Convolutional neural networks (CNNs) have become an established part of numerous safety-critical computer vision applications. We build a prototypical safety case for CNNs by demonstrating that range supervision represents a highly reliable fault detector. We explore novel, non-uniform range restriction methods that effectively suppress the probability of silent data corruptions and uncorrectable errors.
arXiv Detail & Related papers (2021-08-16T11:13:55Z)
Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples. We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)
Aurora Guard: Reliable Face Anti-Spoofing via Mobile Lighting System [103.5604680001633]
Anti-spoofing against high-resolution rendering replay of paper photos or digital videos remains an open problem. We propose a simple yet effective face anti-spoofing system, termed Aurora Guard (AG).
arXiv Detail & Related papers (2021-02-01T09:17:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.