M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System
- URL: http://arxiv.org/abs/2301.12831v3
- Date: Thu, 21 Mar 2024 05:39:44 GMT
- Title: M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System
- Authors: Chenqi Kong, Kexin Zheng, Yibing Liu, Shiqi Wang, Anderson Rocha, Haoliang Li
- Abstract summary: Face presentation attacks (FPA) have brought increasing concerns to the public through various malicious applications.
We devise an accurate and robust MultiModal Mobile Face Anti-Spoofing system named M3FAS.
- Score: 39.37647248710612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face presentation attacks (FPA), also known as face spoofing, have brought increasing concerns to the public through various malicious applications, such as financial fraud and privacy leakage. Therefore, safeguarding face recognition systems against FPA is of utmost importance. Although existing learning-based face anti-spoofing (FAS) models can achieve outstanding detection performance, they lack generalization capability and suffer significant performance drops in unforeseen environments. Many methodologies seek to use auxiliary modality data (e.g., depth and infrared maps) during presentation attack detection (PAD) to address this limitation. However, these methods can be limited since (1) they require specific sensors such as depth and infrared cameras for data capture, which are rarely available on commodity mobile devices, and (2) they cannot work properly in practical scenarios when either modality is missing or of poor quality. In this paper, we devise an accurate and robust MultiModal Mobile Face Anti-Spoofing system named M3FAS to overcome the issues above. The primary innovation of this work lies in the following aspects: (1) To achieve robust PAD, our system combines visual and auditory modalities using three commonly available sensors: camera, speaker, and microphone; (2) We design a novel two-branch neural network with three hierarchical feature aggregation modules to perform cross-modal feature fusion; (3) We propose a multi-head training strategy, allowing the model to output predictions from the vision, acoustic, and fusion heads, resulting in a more flexible PAD. Extensive experiments have demonstrated the accuracy, robustness, and flexibility of M3FAS under various challenging experimental settings. The source code and dataset are available at: https://github.com/ChenqiKONG/M3FAS/
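The multi-head idea in the abstract (separate predictions from the vision, acoustic, and fusion heads) can be illustrated with a minimal sketch. This is not the authors' implementation: the equal loss weights, the label convention (1 = live, 0 = spoof), the fallback rule for a missing modality, and the names `bce`, `multi_head_loss`, and `select_prediction` are all assumptions made for illustration.

```python
import math
from typing import Optional, Tuple


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one sample (assumed: 1 = live, 0 = spoof)."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def multi_head_loss(
    p_vision: float,
    p_acoustic: float,
    p_fusion: float,
    label: int,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # assumed equal weights
) -> float:
    """Weighted sum of the per-head losses, so every head is supervised."""
    w_v, w_a, w_f = weights
    return (
        w_v * bce(p_vision, label)
        + w_a * bce(p_acoustic, label)
        + w_f * bce(p_fusion, label)
    )


def select_prediction(
    p_vision: Optional[float],
    p_acoustic: Optional[float],
    p_fusion: Optional[float],
) -> float:
    """Hypothetical inference rule: prefer the fusion head when both
    modalities are usable, otherwise fall back to the available head."""
    if p_vision is not None and p_acoustic is not None and p_fusion is not None:
        return p_fusion
    if p_vision is not None:
        return p_vision
    if p_acoustic is not None:
        return p_acoustic
    raise ValueError("no modality available")
```

Under these assumptions, a sample whose audio is missing or unusable would still receive a score from the vision head alone, which is one way to read the flexibility the abstract describes.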
Related papers
- A Multi-Modal Approach for Face Anti-Spoofing in Non-Calibrated Systems using Disparity Maps [0.6144680854063939]
Face recognition technologies are vulnerable to face spoofing attacks.
Stereo-depth cameras can detect such attacks effectively, but their high cost limits their widespread adoption.
We propose a method to overcome this challenge by leveraging facial attributes to derive disparity information.
arXiv Detail & Related papers (2024-10-31T15:29:51Z) - Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble [15.173314907900842]
Existing 3D object detection methods rely heavily on the LiDAR sensor.
We propose MEFormer to address the LiDAR over-reliance problem.
Our MEFormer achieves state-of-the-art performance of 73.9% NDS and 71.5% mAP in the nuScenes validation set.
arXiv Detail & Related papers (2024-07-27T03:21:44Z) - Flow-Attention-based Spatio-Temporal Aggregation Network for 3D Mask Detection [12.160085404239446]
We propose a novel 3D mask detection framework called FASTEN.
We tailor the network for focusing more on fine details in large movements, which can eliminate redundant-temporal feature interference.
FASTEN requires only five input frames and outperforms eight competitors in both intra-dataset and cross-dataset evaluations.
arXiv Detail & Related papers (2023-10-25T11:54:21Z) - FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing [88.6654909354382]
We present a pure transformer-based framework, dubbed the Flexible Modal Vision Transformer (FM-ViT) for face anti-spoofing.
FM-ViT can flexibly target any single-modal (i.e., RGB) attack scenarios with the help of available multi-modal data.
Experiments demonstrate that the single model trained based on FM-ViT can not only flexibly evaluate different modal samples, but also outperforms existing single-modal frameworks by a large margin.
arXiv Detail & Related papers (2023-05-05T04:28:48Z) - Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition [42.60954035488262]
The goal of this work is to develop a more reliable technique that can carry out an end-to-end evaluation of adversarial robustness for commercial systems.
We design adversarial textured 3D meshes (AT3D) with an elaborate topology on a human face, which can be 3D-printed and pasted on the attacker's face to evade the defenses.
To deviate from the mesh-based space, we propose to perturb the low-dimensional coefficient space based on 3D Morphable Model.
arXiv Detail & Related papers (2023-03-28T08:42:54Z) - Face Presentation Attack Detection [59.05779913403134]
Face recognition technology has been widely used in daily interactive applications such as checking-in and mobile payment.
However, its vulnerability to presentation attacks (PAs) limits its reliable use in ultra-secure application scenarios.
arXiv Detail & Related papers (2022-12-07T14:51:17Z) - Dual Spoof Disentanglement Generation for Face Anti-spoofing with Depth Uncertainty Learning [54.15303628138665]
Face anti-spoofing (FAS) plays a vital role in preventing face recognition systems from presentation attacks.
Existing face anti-spoofing datasets lack diversity due to insufficient identities and insignificant variance.
We propose the Dual Spoof Disentanglement Generation framework to tackle this challenge by "anti-spoofing via generation".
arXiv Detail & Related papers (2021-12-01T15:36:59Z) - YOLOpeds: Efficient Real-Time Single-Shot Pedestrian Detection for Smart Camera Applications [2.588973722689844]
This work addresses the challenge of achieving a good trade-off between accuracy and speed for efficient deployment of deep-learning-based pedestrian detection in smart camera applications.
A computationally efficient architecture based on separable convolutions is introduced, integrating dense connections across layers and multi-scale feature fusion.
Overall, YOLOpeds provides real-time sustained operation of over 30 frames per second with detection rates in the range of 86%, outperforming existing deep learning models.
arXiv Detail & Related papers (2020-07-27T09:50:11Z) - Face Anti-Spoofing with Human Material Perception [76.4844593082362]
Face anti-spoofing (FAS) plays a vital role in securing face recognition systems from presentation attacks.
We rephrase face anti-spoofing as a material recognition problem and combine it with classical human material perception.
We propose the Bilateral Convolutional Networks (BCN), which is able to capture intrinsic material-based patterns.
arXiv Detail & Related papers (2020-07-04T18:25:53Z) - ASFD: Automatic and Scalable Face Detector [129.82350993748258]
We propose a novel Automatic and Scalable Face Detector (ASFD).
ASFD is based on a combination of neural architecture search techniques as well as a new loss design.
Our ASFD-D6 outperforms the prior strong competitors, and our lightweight ASFD-D0 runs at more than 120 FPS with Mobilenet for VGA-resolution images.
arXiv Detail & Related papers (2020-03-25T06:00:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.