Flow-Attention-based Spatio-Temporal Aggregation Network for 3D Mask
Detection
- URL: http://arxiv.org/abs/2310.16569v1
- Date: Wed, 25 Oct 2023 11:54:21 GMT
- Title: Flow-Attention-based Spatio-Temporal Aggregation Network for 3D Mask
Detection
- Authors: Yuxin Cao, Yian Li, Yumeng Zhu, Derui Wang, Minhui Xue
- Abstract summary: We propose a novel 3D mask detection framework called FASTEN.
We tailor the network to focus on fine-grained details in large movements, which can eliminate redundant spatio-temporal feature interference.
FASTEN requires only five frames of input and outperforms eight competitors in both intra-dataset and cross-dataset evaluations.
- Score: 12.160085404239446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anti-spoofing detection has become a necessity for face recognition systems
due to the security threat posed by spoofing attacks. Despite great success in
traditional attacks, most deep-learning-based methods perform poorly in 3D
masks, which can highly simulate real faces in appearance and structure,
suffering generalizability insufficiency while focusing only on the spatial
domain with single frame input. This has been mitigated by the recent
introduction of a biomedical technology called rPPG (remote
photoplethysmography). However, rPPG-based methods are sensitive to noisy
interference and require at least one second (> 25 frames) of observation time,
which induces high computational overhead. To address these challenges, we
propose a novel 3D mask detection framework, called FASTEN
(Flow-Attention-based Spatio-Temporal aggrEgation Network). We tailor the
network for focusing more on fine-grained details in large movements, which can
eliminate redundant spatio-temporal feature interference and quickly capture
splicing traces of 3D masks in fewer frames. Our proposed network contains
three key modules: 1) a facial optical flow network to obtain non-RGB
inter-frame flow information; 2) flow attention to assign different
significance to each frame; 3) spatio-temporal aggregation to aggregate
high-level spatial features and temporal transition features. Through extensive
experiments, FASTEN only requires five frames of input and outperforms eight
competitors for both intra-dataset and cross-dataset evaluations in terms of
multiple detection metrics. Moreover, FASTEN has been deployed in real-world
mobile devices for practical 3D mask detection.
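As a rough illustration of the second and third modules listed in the abstract, the sketch below weights per-frame features by a softmax over inter-frame motion scores and sums them. It is not FASTEN's implementation: the abstract does not say how the attention is computed, so the scalar motion score, the softmax, the `temperature` parameter, and both function names are assumptions made only for illustration.

```python
import math


def flow_attention_weights(flow_magnitudes, temperature=1.0):
    """Softmax over per-flow motion scores (a hypothetical stand-in for flow attention)."""
    scaled = [m / temperature for m in flow_magnitudes]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(frame_features, weights):
    """Weighted sum of per-frame feature vectors (a simplified aggregation step)."""
    dim = len(frame_features[0])
    return [
        sum(w * feat[i] for w, feat in zip(weights, frame_features))
        for i in range(dim)
    ]


# Assumption: flows are taken between consecutive frames, so five input
# frames yield four flows, each paired here with one toy feature vector.
flow_magnitudes = [0.2, 1.5, 0.3, 0.9]
frame_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]

weights = flow_attention_weights(flow_magnitudes)
fused = aggregate(frame_features, weights)
```

In this toy setup the flow with the largest motion score receives the largest weight, which mirrors the stated goal of assigning different significance to each frame.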
Related papers
- UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
- FALCON: Frequency Adjoint Link with CONtinuous Density Mask for Fast Single Image Dehazing [8.703680337470285]
This work introduces FALCON, a single-image dehazing system achieving state-of-the-art performance on both quality and speed.
We leverage the underlying haze distribution based on the atmospheric scattering model via a Continuous Density Mask.
Experiments involving multiple state-of-the-art methods and ablation analysis demonstrate FALCON's exceptional performance in both dehazing quality and speed.
arXiv Detail & Related papers (2024-07-01T05:16:26Z)
- M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System [39.37647248710612]
Face presentation attacks (FPA) have brought increasing concerns to the public through various malicious applications.
We devise an accurate and robust MultiModal Mobile Face Anti-Spoofing system named M3FAS.
arXiv Detail & Related papers (2023-01-30T12:37:04Z)
- S^2-Transformer for Mask-Aware Hyperspectral Image Reconstruction [48.83280067393851]
A representative hyperspectral image acquisition procedure conducts a 3D-to-2D encoding by the coded aperture snapshot spectral imager (CASSI).
Two major challenges stand in the way of a high-fidelity reconstruction: (i) To obtain 2D measurements, CASSI dislocates multiple channels by disperser-tilting and squeezes them onto the same spatial region, yielding an entangled data loss.
We propose a spatial-spectral (S2-) transformer architecture with a mask-aware learning strategy to tackle these challenges.
arXiv Detail & Related papers (2022-09-24T19:26:46Z)
- TransRPPG: Remote Photoplethysmography Transformer for 3D Mask Face
Presentation Attack Detection [53.98866801690342]
3D mask face presentation attack detection (PAD) plays a vital role in securing face recognition systems from 3D mask attacks.
We propose a pure rPPG transformer (TransRPPG) framework for learning intrinsic liveness representation efficiently.
Our TransRPPG is lightweight and efficient (with only 547K parameters and 763M FLOPs), which is promising for mobile-level applications.
arXiv Detail & Related papers (2021-04-15T12:33:13Z)
- Contrastive Context-Aware Learning for 3D High-Fidelity Mask Face
Presentation Attack Detection [103.7264459186552]
Face presentation attack detection (PAD) is essential to secure face recognition systems.
Most existing 3D mask PAD benchmarks suffer from several drawbacks.
We introduce a large-scale High-Fidelity Mask dataset to bridge the gap to real-world applications.
arXiv Detail & Related papers (2021-04-13T12:48:38Z)
- Efficient Two-Stream Network for Violence Detection Using Separable
Convolutional LSTM [0.0]
We propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet.
SepConvLSTM is constructed by replacing convolution operation at each gate of ConvLSTM with a depthwise separable convolution.
Our model surpasses the previously reported accuracy on the larger and more challenging RWF-2000 dataset by a margin of more than 2%.
arXiv Detail & Related papers (2021-02-21T12:01:48Z)
- AutoHR: A Strong End-to-end Baseline for Remote Heart Rate Measurement
with Neural Searching [76.4844593082362]
We investigate why existing end-to-end networks perform poorly in challenging conditions and establish a strong baseline for remote HR measurement with neural architecture search (NAS).
Comprehensive experiments are performed on three benchmark datasets with both intra-dataset and cross-dataset testing.
arXiv Detail & Related papers (2020-04-26T05:43:21Z)
- Deep Spatial Gradient and Temporal Depth Learning for Face Anti-spoofing [61.82466976737915]
Depth supervised learning has been proven as one of the most effective methods for face anti-spoofing.
We propose a new approach to detect presentation attacks from multiple frames based on two insights.
The proposed approach achieves state-of-the-art results on five benchmark datasets.
arXiv Detail & Related papers (2020-03-18T06:11:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.