Flexible-Modal Face Anti-Spoofing: A Benchmark
- URL: http://arxiv.org/abs/2202.08192v1
- Date: Wed, 16 Feb 2022 16:55:39 GMT
- Title: Flexible-Modal Face Anti-Spoofing: A Benchmark
- Authors: Zitong Yu, Chenxu Zhao, Kevin H. M. Cheng, Xu Cheng, Guoying Zhao
- Abstract summary: Face anti-spoofing (FAS) plays a vital role in securing face recognition systems from presentation attacks.
We establish the first flexible-modal FAS benchmark with the principle 'train one for all'
We also investigate prevalent deep models and feature fusion strategies for flexible-modal FAS.
- Score: 66.18359076810549
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face anti-spoofing (FAS) plays a vital role in securing face recognition
systems from presentation attacks. Benefiting from maturing camera sensors,
single-modal (RGB) and multi-modal (e.g., RGB+Depth) FAS have been applied in
various scenarios with different configurations of sensors/modalities. Existing
single- and multi-modal FAS methods usually separately train and deploy models
for each possible modality scenario, which might be redundant and inefficient.
Can we train a unified model, and flexibly deploy it under various modality
scenarios? In this paper, we establish the first flexible-modal FAS benchmark
with the principle `train one for all'. To be specific, with trained
multi-modal (RGB+Depth+IR) FAS models, both intra- and cross-dataset testings
are conducted on four flexible-modal sub-protocols (RGB, RGB+Depth, RGB+IR, and
RGB+Depth+IR). We also investigate prevalent deep models and feature fusion
strategies for flexible-modal FAS. We hope this new benchmark will facilitate
future research on multi-modal FAS. The protocols and code are
available at https://github.com/ZitongYu/Flex-Modal-FAS.
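The 'train one for all' protocol described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual code (which is at the linked repository): the model interface, modality names, and the zero-filling strategy for absent modalities are all assumptions made for the sketch; the benchmark itself compares several fusion strategies.

```python
# Hypothetical sketch: one multi-modal model, evaluated under the four
# flexible-modal sub-protocols by zero-filling whichever modalities are
# unavailable at test time.

SUB_PROTOCOLS = {
    "RGB":          ["rgb"],
    "RGB+Depth":    ["rgb", "depth"],
    "RGB+IR":       ["rgb", "ir"],
    "RGB+Depth+IR": ["rgb", "depth", "ir"],
}

ALL_MODALITIES = ["rgb", "depth", "ir"]

def assemble_input(sample, available):
    """Fill missing modalities with zeros so a single trained
    multi-modal model can be reused unchanged (one simple strategy;
    illustrative only)."""
    return {m: sample[m] if m in available else [0.0] * len(sample["rgb"])
            for m in ALL_MODALITIES}

def evaluate(model, dataset):
    """Run the same model under every sub-protocol."""
    results = {}
    for name, available in SUB_PROTOCOLS.items():
        results[name] = [model(assemble_input(s, available))
                         for s in dataset]
    return results
```

A unified model is thus trained once on RGB+Depth+IR and scored four times, once per sub-protocol, for both intra- and cross-dataset testing.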
Related papers
- Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation [7.797154022794006]
Recent endeavors regard the RGB modality as the center and the others as auxiliary, yielding an asymmetric architecture with two branches.
We propose a novel method, named MAGIC, that can be flexibly paired with various backbones, ranging from compact to high-performance models.
Our method achieves state-of-the-art performance while reducing the model parameters by 60%.
arXiv Detail & Related papers (2024-07-16T03:19:59Z) - All in One Framework for Multimodal Re-identification in the Wild [58.380708329455466]
A multimodal learning paradigm for ReID is introduced, referred to as All-in-One (AIO).
AIO harnesses a frozen pre-trained big model as an encoder, enabling effective multimodal retrieval without additional fine-tuning.
Experiments on cross-modal and multimodal ReID reveal that AIO not only adeptly handles various modal data but also excels in challenging contexts.
arXiv Detail & Related papers (2024-05-08T01:04:36Z) - Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification [64.36210786350568]
We propose a novel learning framework named EDITOR to select diverse tokens from vision Transformers for multi-modal object ReID.
Our framework can generate more discriminative features for multi-modal object ReID.
arXiv Detail & Related papers (2024-03-15T12:44:35Z) - Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z) - Visual Prompt Flexible-Modal Face Anti-Spoofing [23.58674017653937]
Multimodal face data collected from the real world are often imperfect due to missing modalities from various imaging sensors.
We propose VP-FAS, which learns modal-relevant prompts to adapt a frozen pre-trained foundation model to the downstream flexible-modal FAS task.
Experiments conducted on two multimodal FAS benchmark datasets demonstrate the effectiveness of our VP-FAS framework.
arXiv Detail & Related papers (2023-07-26T05:06:41Z) - FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing [88.6654909354382]
We present a pure transformer-based framework, dubbed the Flexible Modal Vision Transformer (FM-ViT) for face anti-spoofing.
FM-ViT can flexibly target any single-modal (e.g., RGB) attack scenario with the help of available multi-modal data.
Experiments demonstrate that a single model trained with FM-ViT can not only flexibly evaluate samples of different modalities, but also outperforms existing single-modal frameworks by a large margin.
arXiv Detail & Related papers (2023-05-05T04:28:48Z) - MA-ViT: Modality-Agnostic Vision Transformers for Face Anti-Spoofing [3.3031006227198003]
We present Modality-Agnostic Vision Transformer (MA-ViT), which aims to improve the performance of arbitrary modal attacks with the help of multi-modal data.
Specifically, MA-ViT adopts early fusion to aggregate all available training modality data and enables flexible testing of samples of any given modality.
Experiments demonstrate that a single model trained with MA-ViT can not only flexibly evaluate samples of different modalities, but also outperforms existing single-modal frameworks by a large margin.
arXiv Detail & Related papers (2023-04-15T13:03:44Z) - Rethinking Vision Transformer and Masked Autoencoder in Multimodal Face Anti-Spoofing [19.142582966452935]
We investigate three key factors (i.e., inputs, pre-training, and fine-tuning) in ViT for multimodal FAS with RGB, Infrared (IR), and Depth.
We propose the modality-asymmetric masked autoencoder (M^2A^2E) for multimodal FAS self-supervised pre-training without costly annotated labels.
arXiv Detail & Related papers (2023-02-11T17:02:34Z) - Multi-Modal Face Anti-Spoofing Based on Central Difference Networks [93.6690714235887]
Face anti-spoofing (FAS) plays a vital role in securing face recognition systems from presentation attacks.
Existing multi-modal FAS methods rely on stacked vanilla convolutions.
We extend the central difference convolutional networks (CDCN) to a multi-modal version, intending to capture intrinsic spoofing patterns.
arXiv Detail & Related papers (2020-04-17T11:42:23Z)
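The central difference convolution (CDC) at the core of CDCN augments a vanilla convolution with a center-subtracted term, y(p0) = Σ w(pn)·x(p0+pn) − θ·x(p0)·Σ w(pn), which reduces to the vanilla convolution when θ = 0. A minimal single-channel sketch in plain Python (illustrative only; the multi-modal CDCN applies learned multi-channel filters per modality):

```python
def conv2d_valid(x, w):
    """Plain 3x3 'valid' convolution (cross-correlation) on a 2-D list."""
    H, W = len(x), len(x[0])
    out = []
    for i in range(H - 2):
        row = []
        for j in range(W - 2):
            s = 0.0
            for di in range(3):
                for dj in range(3):
                    s += w[di][dj] * x[i + di][j + dj]
            row.append(s)
        out.append(row)
    return out

def cdc2d_valid(x, w, theta=0.7):
    """Central difference convolution:
    y(p0) = sum_n w(pn)*x(p0+pn) - theta * x(p0) * sum_n w(pn).
    With theta = 0 it reduces to conv2d_valid; x[i+1][j+1] is the
    patch center for output position (i, j)."""
    wsum = sum(sum(r) for r in w)
    base = conv2d_valid(x, w)
    return [[v - theta * x[i + 1][j + 1] * wsum
             for j, v in enumerate(row)]
            for i, row in enumerate(base)]
```

Note that on a constant input the difference term cancels the vanilla response entirely at θ = 1, which is why CDC emphasizes fine-grained local gradient cues useful for detecting spoofing artifacts.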
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.