Saliency-based Video Summarization for Face Anti-spoofing
- URL: http://arxiv.org/abs/2308.12364v2
- Date: Wed, 11 Oct 2023 22:38:38 GMT
- Title: Saliency-based Video Summarization for Face Anti-spoofing
- Authors: Usman Muhammad, Mourad Oussalah, and Jorma Laaksonen
- Abstract summary: We present a video summarization method for face anti-spoofing detection that aims to enhance the performance of deep learning models by leveraging visual saliency.
In particular, saliency information is extracted from the differences between the Laplacian and Wiener filter outputs of the source images.
Weighting maps are then computed based on the saliency information, indicating the importance of each pixel in the image.
- Score: 4.730428911461921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing availability of databases for face presentation attack
detection, researchers are increasingly focusing on video-based face
anti-spoofing methods that involve hundreds to thousands of images for training
the models. However, there is currently no clear consensus on the optimal
number of frames in a video to improve face spoofing detection. Inspired by the
visual saliency theory, we present a video summarization method for face
anti-spoofing detection that aims to enhance the performance and efficiency of
deep learning models by leveraging visual saliency. In particular, saliency
information is extracted from the differences between the Laplacian and Wiener
filter outputs of the source images, enabling identification of the most
visually salient regions within each frame. Subsequently, the source images are
decomposed into base and detail images, enhancing the representation of the
most important information. Weighting maps are then computed based on the
saliency information, indicating the importance of each pixel in the image. By
linearly combining the base and detail images using the weighting maps, the
method fuses the source images to create a single representative image that
summarizes the entire video. The key contribution of the proposed method lies
in demonstrating how visual saliency can be used as a data-centric approach to
improve the performance and efficiency of face presentation attack detection.
By focusing on the most salient images or regions within the images, a more
representative and diverse training set can be created, potentially leading to
more effective models. To validate the method's effectiveness, a simple CNN-RNN
deep learning architecture was used, and the experimental results showcased
state-of-the-art performance on five challenging face anti-spoofing datasets.
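To make the pipeline concrete, below is a minimal sketch of the fusion step in Python (NumPy/SciPy). The Wiener window size, the Gaussian low-pass used for the base/detail decomposition, and the per-pixel normalisation of saliency into weight maps are all illustrative assumptions; the paper's exact filters and normalisation scheme may differ.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace
from scipy.signal import wiener

def summarize_video(frames, base_sigma=2.0, wiener_size=5, eps=1e-8):
    """Fuse grayscale frames (2-D float arrays in [0, 1]) into one
    representative image, following the pipeline in the abstract.
    Parameter values are assumptions, not the paper's settings."""
    saliency, bases, details = [], [], []
    for f in frames:
        # 1) Saliency: difference between Laplacian and Wiener filter outputs.
        saliency.append(np.abs(laplace(f) - wiener(f, mysize=wiener_size)))
        # 2) Two-scale decomposition: low-pass base plus residual detail layer.
        base = gaussian_filter(f, sigma=base_sigma)
        bases.append(base)
        details.append(f - base)
    # 3) Weighting maps: saliency normalised so the weights across the
    #    frame stack sum to one at every pixel.
    s = np.stack(saliency)
    w = s / (s.sum(axis=0, keepdims=True) + eps)
    # 4) Linearly combine the base and detail layers under the weight maps,
    #    yielding a single image that summarizes the whole video.
    fused = (w * np.stack(bases)).sum(axis=0) + (w * np.stack(details)).sum(axis=0)
    return np.clip(fused, 0.0, 1.0)
```

Under this sketch, each video collapses to a single fused image that can stand in for hundreds of raw frames when training the CNN-RNN model mentioned above.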
Related papers
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has a great influence on the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- Foveation in the Era of Deep Learning [6.602118206533142]
We introduce an end-to-end differentiable foveated active vision architecture that leverages a graph convolutional network to process foveated images.
Our model learns to iteratively attend to regions of the image relevant for classification.
We find that our model outperforms a state-of-the-art CNN and foveated vision architectures of comparable parameters under a given pixel or computation budget.
arXiv Detail & Related papers (2023-12-03T16:48:09Z)
- Detecting Generated Images by Real Images Only [64.12501227493765]
Existing generated image detection methods detect visual artifacts in generated images or learn discriminative features from both real and generated images through massive training.
This paper approaches the generated image detection problem from a new perspective: Start from real images.
By finding the commonality of real images and mapping them to a dense subspace in feature space, the goal is that generated images, regardless of their generative model, are then projected outside the subspace.
arXiv Detail & Related papers (2023-11-02T03:09:37Z)
- Multi-modal reward for visual relationships-based image captioning [4.354364351426983]
This paper proposes a deep neural network architecture for image captioning based on fusing the visual relationships information extracted from an image's scene graph with the spatial feature maps of the image.
A multi-modal reward function is then introduced for deep reinforcement learning of the proposed network using a combination of language and vision similarities in a common embedding space.
arXiv Detail & Related papers (2023-03-19T20:52:44Z)
- A survey on facial image deblurring [3.6775758132528877]
Blur in facial images has a great impact on high-level vision tasks such as face recognition.
This paper surveys and summarizes recently published methods for facial image deblurring, most of which are based on deep learning.
We show the performance of classical methods on datasets and metrics and briefly discuss the differences between model-based and learning-based methods.
arXiv Detail & Related papers (2023-02-10T02:24:56Z)
- Efficient Textured Mesh Recovery from Multiple Views with Differentiable Rendering [8.264851594332677]
We propose an efficient coarse-to-fine approach to recover the textured mesh from multi-view images.
We optimize the shape geometry by minimizing the difference between the rendered mesh and the depth predicted by the learning-based multi-view stereo algorithm.
In contrast to the implicit neural representation on shape and color, we introduce a physically based inverse rendering scheme to jointly estimate the lighting and reflectance of the objects.
arXiv Detail & Related papers (2022-05-25T03:33:55Z)
- ViCE: Self-Supervised Visual Concept Embeddings as Contextual and Pixel Appearance Invariant Semantic Representations [77.3590853897664]
This work presents a self-supervised method to learn dense semantically rich visual embeddings for images inspired by methods for learning word embeddings in NLP.
arXiv Detail & Related papers (2021-11-24T12:27:30Z)
- Face Anti-Spoofing Via Disentangled Representation Learning [90.90512800361742]
Face anti-spoofing is crucial to the security of face recognition systems.
We propose a novel perspective of face anti-spoofing that disentangles the liveness features and content features from images.
arXiv Detail & Related papers (2020-08-19T03:54:23Z)
- Saliency-driven Class Impressions for Feature Visualization of Deep Neural Networks [55.11806035788036]
It is advantageous to visualize the features considered to be essential for classification.
Existing visualization methods develop high confidence images consisting of both background and foreground features.
In this work, we propose a saliency-driven approach to visualize discriminative features that are considered most important for a given task.
arXiv Detail & Related papers (2020-07-31T06:11:06Z)
- Cross-Identity Motion Transfer for Arbitrary Objects through Pose-Attentive Video Reassembling [40.20163225821707]
Given a source image and a driving video, our networks animate the subject in the source images according to the motion in the driving video.
In our attention mechanism, dense similarities between the learned keypoints in the source and the driving images are computed.
To reduce the training-testing discrepancy of the self-supervised learning, a novel cross-identity training scheme is additionally introduced.
arXiv Detail & Related papers (2020-07-17T07:21:12Z)
- Joint Deep Learning of Facial Expression Synthesis and Recognition [97.19528464266824]
We propose a novel method that jointly learns facial expression synthesis and recognition for effective facial expression recognition (FER).
The proposed method involves a two-stage learning procedure. Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions.
In order to alleviate the problem of data bias between the real images and the synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm.
arXiv Detail & Related papers (2020-02-06T10:56:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.