Vista-Morph: Unsupervised Image Registration of Visible-Thermal Facial Pairs
- URL: http://arxiv.org/abs/2306.06505v1
- Date: Sat, 10 Jun 2023 18:42:36 GMT
- Title: Vista-Morph: Unsupervised Image Registration of Visible-Thermal Facial Pairs
- Authors: Catherine Ordun, Edward Raff, Sanjay Purushotham
- Abstract summary: We introduce our approach for Visible-Thermal (VT) image registration called Vista Morph.
By learning the affine matrix through a Vision Transformer (ViT)-based Spatial Transformer Network (STN) and Generative Adversarial Networks (GAN), Vista Morph successfully aligns facial and non-facial VT images.
We conduct a downstream generative AI task to show that registering training data with Vista Morph improves subject identity of generated thermal faces when performing V2T image translation.
- Score: 36.33347149799959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For a variety of biometric cross-spectral tasks, Visible-Thermal (VT) facial
pairs are used. However, because the two sensors are rarely calibrated in the
lab, photographic capture produces severely misaligned pairs that degrade
person re-identification and generative AI. To
solve this problem, we introduce our approach for VT image registration called
Vista Morph. Unlike existing VT facial registration that requires manual,
hand-crafted features for pixel matching and/or a supervised thermal reference,
Vista Morph is completely unsupervised without the need for a reference. By
learning the affine matrix through a Vision Transformer (ViT)-based Spatial
Transformer Network (STN) and Generative Adversarial Networks (GAN), Vista
Morph successfully aligns facial and non-facial VT images. Our approach learns
warps in Hard, No, and Low-light visual settings and is robust to geometric
perturbations and erasure at test time. We conduct a downstream generative AI
task to show that registering training data with Vista Morph improves subject
identity of generated thermal faces when performing V2T image translation.
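The paper's core mechanism, a Spatial Transformer Network that applies a learned 2x3 affine matrix to resample one image into alignment with another, can be illustrated without the ViT or GAN components. The following NumPy sketch is an assumption-laden toy, not the authors' implementation: the function names (`affine_grid`, `bilinear_sample`) are hypothetical, and in Vista-Morph the matrix `theta` would be predicted by a trained network rather than given by hand.

```python
import numpy as np

def affine_grid(theta, h, w):
    """Build an STN-style sampling grid: for each output pixel, the 2x3
    matrix theta gives the source location in normalized [-1, 1] coords."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, h),
                         np.linspace(-1, 1, w), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (h, w, 3)
    return coords @ theta.T                                 # (h, w, 2)

def bilinear_sample(img, grid):
    """Resample img at the (possibly fractional) grid locations."""
    h, w = img.shape
    # map normalized coords back to pixel indices
    x = (grid[..., 0] + 1) * (w - 1) / 2
    y = (grid[..., 1] + 1) * (h - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    wx, wy = x - x0, y - y0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy

# Identity affine matrix: the warp should reproduce the input exactly.
theta_id = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
img = np.arange(16, dtype=float).reshape(4, 4)
warped = bilinear_sample(img, affine_grid(theta_id, 4, 4))
assert np.allclose(warped, img)
```

Because the warp is differentiable in `theta`, a registration loss (in Vista-Morph, an adversarial objective) can be backpropagated through the sampler to the network that predicts the affine parameters, which is what makes the approach unsupervised.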
Related papers
- LADIMO: Face Morph Generation through Biometric Template Inversion with Latent Diffusion [5.602947425285195]
Face morphing attacks pose a severe security threat to face recognition systems.
We present a representation-level face morphing approach, namely LADIMO, that performs morphing on two face recognition embeddings.
We show that each face morph variant has an individual attack success rate, enabling us to maximize the morph attack potential.
arXiv Detail & Related papers (2024-10-10T14:41:37Z)
- CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity [2.1647227058902336]
We have built a deep learning-based cleft lip generator designed to produce an almost unlimited number of artificial images exhibiting high-fidelity facsimiles of cleft lip.
We undertook a transfer learning protocol testing different versions of StyleGAN-ADA.
Training images depicting a variety of cleft deformities were pre-processed to adjust for rotation, scaling, color adjustment and background blurring.
arXiv Detail & Related papers (2023-10-12T01:25:21Z)
- A Generative Approach for Image Registration of Visible-Thermal (VT) Cancer Faces [77.77475333490744]
We modernize the classic computer vision task of image registration by applying and modifying a generative alignment algorithm.
We demonstrate that the quality of thermal images produced in the generative AI downstream task of Visible-to-Thermal (V2T) image translation improves by up to 52.5%.
arXiv Detail & Related papers (2023-08-23T17:39:58Z)
- MorphGANFormer: Transformer-based Face Morphing and De-Morphing [55.211984079735196]
StyleGAN-based approaches to face morphing are among the leading techniques.
We propose a transformer-based alternative to face morphing and demonstrate its superiority to StyleGAN-based methods.
arXiv Detail & Related papers (2023-02-18T19:09:11Z)
- Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers [57.1091606948826]
We propose a novel FER model, named Poker Face Vision Transformer or PF-ViT, to address these challenges.
PF-ViT aims to separate and recognize the disturbance-agnostic emotion in a static facial image by generating its corresponding poker face.
PF-ViT utilizes vanilla Vision Transformers, and its components are pre-trained as Masked Autoencoders on a large facial expression dataset.
arXiv Detail & Related papers (2022-07-22T13:39:06Z)
- Face Reconstruction with Variational Autoencoder and Face Masks [0.0]
In this work, we investigated how face masks can help the training of VAEs for face reconstruction.
An evaluation of the proposal using the celebA dataset shows that the reconstructed images are enhanced with the face masks.
arXiv Detail & Related papers (2021-12-03T19:49:52Z)
- Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study this question via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to the flexible and dynamic receptive fields made possible by the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z)
- MorphGAN: One-Shot Face Synthesis GAN for Detecting Recognition Bias [13.162012586770576]
We describe a simulator that applies specific head pose and facial expression adjustments to images of previously unseen people.
We show that augmenting small datasets of faces with new poses and expressions improves recognition performance by up to 9%, depending on the augmentation and data scarcity.
arXiv Detail & Related papers (2020-12-09T18:43:03Z)
- Fine-grained Image-to-Image Transformation towards Visual Recognition [102.51124181873101]
We aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image.
We adopt a model based on generative adversarial networks to disentangle the identity related and unrelated factors of an image.
Experiments on the CompCars and Multi-PIE datasets demonstrate that our model preserves the identity of the generated images much better than the state-of-the-art image-to-image transformation models.
arXiv Detail & Related papers (2020-01-12T05:26:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.