Text-Guided Face Recognition using Multi-Granularity Cross-Modal
Contrastive Learning
- URL: http://arxiv.org/abs/2312.09367v1
- Date: Thu, 14 Dec 2023 22:04:22 GMT
- Title: Text-Guided Face Recognition using Multi-Granularity Cross-Modal
Contrastive Learning
- Authors: Md Mahedi Hasan, Shoaib Meraj Sami, and Nasser Nasrabadi
- Abstract summary: We introduce text-guided face recognition (TGFR) to analyze the impact of integrating facial attributes in the form of natural language descriptions.
TGFR demonstrates remarkable improvements, particularly on low-quality images, over existing face recognition models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art face recognition (FR) models often experience a significant
performance drop when dealing with facial images in surveillance scenarios,
where images are of low quality and often corrupted by noise. Leveraging
facial characteristics, such as freckles, scars, gender, and ethnicity, becomes
highly beneficial in improving FR performance in such scenarios. In this paper,
we introduce text-guided face recognition (TGFR) to analyze the impact of
integrating facial attributes in the form of natural language descriptions. We
hypothesize that adding semantic information into the loop can significantly
improve the image understanding capability of an FR algorithm compared to other
soft biometrics. However, learning a discriminative joint embedding within the
multimodal space poses a considerable challenge due to the semantic gap in the
unaligned image-text representations, along with the complexities arising from
ambiguous and incoherent textual descriptions of the face. To address these
challenges, we introduce a face-caption alignment module (FCAM), which
incorporates cross-modal contrastive losses across multiple granularities to
maximize the mutual information between local and global features of the
face-caption pair. Within FCAM, we refine both facial and textual features for
learning aligned and discriminative features. We also design a face-caption
fusion module (FCFM) that applies fine-grained interactions and coarse-grained
associations among cross-modal features. Through extensive experiments
conducted on three face-caption datasets, the proposed TGFR demonstrates remarkable
improvements, particularly on low-quality images, over existing FR models and
outperforms other related methods and benchmarks.
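The cross-modal contrastive losses in FCAM are not specified in detail here; as an illustration only, the core idea of maximizing mutual information between paired face and caption features can be sketched as a symmetric InfoNCE-style objective. The function below is a minimal NumPy sketch under that assumption (the function name, temperature value, and embedding shapes are illustrative, not the authors' implementation); in TGFR this kind of loss would be applied at multiple granularities, e.g. to both global embeddings and pooled local features.

```python
import numpy as np

def cross_modal_contrastive_loss(face_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss between face and caption embeddings.

    face_emb, text_emb: (batch, dim) arrays; matching pairs share a row index.
    Matched pairs are pulled together, mismatched pairs pushed apart.
    """
    # L2-normalize so similarities are cosine similarities
    f = face_emb / np.linalg.norm(face_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = f @ t.T / temperature  # (batch, batch) similarity matrix
    n = logits.shape[0]

    def nll_diagonal(mat):
        # Cross-entropy with the matching pair (the diagonal) as the target class
        mat = mat - mat.max(axis=1, keepdims=True)  # numerical stability
        log_prob = mat - np.log(np.exp(mat).sum(axis=1, keepdims=True))
        return -log_prob[np.arange(n), np.arange(n)].mean()

    # Average the face-to-caption and caption-to-face directions
    return 0.5 * (nll_diagonal(logits) + nll_diagonal(logits.T))
```

Perfectly aligned embeddings (caption embedding equal to the face embedding) drive this loss toward zero, while random pairings yield a loss near log(batch_size), which is what makes it a usable alignment signal.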
Related papers
- From Pixels to Words: Leveraging Explainability in Face Recognition through Interactive Natural Language Processing [2.7568948557193287]
Face Recognition (FR) has advanced significantly with the development of deep learning, achieving high accuracy in several applications.
The lack of interpretability of these systems raises concerns about their accountability, fairness, and reliability.
We propose an interactive framework to enhance the explainability of FR models by combining model-agnostic Explainable Artificial Intelligence (XAI) and Natural Language Processing (NLP) techniques.
arXiv Detail & Related papers (2024-09-24T13:40:39Z) - MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection [64.29452783056253]
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia.
Although existing approaches mainly capture face forgery patterns using the image modality, other modalities such as fine-grained noise and text are not fully explored.
We propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image-noise modalities.
arXiv Detail & Related papers (2024-09-15T13:08:59Z) - Improving Face Recognition from Caption Supervision with Multi-Granular
Contextual Feature Aggregation [0.0]
We introduce caption-guided face recognition (CGFR) as a new framework to improve the performance of commercial-off-the-shelf (COTS) face recognition systems.
We implement the proposed CGFR framework on two face recognition models (ArcFace and AdaFace) and evaluate its performance on the Multi-Modal CelebA-HQ dataset.
arXiv Detail & Related papers (2023-08-13T23:52:15Z) - GaFET: Learning Geometry-aware Facial Expression Translation from
In-The-Wild Images [55.431697263581626]
We introduce a novel Geometry-aware Facial Expression Translation framework, which is based on parametric 3D facial representations and can stably decouple expressions.
We achieve higher-quality and more accurate facial expression transfer results compared to state-of-the-art methods, and demonstrate applicability to various poses and complex textures.
arXiv Detail & Related papers (2023-08-07T09:03:35Z) - More comprehensive facial inversion for more effective expression
recognition [8.102564078640274]
We propose a novel generative method based on the image inversion mechanism for the FER task, termed Inversion FER (IFER).
ASIT is equipped with an image inversion discriminator that measures the cosine similarity of semantic features between source and generated images, constrained by a distribution alignment loss.
We extensively evaluate ASIT on facial datasets such as FFHQ and CelebA-HQ, showing that our approach achieves state-of-the-art facial inversion performance.
arXiv Detail & Related papers (2022-11-24T12:31:46Z) - Two-stage Visual Cues Enhancement Network for Referring Image
Segmentation [89.49412325699537]
Referring Image Segmentation (RIS) aims at segmenting the target object from an image, referred to by a given natural language expression.
In this paper, we tackle this problem by devising a Two-stage Visual cues enhancement Network (TV-Net).
Through the two-stage enhancement, our proposed TV-Net achieves better performance in learning fine-grained matching behaviors between the natural language expression and the image.
arXiv Detail & Related papers (2021-10-09T02:53:39Z) - Self-supervised Contrastive Learning of Multi-view Facial Expressions [9.949781365631557]
Facial expression recognition (FER) has emerged as an important component of human-computer interaction systems.
We propose Contrastive Learning of Multi-view facial Expressions (CL-MEx) to exploit facial images captured simultaneously from different angles towards FER.
arXiv Detail & Related papers (2021-08-15T11:23:34Z) - Multi-Margin based Decorrelation Learning for Heterogeneous Face
Recognition [90.26023388850771]
This paper presents a deep neural network approach to extract decorrelation representations in a hyperspherical space for cross-domain face images.
The proposed framework can be divided into two components: heterogeneous representation network and decorrelation representation learning.
Experimental results on two challenging heterogeneous face databases show that our approach achieves superior performance on both verification and recognition tasks.
arXiv Detail & Related papers (2020-05-25T07:01:12Z) - DotFAN: A Domain-transferred Face Augmentation Network for Pose and
Illumination Invariant Face Recognition [94.96686189033869]
We propose a 3D model-assisted domain-transferred face augmentation network (DotFAN).
DotFAN can generate a series of variants of an input face based on the knowledge distilled from existing rich face datasets collected from other domains.
Experiments show that DotFAN is beneficial for augmenting small face datasets to improve their within-class diversity.
arXiv Detail & Related papers (2020-02-23T08:16:34Z) - Joint Deep Learning of Facial Expression Synthesis and Recognition [97.19528464266824]
We propose a novel joint deep learning of facial expression synthesis and recognition method for effective FER.
The proposed method involves a two-stage learning procedure. Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions.
In order to alleviate the problem of data bias between the real images and the synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm.
arXiv Detail & Related papers (2020-02-06T10:56:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.