KeyPoint Relative Position Encoding for Face Recognition
- URL: http://arxiv.org/abs/2403.14852v1
- Date: Thu, 21 Mar 2024 21:56:09 GMT
- Title: KeyPoint Relative Position Encoding for Face Recognition
- Authors: Minchul Kim, Yiyang Su, Feng Liu, Anil Jain, Xiaoming Liu,
- Abstract summary: Keypoint RPE (KP-RPE) is an extension of the principle where significance of pixels is not solely dictated by their proximity.
Code and pre-trained models are available.
- Score: 15.65725865703615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the challenge of making ViT models more robust to unseen affine transformations. Such robustness becomes useful in various recognition tasks such as face recognition when image alignment failures occur. We propose a novel method called KP-RPE, which leverages key points (e.g.~facial landmarks) to make ViT more resilient to scale, translation, and pose variations. We begin with the observation that Relative Position Encoding (RPE) is a good way to bring affine transform generalization to ViTs. RPE, however, can only inject the model with prior knowledge that nearby pixels are more important than far pixels. Keypoint RPE (KP-RPE) is an extension of this principle, where the significance of pixels is not solely dictated by their proximity but also by their relative positions to specific keypoints within the image. By anchoring the significance of pixels around keypoints, the model can more effectively retain spatial relationships, even when those relationships are disrupted by affine transformations. We show the merit of KP-RPE in face and gait recognition. The experimental results demonstrate the effectiveness in improving face recognition performance from low-quality images, particularly where alignment is prone to failure. Code and pre-trained models are available.
Related papers
- CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment [2.3874115898130865]
Image similarity metrics play an important role in computer vision applications, as they are used in image processing, computer vision and machine learning.
Existing metrics, such as PSNR, MSE, SSIM, ISSM and FSIM, often face limitations in terms of either speed, complexity or sensitivity to small changes in images.
A novel image similarity metric, namely CSIM, that combines real-time while being sensitive to subtle image variations is investigated in this paper.
arXiv Detail & Related papers (2024-10-02T10:46:05Z) - CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z) - Differentiable Registration of Images and LiDAR Point Clouds with
VoxelPoint-to-Pixel Matching [58.10418136917358]
Cross-modality registration between 2D images from cameras and 3D point clouds from LiDARs is a crucial task in computer vision and robotic training.
Previous methods estimate 2D-3D correspondences by matching point and pixel patterns learned by neural networks.
We learn a structured cross-modality matching solver to represent 3D features via a different latent pixel space.
arXiv Detail & Related papers (2023-12-07T05:46:10Z) - Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z) - Enhancing Deformable Local Features by Jointly Learning to Detect and
Describe Keypoints [8.390939268280235]
Local feature extraction is a standard approach in computer vision for tackling important tasks such as image matching and retrieval.
We propose DALF, a novel deformation-aware network for jointly detecting and describing keypoints.
Our approach also enhances the performance of two real-world applications: deformable object retrieval and non-rigid 3D surface registration.
arXiv Detail & Related papers (2023-04-02T18:01:51Z) - Parameter Efficient Local Implicit Image Function Network for Face
Segmentation [13.124513975412254]
Face parsing is defined as the per-pixel labeling of images containing human faces.
We make use of the structural consistency of the human face to propose a lightweight face-parsing method.
arXiv Detail & Related papers (2023-03-27T11:50:27Z) - Self-Supervised Equivariant Learning for Oriented Keypoint Detection [35.94215211409985]
We introduce a self-supervised learning framework using rotation-equivariant CNNs to learn to detect robust oriented keypoints.
We propose a dense orientation alignment loss by an image pair generated by synthetic transformations for training a histogram-based orientation map.
Our method outperforms the previous methods on an image matching benchmark and a camera pose estimation benchmark.
arXiv Detail & Related papers (2022-04-19T02:26:07Z) - Cascaded Asymmetric Local Pattern: A Novel Descriptor for Unconstrained
Facial Image Recognition and Retrieval [20.77994516381]
In this paper a novel hand crafted cascaded asymmetric local pattern (CALP) is proposed for retrieval and recognition facial image.
The proposed encoding scheme has optimum feature length and shows significant improvement in accuracy under environmental and physiological changes in a facial image.
arXiv Detail & Related papers (2022-01-03T08:23:38Z) - Pixel-Perfect Structure-from-Motion with Featuremetric Refinement [96.73365545609191]
We refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views.
This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors.
Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale.
arXiv Detail & Related papers (2021-08-18T17:58:55Z) - Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression [81.05772887221333]
We study the dense keypoint regression framework that is previously inferior to the keypoint detection and grouping framework.
We present a simple yet effective approach, named disentangled keypoint regression (DEKR)
We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods.
arXiv Detail & Related papers (2021-04-06T05:54:46Z) - Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive
Keypoint Estimates [76.51095823248104]
We present several schemes that are rarely or unthoroughly studied before for improving keypoint detection and grouping (keypoint regression) performance.
First, we exploit the keypoint heatmaps for pixel-wise keypoint regression instead of separating them for improving keypoint regression.
Second, we adopt a pixel-wise spatial transformer network to learn adaptive representations for handling the scale and orientation variance.
Third, we present a joint shape and heatvalue scoring scheme to promote the estimated poses that are more likely to be true poses.
arXiv Detail & Related papers (2020-06-28T01:14:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.