2D-3D Attention and Entropy for Pose Robust 2D Facial Recognition
- URL: http://arxiv.org/abs/2505.09073v1
- Date: Wed, 14 May 2025 02:17:53 GMT
- Title: 2D-3D Attention and Entropy for Pose Robust 2D Facial Recognition
- Authors: J. Brennan Peace, Shuowen Hu, Benjamin S. Riggan
- Abstract summary: We propose a novel domain adaptive framework to facilitate improved performance across large discrepancies in pose by enabling image-based (2D) representations to infer properties of pose-invariant point cloud (3D) representations. Our proposed framework achieves profile TAR @ 1% FAR improvements of at least 7.1% by leveraging shared attention between the 2D and 3D representations.
- Score: 3.1632426898254224
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Despite recent advances in facial recognition, there remains a fundamental issue concerning degradations in performance due to substantial perspective (pose) differences between enrollment and query (probe) imagery. Therefore, we propose a novel domain adaptive framework to facilitate improved performance across large discrepancies in pose by enabling image-based (2D) representations to infer properties of inherently pose invariant point cloud (3D) representations. Specifically, our proposed framework achieves better pose invariance by using (1) a shared (joint) attention mapping to emphasize common patterns that are most correlated between 2D facial images and 3D facial data and (2) a joint entropy regularizing loss to promote better consistency, enhancing correlations among the intersecting 2D and 3D representations, by leveraging both attention maps. This framework is evaluated on the FaceScape and ARL-VTF datasets, where it outperforms competitive methods by achieving profile (90°+) TAR @ 1% FAR improvements of at least 7.1% and 1.57%, respectively.
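Both components are easy to prototype. Below is a minimal PyTorch sketch, not the authors' implementation: the token shapes, the mean-pooled attention maps, and the reading of the joint entropy loss as the entropy of the softmax-normalized 2D x 3D correlation matrix are all illustrative assumptions.

```python
import torch

def shared_attention(f2d: torch.Tensor, f3d: torch.Tensor):
    """f2d: (B, N, D) 2D image tokens; f3d: (B, M, D) 3D point-cloud tokens."""
    # Scaled correlation between every 2D token and every 3D token.
    corr = torch.einsum("bnd,bmd->bnm", f2d, f3d) / f2d.shape[-1] ** 0.5
    attn_2d = corr.softmax(dim=2).mean(dim=2)  # (B, N): emphasis per 2D token
    attn_3d = corr.softmax(dim=1).mean(dim=1)  # (B, M): emphasis per 3D token
    return corr, attn_2d, attn_3d

def joint_entropy_loss(corr: torch.Tensor) -> torch.Tensor:
    # Treat the normalized correlations as a joint distribution over
    # (2D token, 3D token) pairs; low entropy means sharp, consistent
    # cross-modal correlations.
    p = corr.flatten(1).softmax(dim=1)  # (B, N*M)
    return -(p * (p + 1e-8).log()).sum(dim=1).mean()

# Usage with random stand-in features from hypothetical 2D/3D encoders.
f2d, f3d = torch.randn(4, 49, 256), torch.randn(4, 128, 256)
corr, a2d, a3d = shared_attention(f2d, f3d)
loss = joint_entropy_loss(corr)
```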
Related papers
- DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image [98.29284902879652]
We present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image.
It disentangles the regression of local deformation fields and global mesh locations into two network branches.
It achieves state-of-the-art performance on a standard benchmark and on in-the-wild data in terms of accuracy and physical plausibility.
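A hedged sketch of the stated two-branch design follows: one branch regresses a local per-vertex deformation field, the other global mesh locations. The backbone, layer sizes, and vertex count are assumptions, not DICE's actual architecture.

```python
import torch
import torch.nn as nn

class TwoBranchRecovery(nn.Module):
    """Disentangles local deformation from global placement, per the summary."""
    def __init__(self, feat_dim: int = 512, n_verts: int = 778):
        super().__init__()
        self.n_verts = n_verts
        self.backbone = nn.Sequential(nn.LazyLinear(feat_dim), nn.ReLU())
        self.deform_branch = nn.Linear(feat_dim, n_verts * 3)  # per-vertex offsets
        self.global_branch = nn.Linear(feat_dim, n_verts * 3)  # coarse mesh positions

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        h = self.backbone(image_feats.flatten(1))
        deform = self.deform_branch(h).view(-1, self.n_verts, 3)
        coarse = self.global_branch(h).view(-1, self.n_verts, 3)
        return coarse + deform  # deformation-aware mesh

mesh = TwoBranchRecovery()(torch.randn(2, 2048))  # (2, 778, 3)
```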
arXiv Detail & Related papers (2024-06-26T00:08:29Z) - PostoMETRO: Pose Token Enhanced Mesh Transformer for Robust 3D Human Mesh Recovery [20.763457281944834]
We present PostoMETRO, which integrates 2D pose representation into transformers in a token-wise manner.
We are able to produce more precise 3D coordinates, even under extreme scenarios like occlusion.
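Token-wise integration of a 2D pose can be sketched as embedding each detected joint as an extra token and letting a standard transformer attend across image and pose tokens; all sizes below are assumptions rather than PostoMETRO's configuration.

```python
import torch
import torch.nn as nn

n_joints, d = 17, 256
pose_embed = nn.Linear(2, d)  # one token per detected 2D joint
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True),
    num_layers=4,
)

img_tokens = torch.randn(2, 196, d)      # e.g. 14x14 patch features
joints_2d = torch.rand(2, n_joints, 2)   # detector output, normalized coords
tokens = torch.cat([img_tokens, pose_embed(joints_2d)], dim=1)
out = encoder(tokens)  # pose tokens and image tokens attend to each other
```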
arXiv Detail & Related papers (2024-03-19T06:18:25Z) - Learning Naturally Aggregated Appearance for Efficient 3D Editing [90.57414218888536]
We learn the color field as an explicit 2D appearance aggregation, also called a canonical image.
We complement the canonical image with a projection field that maps 3D points onto 2D pixels for texture query.
Our approach demonstrates remarkable efficiency, being at least 20 times faster per edit than existing NeRF-based editing methods.
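The canonical-image idea fits in a few lines: appearance lives in a single 2D image, and a projection field maps each 3D point to a UV location where the color is sampled. The toy projection below (dropping z) stands in for the learned field and is purely an assumption.

```python
import torch
import torch.nn.functional as F

canonical = torch.rand(1, 3, 256, 256)  # explicit 2D appearance aggregation

def query_color(points_3d: torch.Tensor, project) -> torch.Tensor:
    """points_3d: (N, 3); project maps (N, 3) -> (N, 2) UV coords in [-1, 1]."""
    uv = project(points_3d).view(1, -1, 1, 2)
    color = F.grid_sample(canonical, uv, align_corners=True)  # texture query
    return color.view(3, -1).t()  # (N, 3) RGB per 3D point

# Toy projection field: orthographic drop of the z coordinate.
colors = query_color(torch.rand(100, 3) * 2 - 1, lambda p: p[:, :2])
```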
arXiv Detail & Related papers (2023-12-11T18:59:31Z) - Two Views Are Better than One: Monocular 3D Pose Estimation with Multiview Consistency [0.493599216374976]
We introduce a novel loss function, consistency loss, which operates on two synchronized views.
Our consistency loss substantially improves performance for fine-tuning without requiring 3D data.
We show that using our consistency loss can yield state-of-the-art performance when training models from scratch in a semi-supervised manner.
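In spirit, such a consistency loss only needs the relative camera pose between the two synchronized views: 3D predictions made independently from each view should coincide once expressed in a common frame. The sketch below is an assumption about the form of the loss, not the paper's exact formulation.

```python
import torch

def consistency_loss(pose_a: torch.Tensor, pose_b: torch.Tensor,
                     R: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """pose_a, pose_b: (J, 3) predictions from views A and B; R, t map B to A."""
    pose_b_in_a = pose_b @ R.t() + t
    return ((pose_a - pose_b_in_a) ** 2).mean()  # agreement needs no 3D labels

loss = consistency_loss(torch.randn(17, 3), torch.randn(17, 3),
                        torch.eye(3), torch.zeros(3))
```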
arXiv Detail & Related papers (2023-11-21T08:21:55Z) - A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation [18.72362803593654]
The dominant paradigm in 3D human pose estimation that lifts a 2D pose sequence to 3D heavily relies on long-term temporal clues.
This can be attributed to their inherent inability to perceive spatial context, as plain 2D joint coordinates carry no visual cues.
We propose a straightforward yet powerful solution: leveraging the readily available intermediate visual representations produced by off-the-shelf (pre-trained) 2D pose detectors.
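One way to realize this is to sample the detector's intermediate feature map at each joint location and lift the enriched per-joint vectors instead of bare coordinates. The shapes and the two-layer lifter below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat = torch.randn(1, 64, 64, 64)         # intermediate features of a 2D detector
joints_2d = torch.rand(1, 17, 2) * 2 - 1  # detected joints in [-1, 1] coords

# Sample a 64-dim visual descriptor at every joint location.
sampled = F.grid_sample(feat, joints_2d.view(1, 17, 1, 2), align_corners=True)
per_joint = torch.cat([joints_2d, sampled.view(1, 64, 17).transpose(1, 2)], dim=2)

lifter = nn.Sequential(nn.Linear(66, 256), nn.ReLU(), nn.Linear(256, 3))
pose_3d = lifter(per_joint)  # (1, 17, 3): coordinates plus visual context
```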
arXiv Detail & Related papers (2023-11-06T18:04:13Z) - Unpaired Multi-domain Attribute Translation of 3D Facial Shapes with a Square and Symmetric Geometric Map [23.461476902880584]
We propose a learning framework for 3D facial attribute translation.
We use a novel geometric map for 3D shape representation and embed it in an end-to-end generative adversarial network.
We employ a unified and unpaired learning framework for multi-domain attribute translation.
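The recipe reduces to ordinary image-to-image translation once the face is expressed as a square geometry image, i.e. a 2D grid whose channels store xyz coordinates. The toy residual generator below is an assumption; the square, symmetric geometric map itself is the paper's contribution and is taken as given here.

```python
import torch
import torch.nn as nn

geom_image = torch.randn(1, 3, 64, 64)  # xyz coordinates per grid cell

generator = nn.Sequential(              # toy translator; a real GAN adds a discriminator
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
translated = geom_image + generator(geom_image)  # residual attribute edit on the shape
```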
arXiv Detail & Related papers (2023-08-25T08:37:55Z) - Learning from Abstract Images: on the Importance of Occlusion in a Minimalist Encoding of Human Poses [0.0]
2D-to-3D representations suffer from poor performance in cross-dataset benchmarks.
We propose a novel representation that incorporates 3D information in its encoding.
The result allows us to predict poses that are completely independent of camera viewpoint.
arXiv Detail & Related papers (2023-07-19T10:45:49Z) - MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling [59.74064212110042]
MPM can handle multiple tasks including 3D human pose estimation, 3D pose estimation from occluded 2D pose, and 3D pose completion in a single framework.
We conduct extensive experiments and ablation studies on several widely used human pose datasets and achieve state-of-the-art performance on MPI-INF-3DHP.
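Masked pose modeling can be sketched as BERT-style masking over joint tokens: hide a subset of joints, encode, and reconstruct them, so occluded-pose estimation and pose completion share one objective. The mask ratio and sizes are assumptions.

```python
import torch
import torch.nn as nn

d, n_joints = 256, 17
embed = nn.Linear(3, d)                  # 2D inputs can be zero-padded to 3D
mask_token = nn.Parameter(torch.zeros(1, 1, d))
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True), num_layers=4)
head = nn.Linear(d, 3)

pose = torch.randn(2, n_joints, 3)
mask = torch.rand(2, n_joints, 1) < 0.4  # hide ~40% of the joints
tokens = torch.where(mask, mask_token.expand(2, n_joints, d), embed(pose))
recon = head(encoder(tokens))
loss = ((recon - pose)[mask.expand_as(pose)] ** 2).mean()  # reconstruct masked joints
```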
arXiv Detail & Related papers (2023-06-29T10:30:00Z) - CheckerPose: Progressive Dense Keypoint Localization for Object Pose
Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task.
Recent studies have shown the great potential of dense correspondence-based solutions.
We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z) - RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method that represents the self-occlusions of 3D foreground objects as a 2D self-occlusion map.
We show that our representation map not only allows us to enhance the image quality but also to model temporally coherent complex shadow effects.
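The map can be sketched by marching each camera ray from the camera toward the known surface depth and accumulating how much of the object's own volume lies in between. The occupancy function and step count below are assumptions.

```python
import torch

def self_occlusion_map(depth: torch.Tensor, occupancy, n_steps: int = 32):
    """depth: (H, W) per-pixel surface depth; occupancy(z) -> (H, W) in [0, 1]."""
    occ = torch.zeros_like(depth)
    for i in range(1, n_steps + 1):
        z = depth * i / (n_steps + 1)   # sample points strictly nearer than the surface
        occ = occ + occupancy(z)
    return (occ / n_steps).clamp(0, 1)  # fraction of the ray blocked by the object

# Toy occupancy: a slab between depths 0.3 and 0.5 everywhere.
occ_map = self_occlusion_map(torch.full((64, 64), 1.0),
                             lambda z: ((z > 0.3) & (z < 0.5)).float())
```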
arXiv Detail & Related papers (2022-05-14T05:35:35Z) - KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative
Spatial Encoding of Keypoints [28.234772596912165]
We propose a highly effective approach to modeling high-fidelity volumetric avatars from sparse views.
One of the key ideas is to encode relative spatial 3D information via sparse 3D keypoints.
Our experiments show that a majority of errors in prior work stem from an inappropriate choice of spatial encoding.
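Relative spatial encoding is simple to write down: describe every query point by its offsets (and distances) to a sparse set of 3D keypoints rather than by raw world coordinates, which makes the code invariant to global placement. Shapes below are assumptions.

```python
import torch

def relative_encoding(query: torch.Tensor, keypoints: torch.Tensor) -> torch.Tensor:
    """query: (N, 3) sample points; keypoints: (K, 3), e.g. facial landmarks."""
    rel = query[:, None, :] - keypoints[None, :, :]   # (N, K, 3) offsets
    dist = rel.norm(dim=-1, keepdim=True)             # (N, K, 1) distances
    return torch.cat([rel, dist], dim=-1).flatten(1)  # (N, 4K) placement-free code

code = relative_encoding(torch.randn(1024, 3), torch.randn(13, 3))
```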
arXiv Detail & Related papers (2022-05-10T15:57:03Z) - Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose
Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z) - Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A
Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to a person's limbs.
It operates by first detecting 2D poses from the two signals and then lifting them to 3D space.
The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
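For the lifting step, the simplest instance with calibrated views is per-joint linear triangulation (DLT); the paper additionally fuses IMU cues, which are omitted in this hedged sketch.

```python
import numpy as np

def triangulate(points_2d, projections):
    """points_2d: list of (u, v) pixels; projections: list of (3, 4) camera matrices."""
    rows = []
    for (u, v), P in zip(points_2d, projections):
        rows += [u * P[2] - P[0], v * P[2] - P[1]]  # DLT constraints per view
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]                                      # null vector of the system
    return X[:3] / X[3]                             # 3D joint position

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = triangulate([(0.25, 0.25), (0.0, 0.25)], [P1, P2])  # recovers (1, 1, 4)
```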
arXiv Detail & Related papers (2020-03-25T00:26:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.