Exploring 3D Face Reconstruction and Fusion Methods for Face Verification: A Case-Study in Video Surveillance
- URL: http://arxiv.org/abs/2409.10481v1
- Date: Mon, 16 Sep 2024 17:17:47 GMT
- Title: Exploring 3D Face Reconstruction and Fusion Methods for Face Verification: A Case-Study in Video Surveillance
- Authors: Simone Maurizio La Cava, Sara Concas, Ruben Tolosana, Roberto Casula, Giulia Orrù, Martin Drahansky, Julian Fierrez, Gian Luca Marcialis
- Abstract summary: 3D face reconstruction (3DFR) algorithms are based on specific assumptions tailored to distinct application scenarios.
We show that the complementarity induced by different 3DFR algorithms improves performance when tests are conducted at never-seen-before distances from the camera.
- Score: 6.277064632667653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D face reconstruction (3DFR) algorithms are based on specific assumptions tailored to distinct application scenarios. These assumptions limit their use when acquisition conditions, such as the subject's distance from the camera or the camera's characteristics, differ from those expected, as typically happens in video surveillance. Additionally, 3DFR algorithms follow various strategies to address the reconstruction of a 3D shape from 2D data, such as statistical model fitting, photometric stereo, or deep learning. In the present study, we explore the application of three 3DFR algorithms representative of the state of the art (SOTA), employing each one as the template set generator for a face verification system. The scores provided by each system are combined by score-level fusion. We show that the complementarity induced by different 3DFR algorithms improves performance when tests are conducted at never-seen-before camera distances and with never-seen-before camera characteristics (cross-distance and cross-camera settings), thus encouraging further investigation of approaches based on multiple 3DFR algorithms.
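As a rough sketch of what score-level fusion can look like in this setting, the snippet below normalizes the scores of several 3DFR-based verifiers onto a common range and combines them with a simple sum rule. The paper does not publish code; the sum rule, min-max normalization, and all names below are illustrative assumptions, not the authors' exact method:

```python
import numpy as np

def min_max_normalize(scores):
    """Map one system's raw matcher scores onto a common [0, 1] range."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo + 1e-12)  # epsilon guards against a flat score set

def fuse_scores(score_lists, weights=None):
    """Weighted sum rule over normalized per-system verification scores.

    score_lists: one 1-D score array per 3DFR-based verifier, aligned so that
                 index i always refers to the same probe-template comparison.
    weights:     optional per-system weights (uniform if omitted).
    """
    normalized = [min_max_normalize(s) for s in score_lists]
    if weights is None:
        weights = [1.0 / len(normalized)] * len(normalized)
    return sum(w * s for w, s in zip(weights, normalized))

# Hypothetical scores from three verifiers, each built on a different 3DFR
# strategy (e.g., statistical model fitting, photometric stereo, deep learning).
scores_fitting = [0.62, 0.10, 0.88]
scores_stereo = [14.2, 3.5, 19.7]    # raw scores on a different scale
scores_deep = [0.45, 0.05, 0.91]
fused = fuse_scores([scores_fitting, scores_stereo, scores_deep])
print(fused)  # one fused score per verification attempt
```

Normalization matters here because each 3DFR-based matcher may emit scores on a different scale; without a common range, the system with the largest raw scores would dominate the fused decision.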
Related papers
- Exploiting Multiple Representations: 3D Face Biometrics Fusion with Application to Surveillance [6.277064632667653]
3D face reconstruction (3DFR) algorithms are based on specific assumptions tailored to the limits and characteristics of the different application scenarios.
In this study, we investigate how multiple state-of-the-art 3DFR algorithms can be used to generate a better representation of subjects.
We also explore how different parametric and non-parametric score-level fusion methods can exploit the unique strengths of multiple 3DFR algorithms to enhance biometric recognition robustness (a minimal parametric sketch appears after this list).
arXiv Detail & Related papers (2025-04-26T10:21:46Z)
- Adapt3R: Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning [28.80962812015936]
3D scene representations that incorporate observations from calibrated RGBD cameras have been proposed as a way to improve the generalizability of imitation learning (IL) policies.
We propose Adaptive 3D Scene Representation (Adapt3R) which uses a novel architecture to synthesize data from one or more RGBD cameras into a single vector that can then be used as conditioning for arbitrary IL algorithms.
We show that when trained end-to-end with several SOTA multi-task IL algorithms, Adapt3R maintains these algorithms' multi-task learning capacity while enabling zero-shot transfer to novel embodiments and camera poses.
arXiv Detail & Related papers (2025-03-06T18:17:09Z)
- Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors [17.544733016978928]
3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild.
Recent advancements in 3D object generation have introduced techniques that reconstruct an object's 3D shape and texture.
We propose bridging the gap between 2D and 3D diffusion models to address this limitation.
arXiv Detail & Related papers (2024-10-12T10:14:11Z) - VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection is of significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z) - MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation [67.56268991234371]
OV-Uni3DETR achieves state-of-the-art performance across various scenarios, surpassing existing methods by more than 6% on average.
Code and pre-trained models will be released later.
arXiv Detail & Related papers (2024-03-28T17:05:04Z)
- Geometry-Biased Transformer for Robust Multi-View 3D Human Pose Reconstruction [3.069335774032178]
We propose a novel encoder-decoder Transformer architecture to estimate 3D poses from multi-view 2D pose sequences.
We conduct experiments on three public benchmark datasets: Human3.6M, CMU Panoptic, and Occlusion-Persons.
arXiv Detail & Related papers (2023-12-28T16:30:05Z)
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network, built on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
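To complement the sum-rule sketch above, here is what a parametric score-level fusion of the kind mentioned in the first related paper might look like: fit per-system genuine and impostor score distributions (Gaussians here) and fuse via a summed log-likelihood ratio. The distributional choice, the function names, and the example numbers are all assumptions for illustration, not the paper's published method:

```python
import numpy as np
from scipy.stats import norm

def llr_fusion(score_matrix, genuine_params, impostor_params):
    """Parametric fusion: summed per-system Gaussian log-likelihood ratios.

    score_matrix:    (n_comparisons, n_systems) array of raw scores.
    genuine_params:  per-system (mean, std) fit on genuine training scores.
    impostor_params: per-system (mean, std) fit on impostor training scores.
    Returns one log-likelihood ratio per comparison (higher favors "genuine").
    """
    llr = np.zeros(score_matrix.shape[0])
    for j, (gen, imp) in enumerate(zip(genuine_params, impostor_params)):
        col = score_matrix[:, j]
        llr += norm.logpdf(col, *gen) - norm.logpdf(col, *imp)
    return llr

# Hypothetical parameters estimated from a labeled training set of comparisons.
genuine_params = [(0.8, 0.1), (16.0, 3.0), (0.85, 0.08)]
impostor_params = [(0.2, 0.15), (5.0, 2.5), (0.15, 0.1)]
scores = np.array([[0.62, 14.2, 0.45],
                   [0.10, 3.5, 0.05],
                   [0.88, 19.7, 0.91]])
print(llr_fusion(scores, genuine_params, impostor_params))
```

Non-parametric alternatives such as the sum rule above skip the distributional assumption; the parametric route can be more discriminative when the Gaussian fit is reasonable, but degrades when it is not.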
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.