Deep and Shallow Covariance Feature Quantization for 3D Facial
Expression Recognition
- URL: http://arxiv.org/abs/2105.05708v1
- Date: Wed, 12 May 2021 14:48:39 GMT
- Title: Deep and Shallow Covariance Feature Quantization for 3D Facial
Expression Recognition
- Authors: Walid Hariri, Nadir Farah, Dinesh Kumar Vishwakarma
- Abstract summary: We propose a multi-modal 2D + 3D feature-based method for facial expression recognition.
We extract shallow features from the 3D images, and deep features using Convolutional Neural Networks (CNN) from the transformed 2D images.
High classification performances have been achieved on the BU-3DFE and Bosphorus datasets.
- Score: 7.773399781313892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Facial expression recognition (FER) of 3D face scans has received a
significant amount of attention in recent years. Most of the facial expression
recognition methods have been proposed using mainly 2D images. These methods
suffer from several issues like illumination changes and pose variations.
Moreover, 2D mapping from 3D images may lack some geometric and topological
characteristics of the face. Hence, to overcome this problem, a multi-modal 2D
+ 3D feature-based method is proposed. We extract shallow features from the 3D
images, and deep features using Convolutional Neural Networks (CNN) from the
transformed 2D images. To combine these features into a compact representation, we use covariance matrices as descriptors for both feature types rather than the raw descriptors individually. Covariance matrix learning is used as a manifold layer to reduce the size of the deep covariance matrices and enhance their discriminative power while preserving their manifold structure. We then use the
Bag-of-Features (BoF) paradigm to quantize the covariance matrices after
flattening. Accordingly, we obtained two codebooks using shallow and deep
features. The global codebook is then used to feed an SVM classifier. High
classification performances have been achieved on the BU-3DFE and Bosphorus
datasets compared to the state-of-the-art methods.
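The pipeline the abstract describes (covariance matrices as descriptors, flattening, Bag-of-Features quantization against a codebook) can be sketched compactly. This is a minimal NumPy illustration under assumed shapes, not the authors' implementation; the function names and the Euclidean nearest-codeword assignment are hypothetical simplifications of the paper's manifold-aware approach:

```python
import numpy as np

def covariance_descriptor(features):
    # features: (n_samples, d) local feature vectors from one face scan.
    # The d x d covariance matrix serves as a compact region descriptor.
    return np.cov(features, rowvar=False)

def flatten_spd(cov):
    # Flatten the symmetric matrix by keeping its upper triangle,
    # so each descriptor becomes a fixed-length vector before quantization.
    iu = np.triu_indices(cov.shape[0])
    return cov[iu]

def bof_histogram(descriptors, codebook):
    # Assign each flattened covariance descriptor to its nearest codeword
    # and accumulate a normalized Bag-of-Features histogram; the resulting
    # histogram is what would feed the SVM classifier.
    hist = np.zeros(len(codebook))
    for d in descriptors:
        idx = np.argmin(np.linalg.norm(codebook - d, axis=1))
        hist[idx] += 1
    return hist / max(hist.sum(), 1)
```

In the paper, one such codebook is built from shallow 3D features and one from deep CNN features; the global (concatenated) representation is classified with an SVM.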
Related papers
- FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model [81.03553265684184]
We introduce FDGaussian, a novel two-stage framework for single-image 3D reconstruction.
Recent methods typically utilize pre-trained 2D diffusion models to generate plausible novel views from the input image.
We demonstrate that FDGaussian generates images with high consistency across different views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z)
- Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View [44.78243406441798]
This paper focuses on leveraging geometry information, such as depth, to model such feature transformation.
We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view.
We then aggregate the 3D feature volume based on the 3D space occupancy derived from depth to the BEV frame.
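The two steps above (lifting pixel features along view rays via a predicted categorical depth distribution, then aggregating into a BEV grid) can be sketched as follows. This is a minimal NumPy illustration under assumed shapes, not the paper's implementation; all names and the simple cell-binning aggregation are hypothetical:

```python
import numpy as np

def lift_features_to_bev(feats, depth_probs, depth_bins, rays, grid=(8, 8), cell=1.0):
    # feats: (P, C) per-pixel image features; depth_probs: (P, D) predicted
    # categorical depth distribution per pixel; depth_bins: (D,) candidate
    # depths; rays: (P, 3) per-pixel view rays in the ego frame.
    H, W = grid
    bev = np.zeros((H, W, feats.shape[1]))
    for p in range(feats.shape[0]):
        for d in range(len(depth_bins)):
            # Place the pixel feature at each candidate depth, weighted by
            # the predicted probability of that depth, then bin into BEV cells.
            x, y, _ = rays[p] * depth_bins[d]
            i, j = int(x // cell), int(y // cell)
            if 0 <= i < H and 0 <= j < W:
                bev[i, j] += depth_probs[p, d] * feats[p]
    return bev
```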
arXiv Detail & Related papers (2023-07-09T06:07:22Z)
- RAFaRe: Learning Robust and Accurate Non-parametric 3D Face Reconstruction from Pseudo 2D&3D Pairs [13.11105614044699]
We propose a robust and accurate non-parametric method for single-view 3D face reconstruction (SVFR).
A large-scale pseudo 2D&3D dataset is created by first rendering the detailed 3D faces, then swapping the face in the wild images with the rendered face.
Our model outperforms previous methods on FaceScape-wild/lab and MICC benchmarks.
arXiv Detail & Related papers (2023-02-10T19:40:26Z)
- Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders [52.91248611338202]
We propose an alternative to obtain superior 3D representations from 2D pre-trained models via Image-to-Point Masked Autoencoders, named as I2P-MAE.
By self-supervised pre-training, we leverage the well learned 2D knowledge to guide 3D masked autoencoding.
I2P-MAE attains state-of-the-art accuracy of 90.11%, +3.68% over the second-best method, demonstrating superior transfer capability.
arXiv Detail & Related papers (2022-12-13T17:59:20Z)
- Deep-MDS Framework for Recovering the 3D Shape of 2D Landmarks from a Single Image [8.368476827165114]
This paper proposes a framework to recover the 3D shape of 2D landmarks on a human face from a single input image.
A deep neural network learns the pairwise dissimilarities among the 2D landmarks, which are then used by a non-metric multidimensional scaling (NMDS) approach.
arXiv Detail & Related papers (2022-10-27T06:20:10Z)
- PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition [55.38462937452363]
We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student.
By pair-wise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhausting and complicated network modification.
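The pair-wise alignment of visual (teacher) and geometric (student) descriptors typically reduces to a per-view distillation loss. A minimal sketch of one plausible form, an L2 loss between normalized descriptors, which is an assumption for illustration and not PointMCD's exact objective:

```python
import numpy as np

def pairwise_alignment_loss(student_feats, teacher_feats):
    # Mean squared error between L2-normalized student (point encoder) and
    # teacher (image encoder) descriptors, aligned view by view.
    s = student_feats / np.linalg.norm(student_feats, axis=1, keepdims=True)
    t = teacher_feats / np.linalg.norm(teacher_feats, axis=1, keepdims=True)
    return float(np.mean(np.sum((s - t) ** 2, axis=1)))
```

Minimizing this drives the point encoder's descriptors toward the frozen image encoder's, transferring 2D knowledge without modifying the student network's architecture.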
arXiv Detail & Related papers (2022-07-07T07:23:20Z)
- End-to-End Learning of Multi-category 3D Pose and Shape Estimation [128.881857704338]
We propose an end-to-end method that simultaneously detects 2D keypoints from an image and lifts them to 3D.
The proposed method learns both 2D detection and 3D lifting only from 2D keypoints annotations.
In addition to being end-to-end in image to 3D learning, our method also handles objects from multiple categories using a single neural network.
arXiv Detail & Related papers (2021-12-19T17:10:40Z)
- Implicit Neural Deformation for Multi-View Face Reconstruction [43.88676778013593]
We present a new method for 3D face reconstruction from multi-view RGB images.
Unlike previous methods which are built upon 3D morphable models, our method leverages an implicit representation to encode rich geometric features.
Our experimental results on several benchmark datasets demonstrate that our approach outperforms alternative baselines and achieves superior face reconstruction results compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-12-05T07:02:53Z)
- Facial Depth and Normal Estimation using Single Dual-Pixel Camera [81.02680586859105]
We introduce a DP-oriented Depth/Normal network that reconstructs the 3D facial geometry.
It contains the corresponding ground-truth 3D models including depth map and surface normal in metric scale.
It achieves state-of-the-art performances over recent DP-based depth/normal estimation methods.
arXiv Detail & Related papers (2021-11-25T05:59:27Z)
- Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning [97.56893524594703]
Image-based 3D shape retrieval (IBSR) aims to find the corresponding 3D shape of a given 2D image from a large 3D shape database.
Metric learning with some adaptation techniques seems to be a natural solution to shape similarity learning.
We develop a geometry-focused multi-view metric learning framework empowered by texture synthesis.
arXiv Detail & Related papers (2020-10-23T08:52:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.