CapsField: Light Field-based Face and Expression Recognition in the Wild
using Capsule Routing
- URL: http://arxiv.org/abs/2101.03503v1
- Date: Sun, 10 Jan 2021 09:06:02 GMT
- Authors: Alireza Sepas-Moghaddam, Ali Etemad, Fernando Pereira, Paulo Lobato
Correia
- Abstract summary: This paper proposes a new deep face and expression recognition solution, called CapsField, based on a convolutional neural network combined with a capsule network that uses dynamic routing.
The proposed solution achieves superior performance for both face and expression recognition tasks when compared to the state-of-the-art.
- Score: 81.21490913108835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Light field (LF) cameras provide rich spatio-angular visual representations
by sensing the visual scene from multiple perspectives and have recently
emerged as a promising technology to boost the performance of human-machine
systems such as biometrics and affective computing. Despite the significant
success of LF representation for constrained facial image analysis, this
technology has never been used for face and expression recognition in the wild.
In this context, this paper proposes a new deep face and expression recognition
solution, called CapsField, based on a convolutional neural network and an
additional capsule network that utilizes dynamic routing to learn hierarchical
relations between capsules. CapsField extracts the spatial features from facial
images and learns the angular part-whole relations for a selected set of 2D
sub-aperture images rendered from each LF image. To analyze the performance of
the proposed solution in the wild, the first in-the-wild LF face dataset, along
with a new complementary constrained face dataset captured from the same
subjects, have been collected and made available. A subset of the in-the-wild
dataset contains facial images with different expressions, annotated for use in
face expression recognition tests. An
extensive performance assessment study using the new datasets has been
conducted for the proposed and relevant prior solutions, showing that the
proposed CapsField solution achieves superior performance for both face and
expression recognition tasks when compared to the state-of-the-art.
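The dynamic routing the abstract refers to is commonly implemented as routing-by-agreement between capsule layers (Sabour et al., 2017). The sketch below is a minimal NumPy illustration of that mechanism, not CapsField's actual implementation; the capsule counts, dimensions, and iteration count are illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squash non-linearity: preserves direction, maps the norm into [0, 1).
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Routing-by-agreement between two capsule layers.

    u_hat: prediction vectors from input capsules, shape (n_in, n_out, d_out).
    Returns output capsule vectors, shape (n_out, d_out).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, initialised to zero
    for _ in range(n_iters):
        # Coupling coefficients: softmax over output capsules for each input.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = np.einsum('ij,ijd->jd', c, u_hat)   # weighted sum of predictions
        v = squash(s)                            # candidate output capsules
        b += np.einsum('ijd,jd->ij', u_hat, v)  # agreement update
    return v

# Toy example: 8 part capsules routed to 4 whole capsules of dimension 16.
rng = np.random.default_rng(0)
u_hat = rng.standard_normal((8, 4, 16))
v = dynamic_routing(u_hat)
print(v.shape)  # (4, 16)
```

In a capsule network, the iteratively refined coupling coefficients let lower-level "part" capsules vote on higher-level "whole" capsules, which is how CapsField's capsule stage can model angular part-whole relations across sub-aperture views.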
Related papers
- MaskInversion: Localized Embeddings via Optimization of Explainability Maps [49.50785637749757]
MaskInversion generates a context-aware embedding for a query image region specified by a mask at test time.
It can be used for a broad range of tasks, including open-vocabulary class retrieval, referring expression comprehension, as well as for localized captioning and image generation.
arXiv Detail & Related papers (2024-07-29T14:21:07Z)
- Alleviating Catastrophic Forgetting in Facial Expression Recognition with Emotion-Centered Models [49.3179290313959]
The proposed method, emotion-centered generative replay (ECgr), tackles this challenge by integrating synthetic images from generative adversarial networks.
ECgr incorporates a quality assurance algorithm to ensure the fidelity of generated images.
The experimental results on four diverse facial expression datasets demonstrate that incorporating images generated by our pseudo-rehearsal method enhances training on the targeted dataset and the source dataset.
arXiv Detail & Related papers (2024-04-18T15:28:34Z)
- E2F-Net: Eyes-to-Face Inpainting via StyleGAN Latent Space [4.110419543591102]
We propose a Generative Adversarial Network (GAN)-based model called Eyes-to-Face Network (E2F-Net).
The proposed approach extracts identity and non-identity features from the periocular region using two dedicated encoders.
We show that our method successfully reconstructs the whole face with high quality, surpassing current techniques.
arXiv Detail & Related papers (2024-03-18T19:11:34Z)
- Applying Unsupervised Semantic Segmentation to High-Resolution UAV Imagery for Enhanced Road Scene Parsing [12.558144256470827]
A novel unsupervised road parsing framework is presented.
The proposed method achieves a mean Intersection over Union (mIoU) of 89.96% on the development dataset without any manual annotation.
arXiv Detail & Related papers (2024-02-05T13:16:12Z)
- Cross-view Self-localization from Synthesized Scene-graphs [1.9580473532948401]
Cross-view self-localization is a challenging scenario of visual place recognition in which database images are provided from sparse viewpoints.
We propose a new hybrid scene model that combines the advantages of view-invariant appearance features computed from raw images and view-dependent spatial-semantic features computed from synthesized images.
arXiv Detail & Related papers (2023-10-24T04:16:27Z)
- Multi-modal reward for visual relationships-based image captioning [4.354364351426983]
This paper proposes a deep neural network architecture for image captioning based on fusing the visual relationships information extracted from an image's scene graph with the spatial feature maps of the image.
A multi-modal reward function is then introduced for deep reinforcement learning of the proposed network using a combination of language and vision similarities in a common embedding space.
arXiv Detail & Related papers (2023-03-19T20:52:44Z)
- RoI Tanh-polar Transformer Network for Face Parsing in the Wild [50.8865921538953]
Face parsing aims to predict pixel-wise labels for facial components of a target face in an image.
Existing approaches usually crop the target face from the input image with respect to a bounding box calculated during pre-processing.
We propose RoI Tanh-polar transform that warps the whole image to a Tanh-polar representation with a fixed ratio between the face area and the context.
We also propose a hybrid residual representation learning block, coined HybridBlock, that contains convolutional layers in both the Tanh-polar space and the Tanh-Cartesian space.
arXiv Detail & Related papers (2021-02-04T16:25:26Z)
- MorphGAN: One-Shot Face Synthesis GAN for Detecting Recognition Bias [13.162012586770576]
We describe a simulator that applies specific head pose and facial expression adjustments to images of previously unseen people.
We show that augmenting small datasets of faces with new poses and expressions improves recognition performance by up to 9%, depending on the augmentation and data scarcity.
arXiv Detail & Related papers (2020-12-09T18:43:03Z)
- InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs [73.27299786083424]
We propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models.
We first find that GANs learn various semantics in some linear subspaces of the latent space.
We then conduct a detailed study on the correlation between different semantics and manage to better disentangle them via subspace projection.
arXiv Detail & Related papers (2020-05-18T18:01:22Z)
- Joint Deep Learning of Facial Expression Synthesis and Recognition [97.19528464266824]
We propose a novel joint deep learning of facial expression synthesis and recognition method for effective FER.
The proposed method involves a two-stage learning procedure. Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions.
In order to alleviate the problem of data bias between the real images and the synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm.
arXiv Detail & Related papers (2020-02-06T10:56:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.