Related papers: Identity-aware Facial Expression Recognition in Compressed Video

URL: http://arxiv.org/abs/2101.00317v2
Date: Thu, 7 Jan 2021 23:46:22 GMT
Title: Identity-aware Facial Expression Recognition in Compressed Video
Authors: Xiaofeng Liu, Linghao Jin, Xu Han, Jun Lu, Jane You, Lingsheng Kong
Abstract summary: In the compressed domain, with compression of up to two orders of magnitude, we can explicitly infer the expression from the residual frames. We do not need the identity label or multiple expression samples from the same person for identity elimination. Our solution can achieve comparable or better performance than the recent decoded image based methods.
Score: 27.14473209125735
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper explores a facial expression representation in the compressed video domain from which inter-subject variations have been eliminated. Most previous methods process the RGB images of a sequence, while the off-the-shelf and valuable expression-related muscle movement is already embedded in the compression format. In the compressed domain, with compression of up to two orders of magnitude, we can explicitly infer the expression from the residual frames, and it is possible to extract identity factors from the I frame with a pre-trained face recognition network. By enforcing marginal independence between the two, the expression feature is expected to be purer for the expression and robust to identity shifts. We do not need the identity label or multiple expression samples from the same person for identity elimination. Moreover, when the apex frame is annotated in the dataset, the complementary constraint can be further added to regularize the feature-level game. In testing, only the compressed residual frames are required to achieve expression prediction. Our solution can achieve comparable or better performance than the recent decoded image based methods on the typical FER benchmarks with about 3× faster inference with compressed data.

Related papers

Highly Compressed Tokenizer Can Generate Without Training [0.5033155053523042]
1D image tokenizers represent images as highly compressed one-dimensional sequences of as few as 32 discrete tokens. We find that the high degree of compression achieved by a 1D tokenizer with vector quantization enables image editing and generative capabilities. Our approach is demonstrated for inpainting and text-guided image editing use cases, and can generate diverse and realistic samples without requiring training of any generative model.
arXiv Detail & Related papers (2025-06-09T21:45:03Z)
Deep Lossless Image Compression via Masked Sampling and Coarse-to-Fine Auto-Regression [8.6984128323386]
We propose a deep lossless image compression via masked sampling and coarse-to-fine auto-regression. It combines lossy reconstruction and progressive residual compression, which fuses contexts from various directions. Our method achieves comparable compression performance on extensive datasets with competitive coding speed and more flexibility.
arXiv Detail & Related papers (2025-03-14T09:29:55Z)
WEM-GAN: Wavelet transform based facial expression manipulation [2.0918868193463207]
We propose WEM-GAN, in short for wavelet-based expression manipulation GAN. We take advantage of the wavelet transform technique and combine it with our generator with a U-net autoencoder backbone. Our model performs better in preserving identity features, editing capability, and image generation quality on the AffectNet dataset.
arXiv Detail & Related papers (2024-12-03T16:23:02Z)
EmojiDiff: Advanced Facial Expression Control with High Identity Preservation in Portrait Generation [8.314556078632412]
We introduce EmojiDiff, the first end-to-end solution that enables simultaneous control of extremely detailed expression (RGB-level) and high-fidelity identity in portrait generation. For decoupled training, we innovate ID-irrelevant Data Iteration (IDI) to synthesize cross-identity expression pairs. We also present ID-enhanced Contrast Alignment (ICA) for further fine-tuning.
arXiv Detail & Related papers (2024-12-02T08:24:11Z)
Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption [57.056311855630916]
We propose a Controllable Generative Image Compression framework, Control-GIC. It is capable of fine-grained adaption across a broad spectrum while ensuring high-fidelity and generality compression. We develop a conditional conditionalization that can trace back to historic encoded multi-granularity representations.
arXiv Detail & Related papers (2024-06-02T14:22:09Z)
Object Recognition as Next Token Prediction [99.40793702627396]
We present an approach to pose object recognition as next token prediction. The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels.
arXiv Detail & Related papers (2023-12-04T18:58:40Z)
Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition [0.0]
The proposed method can detect occluded parts of the face as if they were unoccluded, and recognize them, improving FER accuracy. It involves three steps: First, the vision transformer (ViT)-based occlusion patch detector masks the occluded position by training only latent vectors from the unoccluded patches. Second, the hybrid reconstruction network generates the masking position as a complete image using the ViT and convolutional neural network (CNN). Last, the expression-relevant latent vector extractor retrieves and uses expression-related information from all latent vectors by applying a CNN-based class activation map.
arXiv Detail & Related papers (2023-07-21T07:56:32Z)
Set-Based Face Recognition Beyond Disentanglement: Burstiness Suppression With Variance Vocabulary [78.203301910422]
We argue that the two crucial issues in SFR, the face quality and burstiness, are both identity-irrelevant and variance-relevant. We propose a lightweight set-based disentanglement framework to separate the identity features from the variance features. To suppress face burstiness in the sets, we propose a vocabulary-based burst suppression (VBS) method.
arXiv Detail & Related papers (2023-04-13T04:02:58Z)
SARGAN: Spatial Attention-based Residuals for Facial Expression Manipulation [1.7056768055368383]
We present a novel method named SARGAN that addresses the limitations from three perspectives. We exploited a symmetric encoder-decoder network to attend facial features at multiple scales. Our proposed model performs significantly better than state-of-the-art methods.
arXiv Detail & Related papers (2023-03-30T08:15:18Z)
Disentangling Identity and Pose for Facial Expression Recognition [54.50747989860957]
We propose an identity and pose disentangled facial expression recognition (IPD-FER) model to learn more discriminative feature representation. For identity encoder, a well pre-trained face recognition model is utilized and fixed during training, which alleviates the restriction on specific expression training data. By comparing the difference between synthesized neutral and expressional images of the same individual, the expression component is further disentangled from identity and pose.
arXiv Detail & Related papers (2022-08-17T06:48:13Z)
Semi-parametric Makeup Transfer via Semantic-aware Correspondence [99.02329132102098]
The large discrepancy between the source non-makeup image and the reference makeup image is one of the key challenges in makeup transfer. Non-parametric techniques have a high potential for addressing the pose, expression, and occlusion discrepancies. We propose a Semi-parametric Makeup Transfer (SpMT) method, which combines the reciprocal strengths of non-parametric and parametric mechanisms.
arXiv Detail & Related papers (2022-03-04T12:54:19Z)
Mutual Information Regularized Identity-aware Facial Expression Recognition in Compressed Video [27.602648102881535]
We propose a novel collaborative min-min game for mutual information (MI) minimization in latent space. We do not need the identity label or multiple expression samples from the same person for identity elimination. Our solution can achieve comparable or better performance than the recent decoded image-based methods.
arXiv Detail & Related papers (2020-10-20T21:42:18Z)
Blind Face Restoration via Deep Multi-scale Component Dictionaries [75.02640809505277]
We propose a deep face dictionary network (termed as DFDNet) to guide the restoration process of degraded observations. DFDNet generates deep dictionaries for perceptually significant face components from high-quality images. Component AdaIN is leveraged to eliminate the style diversity between the input and dictionary features.
arXiv Detail & Related papers (2020-08-02T07:02:07Z)
LEED: Label-Free Expression Editing via Disentanglement [57.09545215087179]
The LEED framework is capable of editing the expression of both frontal and profile facial images without requiring any expression label. Two novel losses are designed for optimal expression disentanglement and consistent synthesis.
arXiv Detail & Related papers (2020-07-17T13:36:15Z)
Fine-Grained Expression Manipulation via Structured Latent Space [30.789513209376032]
We propose an end-to-end expression-guided generative adversarial network (EGGAN) to manipulate fine-grained expressions. Our method can manipulate fine-grained expressions, and generate continuous intermediate expressions between source and target expressions.
arXiv Detail & Related papers (2020-04-21T06:18:34Z)
Discernible Image Compression [124.08063151879173]
This paper aims to produce compressed images by pursuing both appearance and perceptual consistency. Based on the encoder-decoder framework, we propose using a pre-trained CNN to extract features of the original and compressed images. Experiments on benchmarks demonstrate that images compressed by using the proposed method can also be well recognized by subsequent visual recognition and detection models.
arXiv Detail & Related papers (2020-02-17T07:35:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.