Soften the Mask: Adaptive Temporal Soft Mask for Efficient Dynamic Facial Expression Recognition
- URL: http://arxiv.org/abs/2502.21004v1
- Date: Fri, 28 Feb 2025 12:45:08 GMT
- Title: Soften the Mask: Adaptive Temporal Soft Mask for Efficient Dynamic Facial Expression Recognition
- Authors: Mengzhu Li, Quanxing Zha, Hongjun Wu
- Abstract summary: Dynamic Facial Expression Recognition (DFER) facilitates the understanding of psychological intentions through non-verbal communication. Existing methods struggle to manage irrelevant information, such as background noise and redundant semantics, which impacts both efficiency and effectiveness. We propose a novel supervised temporal soft masked autoencoder network for DFER, namely AdaTosk, which integrates a parallel supervised classification branch with the self-supervised reconstruction branch.
- Score: 4.151073288078749
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Dynamic Facial Expression Recognition (DFER) facilitates the understanding of psychological intentions through non-verbal communication. Existing methods struggle to manage irrelevant information, such as background noise and redundant semantics, which impacts both efficiency and effectiveness. In this work, we propose a novel supervised temporal soft masked autoencoder network for DFER, namely AdaTosk, which integrates a parallel supervised classification branch with the self-supervised reconstruction branch. The self-supervised reconstruction branch applies a random binary hard mask to generate diverse training samples, encouraging meaningful feature representations in visible tokens. Meanwhile, the classification branch employs an adaptive temporal soft mask that flexibly masks visible tokens based on their temporal significance. Its two key components, the class-agnostic and class-semantic soft masks, respectively enhance critical expression moments and reduce semantic redundancy over time. Extensive experiments conducted on widely used benchmarks demonstrate that AdaTosk remarkably reduces computational costs compared with current state-of-the-art methods while maintaining competitive performance.
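As a reading aid, here is a minimal PyTorch sketch of the two masking mechanisms as we read them from the abstract. The names `hard_mask_tokens` and `TemporalSoftMask` are ours, not the authors', and the paper's actual architecture is certainly more involved.

```python
# Hypothetical sketch of AdaTosk's two masking ideas; names are ours.
import torch
import torch.nn as nn

def hard_mask_tokens(tokens: torch.Tensor, mask_ratio: float = 0.75):
    """Random binary hard mask (MAE-style): keep a random subset of tokens."""
    B, N, D = tokens.shape
    n_keep = int(N * (1.0 - mask_ratio))
    noise = torch.rand(B, N, device=tokens.device)       # i.i.d. score per token
    keep = noise.argsort(dim=1)[:, :n_keep]              # indices of kept tokens
    visible = tokens.gather(1, keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, keep

class TemporalSoftMask(nn.Module):
    """Soft mask: down-weights visible tokens by a learned significance
    score instead of dropping them outright."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, visible: torch.Tensor) -> torch.Tensor:
        weight = torch.sigmoid(self.score(visible))      # (B, N_keep, 1) in (0, 1)
        return visible * weight

# Usage: a clip flattened to patch tokens (batch=2, 196 tokens, dim=768).
tokens = torch.randn(2, 196, 768)
visible, keep = hard_mask_tokens(tokens)                 # reconstruction-branch input
soft = TemporalSoftMask(768)(visible)                    # classification-branch input
```

The soft mask keeps every visible token in play but lets gradient signal decide how much each moment contributes, which matches the abstract's claim of emphasizing critical expression moments rather than discarding frames.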
Related papers
- Adapting Vision-Language Model with Fine-grained Semantics for Open-Vocabulary Segmentation [42.020470627552136]
Open-vocabulary segmentation is primarily bottlenecked by mask classification, not mask generation. We propose a novel Fine-grained Semantic Adaptation (FISA) method to address this limitation. FISA enhances the extracted visual features with fine-grained semantic awareness by explicitly integrating this crucial semantic information early in the visual encoding process.
arXiv Detail & Related papers (2024-09-24T17:50:28Z)
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
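A minimal sketch of the "filter random noise, then binarize" idea is shown below; the low-pass filter choice and the target-ratio thresholding are our illustration, not ColorMAE's exact recipe.

```python
# Illustrative only: binarize low-pass-filtered noise at a target mask ratio.
import torch
import torch.nn.functional as F

def noise_mask(grid: int = 14, mask_ratio: float = 0.75, kernel: int = 3):
    noise = torch.rand(1, 1, grid, grid)             # white noise over the token grid
    # Blurring correlates nearby values, so masked patches form clusters
    # ("colored" noise) rather than i.i.d. salt-and-pepper patterns.
    filtered = F.avg_pool2d(noise, kernel, stride=1, padding=kernel // 2)
    flat = filtered.flatten()
    thresh = flat.quantile(1.0 - mask_ratio)         # cut so ~mask_ratio is masked
    return (flat >= thresh).view(grid, grid)         # True = masked patch

mask = noise_mask()                                  # 14x14 boolean mask, ~75% True
```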
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- Masked Face Recognition with Generative-to-Discriminative Representations [29.035270415311427]
We propose a unified deep network to learn generative-to-discriminative representations for facilitating masked face recognition.
First, we leverage a generative encoder pretrained for face inpainting and finetune it to represent masked faces into category-aware descriptors.
We incorporate a multi-layer convolutional network as a discriminative reformer and train it to convert the category-aware descriptors into identity-aware vectors.
arXiv Detail & Related papers (2024-05-27T02:20:55Z)
- Cross-Task Multi-Branch Vision Transformer for Facial Expression and Mask Wearing Classification [13.995453649985732]
We propose a unified multi-branch vision transformer for facial expression recognition and mask wearing classification tasks.
Our approach extracts shared features for both tasks using a dual-branch architecture.
Our proposed framework reduces the overall complexity compared with using separate networks for both tasks.
arXiv Detail & Related papers (2024-04-22T22:02:19Z)
- Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization [40.78236375917571]
Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning.
We introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that leverages end-to-end feedback from downstream tasks to learn an optimal masking strategy during pretraining.
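A rough single-loop stand-in for the downstream-guided masking idea is sketched below; MLO-MAE itself uses multi-level optimization, and every module and name here is a hypothetical placeholder.

```python
# Single-loop stand-in: a mask scorer receives gradient from a downstream
# classification loss. All modules and names are hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

scorer = nn.Linear(768, 1)        # predicts per-token keep-probability
encoder = nn.Linear(768, 768)     # stand-in for the MAE encoder
head = nn.Linear(768, 7)          # stand-in downstream classifier
params = [*scorer.parameters(), *encoder.parameters(), *head.parameters()]
opt = torch.optim.Adam(params, lr=1e-4)

tokens = torch.randn(8, 196, 768)               # dummy patch tokens
labels = torch.randint(0, 7, (8,))

probs = torch.sigmoid(scorer(tokens))           # (B, N, 1), soft "visibility"
feat = encoder(tokens * probs).mean(dim=1)      # soft-masked tokens -> pooled feature
loss = F.cross_entropy(head(feat), labels)      # downstream feedback
opt.zero_grad()
loss.backward()                                 # gradient reaches the scorer too
opt.step()
```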
arXiv Detail & Related papers (2024-02-28T07:37:26Z)
- Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision [87.15580604023555]
Unpair-Seg is a novel weakly-supervised open-vocabulary segmentation framework.
It learns from unpaired image-mask and image-text pairs, which can be independently and efficiently collected.
It achieves 14.6% and 19.5% mIoU on the ADE-847 and PASCAL Context-459 datasets.
arXiv Detail & Related papers (2024-02-14T06:01:44Z)
- Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation [68.16510297109872]
Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing.
We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs.
Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.
arXiv Detail & Related papers (2023-12-22T02:31:31Z)
- CL-MAE: Curriculum-Learned Masked Autoencoders [49.24994655813455]
We propose a curriculum learning approach that updates the masking strategy to continually increase the complexity of the self-supervised reconstruction task.
We train our Curriculum-Learned Masked Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior representation learning capabilities compared to MAE.
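The easy-to-hard idea can be caricatured as a mask-ratio schedule, as in the sketch below; CL-MAE actually learns its masking strategy, so this fixed ramp is only a minimal stand-in.

```python
# A fixed easy-to-hard ramp of the mask ratio; illustrative values only.
def mask_ratio_at(step: int, total_steps: int,
                  start: float = 0.4, end: float = 0.85) -> float:
    t = min(step / max(total_steps, 1), 1.0)    # training progress in [0, 1]
    return start + t * (end - start)

for step in (0, 5000, 10000):
    print(step, round(mask_ratio_at(step, 10000), 3))   # 0.4 -> 0.625 -> 0.85
```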
arXiv Detail & Related papers (2023-08-31T09:13:30Z)
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training [55.12082817901671]
We propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT).
MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies.
Our results demonstrate that MaPeT achieves competitive performance on ImageNet, compared to baselines and competitors under the same model setting.
arXiv Detail & Related papers (2023-06-12T18:12:19Z)
- Non-Iterative Scribble-Supervised Learning with Pacing Pseudo-Masks for Medical Image Segmentation [13.940364677162968]
Scribble-supervised medical image segmentation tackles the limitation of sparse masks.
We propose a non-iterative method, named PacingPseudo, in which a stream of varying (pacing) pseudo-masks teaches a network via consistency training.
The efficacy of the proposed PacingPseudo is validated on three public medical image datasets.
arXiv Detail & Related papers (2022-10-20T01:57:44Z)
- MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining [138.86293836634323]
MaskCLIP incorporates a newly proposed masked self-distillation into contrastive language-image pretraining.
MaskCLIP achieves superior results in linear probing, finetuning, and zero-shot performance with the guidance of the language encoder.
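A bare-bones sketch of masked self-distillation follows: an EMA teacher encodes the full view while the student regresses the teacher's features from a masked view. The CLIP-style image-text contrastive term is omitted, and all names are ours.

```python
# Bare-bones masked self-distillation; names and sizes are illustrative.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Linear(768, 256)                   # stand-in vision encoder
teacher = copy.deepcopy(student)
for p in teacher.parameters():                  # teacher is never trained by gradients
    p.requires_grad_(False)

tokens = torch.randn(4, 196, 768)
visible = torch.rand(4, 196, 1) > 0.6           # keep ~40% of patches for the student

with torch.no_grad():
    target = teacher(tokens)                    # teacher sees everything
pred = student(tokens * visible)                # student sees the masked view
loss = F.mse_loss(pred, target)
loss.backward()

with torch.no_grad():                           # EMA update of the teacher
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(0.99).add_(ps, alpha=0.01)
```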
arXiv Detail & Related papers (2022-08-25T17:59:58Z)
- Self-Supervised Visual Representations Learning by Contrastive Mask Prediction [129.25459808288025]
We propose a novel contrastive mask prediction (CMP) task for visual representation learning.
MaskCo contrasts region-level features instead of view-level features, which makes it possible to identify the positive sample without any assumptions.
We evaluate MaskCo on training datasets beyond ImageNet and compare its performance with MoCo V2.
arXiv Detail & Related papers (2021-08-18T02:50:33Z)