NR-DFERNet: Noise-Robust Network for Dynamic Facial Expression
Recognition
- URL: http://arxiv.org/abs/2206.04975v1
- Date: Fri, 10 Jun 2022 10:17:30 GMT
- Title: NR-DFERNet: Noise-Robust Network for Dynamic Facial Expression
Recognition
- Authors: Hanting Li, Mingzhe Sui, Zhaoqing Zhu, and Feng Zhao
- Abstract summary: We propose a noise-robust dynamic facial expression recognition network (NR-DFERNet) to reduce the interference of noisy frames on the DFER task.
Specifically, at the spatial stage, we devise a dynamic-static fusion module (DSF) that introduces dynamic features to static features for learning more discriminative spatial features.
To suppress the impact of target-irrelevant frames, we introduce a novel dynamic class token (DCT) for the transformer at the temporal stage.
- Score: 1.8604727699812171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dynamic facial expression recognition (DFER) in the wild is an extremely
challenging task, due to a large number of noisy frames in the video sequences.
Previous works focus on extracting more discriminative features, but ignore
distinguishing the key frames from the noisy frames. To tackle this problem, we
propose a noise-robust dynamic facial expression recognition network
(NR-DFERNet), which can effectively reduce the interference of noisy frames on
the DFER task. Specifically, at the spatial stage, we devise a dynamic-static
fusion module (DSF) that introduces dynamic features to static features for
learning more discriminative spatial features. To suppress the impact of
target-irrelevant frames, we introduce a novel dynamic class token (DCT) for the
transformer at the temporal stage. Moreover, we design a snippet-based filter
(SF) at the decision stage to reduce the effect of too many neutral frames on
non-neutral sequence classification. Extensive experimental results demonstrate
that our NR-DFERNet outperforms the state-of-the-art methods on both the DFEW
and AFEW benchmarks.
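The three stages described in the abstract (DSF fusion at the spatial stage, a dynamic class token at the temporal stage, and a snippet-based filter at the decision stage) can be sketched in minimal form. This is an illustrative sketch only, not the paper's implementation: all function names, shapes, the additive fusion, the single-head attention, and the argmax-based snippet rule are assumptions.

```python
import numpy as np

def dynamic_static_fusion(static_feats, dynamic_feats, alpha=0.5):
    """DSF (sketch): inject dynamic features into per-frame static
    features; `alpha` and the additive form are assumptions."""
    return static_feats + alpha * dynamic_feats

def attend_with_dynamic_class_token(frame_feats, class_token):
    """DCT (sketch): a learnable token attends over the T frame features,
    so softmax attention can down-weight target-irrelevant frames."""
    scores = frame_feats @ class_token / np.sqrt(class_token.size)  # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over frames
    return weights @ frame_feats, weights       # pooled (D,), weights (T,)

def snippet_filter(snippet_logits, neutral_idx=0):
    """SF (sketch): if any snippet is classified as non-neutral, average
    only those snippets, so many neutral frames cannot swamp the decision."""
    logits = np.stack(snippet_logits)           # (S, num_classes)
    non_neutral = logits[logits.argmax(axis=1) != neutral_idx]
    return non_neutral.mean(axis=0) if len(non_neutral) else logits.mean(axis=0)

# Toy forward pass over one video of T frames with D-dim features.
rng = np.random.default_rng(0)
T, D = 8, 16
static = rng.normal(size=(T, D))
dynamic = rng.normal(size=(T, D))
fused = dynamic_static_fusion(static, dynamic)
token = rng.normal(size=D)
video_feat, frame_weights = attend_with_dynamic_class_token(fused, token)
```

In this sketch the attention weights over frames play the role the abstract assigns to the DCT (suppressing noisy frames), and the snippet rule mirrors the SF's goal of keeping neutral-heavy snippets from dominating non-neutral sequences.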
Related papers
- Robust Network Learning via Inverse Scale Variational Sparsification [55.64935887249435]
We introduce an inverse scale variational sparsification framework within a time-continuous inverse scale space formulation.
Unlike frequency-based methods, our approach removes noise by smoothing small-scale features.
We show the efficacy of our approach through enhanced robustness against various noise types.
arXiv Detail & Related papers (2024-09-27T03:17:35Z)
- Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization [60.899082019130766]
We introduce a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization.
FDN aims to mine informative inconsistency cues between real and fake frames to obtain discriminative features that are beneficial for roughly indicating forgery regions.
PRN is responsible for predicting confidence scores and regression offsets to refine the coarse-grained proposals derived from the FDN.
arXiv Detail & Related papers (2024-07-23T15:07:52Z)
- NID-SLAM: Neural Implicit Representation-based RGB-D SLAM in dynamic environments [9.706447888754614]
We present NID-SLAM, which significantly improves the performance of neural SLAM in dynamic environments.
We propose a new approach to enhance inaccurate regions in semantic masks, particularly in marginal areas.
We also introduce a selection strategy for dynamic scenes, which enhances camera tracking robustness against large-scale objects.
arXiv Detail & Related papers (2024-01-02T12:35:03Z)
- Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over a corrupted sequence, leveraging the spatio-temporal coherence and internal statistics of the test sequence.
arXiv Detail & Related papers (2023-12-13T01:57:11Z)
- From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos [88.08209394979178]
Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations.
We introduce a novel Static-to-Dynamic model (S2D) that leverages existing SFER knowledge and dynamic information implicitly encoded in extracted facial landmark-aware features.
arXiv Detail & Related papers (2023-12-09T03:16:09Z)
- EulerMormer: Robust Eulerian Motion Magnification via Dynamic Filtering within Transformer [30.470336098766765]
Video Motion Magnification (VMM) aims to break the resolution limit of human visual perception capability.
This paper proposes a novel dynamic filtering strategy to achieve static-dynamic field adaptive denoising.
Extensive experiments demonstrate that EulerMormer achieves more robust video motion magnification from the Eulerian perspective.
arXiv Detail & Related papers (2023-12-07T09:10:16Z)
- Hyperspectral Image Denoising via Self-Modulating Convolutional Neural Networks [15.700048595212051]
We introduce a self-modulating convolutional neural network which utilizes correlated spectral and spatial information.
At the core of the model lies a novel block, which allows the network to transform the features in an adaptive manner based on the adjacent spectral data.
Experimental analysis on both synthetic and real data shows that the proposed SM-CNN outperforms other state-of-the-art HSI denoising methods.
arXiv Detail & Related papers (2023-09-15T06:57:43Z)
- Alignment-free HDR Deghosting with Semantics Consistent Transformer [76.91669741684173]
High dynamic range imaging aims to retrieve information from multiple low-dynamic range inputs to generate realistic output.
Existing methods often focus on the spatial misalignment across input frames caused by the foreground and/or camera motion.
We propose a novel alignment-free network with a Semantics Consistent Transformer (SCTNet) with both spatial and channel attention modules.
arXiv Detail & Related papers (2023-05-29T15:03:23Z)
- Intensity-Aware Loss for Dynamic Facial Expression Recognition in the Wild [1.8604727699812171]
Video sequences often contain frames with different expression intensities, especially for facial expressions in real-world scenarios.
We propose the global convolution-attention block (GCA) to rescale the channels of the feature maps.
In addition, we introduce the intensity-aware loss (IAL) in the training process to help the network distinguish the samples with relatively low expression intensities.
arXiv Detail & Related papers (2022-08-19T12:48:07Z)
- Dynamic Slimmable Denoising Network [64.77565006158895]
Dynamic slimmable denoising network (DDS-Net) is a general method for achieving good denoising quality with less computational complexity.
DDS-Net is empowered with the ability of dynamic inference by a dynamic gate.
Our experiments demonstrate that DDS-Net consistently outperforms state-of-the-art individually trained static denoising networks.
arXiv Detail & Related papers (2021-10-17T22:45:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.