SKD-TSTSAN: Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition
- URL: http://arxiv.org/abs/2406.17538v1
- Date: Tue, 25 Jun 2024 13:22:22 GMT
- Title: SKD-TSTSAN: Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition
- Authors: Guanghao Zhu, Lin Liu, Yuhao Hu, Haixin Sun, Fang Liu, Xiaohui Du, Ruqian Hao, Juanxiu Liu, Yong Liu, Hao Deng, Jing Zhang
- Abstract summary: Micro-expression recognition (MER) is crucial in many fields, including criminal analysis and psychotherapy.
A three-stream temporal-shift attention network based on self-knowledge distillation (SKD-TSTSAN) is proposed in this paper.
- Score: 21.675660978188617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Micro-expressions (MEs) are subtle facial movements that occur spontaneously when people try to conceal their real emotions. Micro-expression recognition (MER) is crucial in many fields, including criminal analysis and psychotherapy. However, MER is challenging since MEs have low intensity and ME datasets are small. To this end, a three-stream temporal-shift attention network based on self-knowledge distillation (SKD-TSTSAN) is proposed in this paper. First, to address the low intensity of ME muscle movements, we use learning-based motion magnification modules to amplify them. Second, we employ efficient channel attention (ECA) modules in the local-spatial stream to make the network focus on facial regions that are highly relevant to MEs. In addition, temporal shift modules (TSMs) are used in the dynamic-temporal stream, which enables temporal modeling with no additional parameters by mixing ME motion information from two different temporal domains. Furthermore, we introduce self-knowledge distillation (SKD) into the MER task by adding auxiliary classifiers and using the deepest section of the network for supervision, encouraging all blocks to fully explore the features of the training set. Finally, extensive experiments are conducted on four ME datasets: CASME II, SAMM, MMEW, and CAS(ME)3. The experimental results demonstrate that our SKD-TSTSAN outperforms other existing methods and achieves new state-of-the-art performance. Our code will be available at https://github.com/GuanghaoZhu663/SKD-TSTSAN.
Related papers
- Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition [48.21696443824074]
We propose a novel framework for micro-expression recognition, named the Adaptive Temporal Motion Guided Graph Convolution Network (ATM-GCN).
Our framework excels at capturing temporal dependencies between frames across the entire clip, thereby enhancing micro-expression recognition at the clip level.
arXiv Detail & Related papers (2024-06-13T10:57:24Z) - Masked Motion Predictors are Strong 3D Action Representation Learners [143.9677635274393]
In 3D human action recognition, limited supervised data makes it challenging to fully tap into the modeling potential of powerful networks such as transformers.
We show that instead of following the prevalent pretext to perform masked self-component reconstruction in human joints, explicit contextual motion modeling is key to the success of learning effective feature representation for 3D action recognition.
arXiv Detail & Related papers (2023-08-14T11:56:39Z) - Video-based Facial Micro-Expression Analysis: A Survey of Datasets, Features and Algorithms [52.58031087639394]
Micro-expressions are involuntary and transient facial expressions.
They can provide important information in a broad range of applications such as lie detection, criminal detection, etc.
Since micro-expressions are transient and of low intensity, their detection and recognition are difficult and rely heavily on expert experience.
arXiv Detail & Related papers (2022-01-30T05:14:13Z) - MMNet: Muscle motion-guided network for micro-expression recognition [2.032432845751978]
We propose a robust micro-expression recognition framework, namely the muscle motion-guided network (MMNet).
Specifically, a continuous attention (CA) block is introduced to focus on modeling local subtle muscle motion patterns with little identity information.
Our approach outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-01-14T04:05:49Z) - Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition [61.374467942519374]
We propose a novel spatio-temporal transformer architecture which is, to the best of our knowledge, the first purely transformer-based approach for micro-expression recognition.
The architecture comprises a spatial encoder which learns spatial patterns, a temporal aggregator for temporal dimension analysis, and a classification head.
A comprehensive evaluation on three widely used spontaneous micro-expression data sets shows that the proposed approach consistently outperforms the state of the art.
arXiv Detail & Related papers (2021-12-10T22:10:31Z) - Progressive Spatio-Temporal Bilinear Network with Monte Carlo Dropout for Landmark-based Facial Expression Recognition with Uncertainty Estimation [93.73198973454944]
The performance of our method is evaluated on three widely used datasets.
It is comparable to that of video-based state-of-the-art methods while having much lower complexity.
arXiv Detail & Related papers (2021-06-08T13:40:30Z) - MERANet: Facial Micro-Expression Recognition using 3D Residual Attention Network [14.285700243381537]
We propose a facial micro-expression recognition model using a 3D residual attention network, called MERANet.
The proposed model also encompasses both spatial and temporal information.
Superior performance is observed compared to the state of the art for facial micro-expression recognition.
arXiv Detail & Related papers (2020-12-07T16:41:42Z) - Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel spatio-temporal convolution block that is capable of extracting spatio-temporal patterns at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z) - SMA-STN: Segmented Movement-Attending Spatiotemporal Network for Micro-Expression Recognition [20.166205708651194]
This paper proposes a segmented movement-attending network (SMA-STN) to reveal subtle movement changes visually in an efficient way.
Extensive experiments on three widely used benchmarks, i.e., CASME II, SAMM, and SMIC, show that the proposed SMA-STN achieves better MER performance than other state-of-the-art methods.
arXiv Detail & Related papers (2020-10-19T09:23:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.