Multi-Scale Spatio-Temporal Graph Convolutional Network for Facial Expression Spotting
- URL: http://arxiv.org/abs/2403.15994v1
- Date: Sun, 24 Mar 2024 03:10:39 GMT
- Title: Multi-Scale Spatio-Temporal Graph Convolutional Network for Facial Expression Spotting
- Authors: Yicheng Deng, Hideaki Hayashi, Hajime Nagahara
- Abstract summary: We propose a Multi-Scale Spatio-Temporal Graph Convolutional Network (SpoT-GCN) for facial expression spotting.
We track both short- and long-term motion of facial muscles in compact sliding windows whose window length adapts to the temporal receptive field of the network.
This network learns both local and global features from multiple scales of facial graph structures using our proposed facial local graph pooling (FLGP).
- Score: 11.978551396144532
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Facial expression spotting is a significant but challenging task in facial expression analysis. The accuracy of expression spotting is affected not only by irrelevant facial movements but also by the difficulty of perceiving subtle motions in micro-expressions. In this paper, we propose a Multi-Scale Spatio-Temporal Graph Convolutional Network (SpoT-GCN) for facial expression spotting. To extract more robust motion features, we track both short- and long-term motion of facial muscles in compact sliding windows whose window length adapts to the temporal receptive field of the network. This strategy, termed the receptive field adaptive sliding window strategy, effectively magnifies the motion features while alleviating the problem of severe head movement. The subtle motion features are then converted to a facial graph representation, whose spatio-temporal graph patterns are learned by a graph convolutional network. This network learns both local and global features from multiple scales of facial graph structures using our proposed facial local graph pooling (FLGP). Furthermore, we introduce supervised contrastive learning to enhance the discriminative capability of our model for difficult-to-classify frames. The experimental results on the SAMM-LV and CAS(ME)^2 datasets demonstrate that our method achieves state-of-the-art performance, particularly in micro-expression spotting. Ablation studies further verify the effectiveness of our proposed modules.
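To make the FLGP idea above concrete, here is a minimal NumPy sketch: landmark node features are max-pooled into coarser facial regions, producing a smaller graph at each scale. The landmark-to-region grouping below is purely illustrative, not the paper's actual assignment.

```python
import numpy as np

# Hypothetical sketch of facial local graph pooling (FLGP): landmark nodes are
# merged into coarser facial regions by max-pooling their features. The node
# grouping is illustrative, not the paper's actual landmark layout.
def facial_local_graph_pool(node_feats: np.ndarray, groups: list) -> np.ndarray:
    """node_feats: (N, C) features for N facial landmarks.
    groups: list of landmark-index lists, one per coarser region.
    Returns (len(groups), C) pooled region features."""
    return np.stack([node_feats[idx].max(axis=0) for idx in groups])

# Toy example: 6 landmarks pooled into 3 regions (e.g. eyes, nose, mouth).
feats = np.arange(12, dtype=float).reshape(6, 2)
regions = [[0, 1], [2, 3], [4, 5]]
pooled = facial_local_graph_pool(feats, regions)
print(pooled.shape)  # (3, 2)
```

Stacking several such pooling steps is what gives the network multiple scales of the facial graph to learn from.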
Related papers
- SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting [11.978551396144532]
In this paper, we propose an efficient framework for facial expression spotting.
First, we propose a Sliding Window-based Multi-Resolution Optical flow (SW-MRO) feature, which calculates multi-resolution optical flow of the input sequence within compact sliding windows.
Second, we propose SpotFormer, a multi-scale spatio-temporal Transformer that simultaneously encodes facial spatio-temporal relationships of the SW-MRO features for accurate frame-level probability estimation.
Third, we introduce supervised contrastive learning into SpotFormer to enhance the discriminability between different types of expressions.
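Both SpoT-GCN and SpotFormer use supervised contrastive learning to separate hard-to-classify frames. For reference, here is a minimal NumPy sketch of the standard supervised contrastive (SupCon) objective; the papers' exact formulations may differ, and the embeddings, labels, and temperature below are toy values.

```python
import numpy as np

# Minimal sketch of supervised contrastive loss: pull same-class embeddings
# together and push different-class embeddings apart in cosine space.
def supcon_loss(z: np.ndarray, labels: np.ndarray, tau: float = 0.1) -> float:
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize embeddings
    sim = z @ z.T / tau                               # temperature-scaled similarities
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        pos = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not pos:
            continue  # anchors with no positives are skipped
        denom = sum(np.exp(sim[i, a]) for a in range(n) if a != i)
        loss += -np.mean([np.log(np.exp(sim[i, p]) / denom) for p in pos])
        count += 1
    return loss / max(count, 1)

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
labels = np.array([0, 0, 1, 1, 0, 1, 0, 1])
loss = supcon_loss(z, labels)
```

Well-separated classes drive the loss toward zero, which is why the objective sharpens the boundary between expression types.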
arXiv Detail & Related papers (2024-07-30T13:02:08Z)
- Temporal Graph Representation Learning with Adaptive Augmentation Contrastive [12.18909612212823]
Temporal graph representation learning aims to generate low-dimensional dynamic node embeddings to capture temporal information.
We propose a novel Temporal Graph representation learning with Adaptive augmentation Contrastive (TGAC) model.
Our experiments on various real networks demonstrate that the proposed model outperforms other temporal graph representation learning methods.
arXiv Detail & Related papers (2023-11-07T11:21:16Z)
- Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar [62.87222308616711]
We propose Neural Point-based Volumetric Avatar, a method that adopts the neural point representation and the neural volume rendering process.
Specifically, the neural points are strategically constrained around the surface of the target expression via a high-resolution UV displacement map.
By design, our method is better equipped to handle topologically changing regions and thin structures while also ensuring accurate expression control when animating avatars.
arXiv Detail & Related papers (2023-07-11T03:40:10Z)
- DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation [83.51882381294357]
DisPositioNet is a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs.
Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph.
arXiv Detail & Related papers (2022-11-10T11:47:37Z)
- Privileged Attribution Constrained Deep Networks for Facial Expression Recognition [31.98044070620145]
Facial Expression Recognition (FER) is crucial in many research domains because it enables machines to better understand human behaviours.
To mitigate distraction from irrelevant facial regions, we guide the model to concentrate on specific facial areas such as the eyes, the mouth, or the eyebrows.
We propose the Privileged Attribution Loss (PAL), a method that directs the attention of the model towards the most salient facial regions.
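The core idea of steering attribution toward salient regions can be sketched as an overlap penalty between an attribution map and a region mask. This dot-product form is an assumption for illustration only, not the paper's actual PAL formulation.

```python
import numpy as np

# Hedged sketch of a privileged-attribution-style constraint: reward attribution
# mass that falls inside salient facial regions (eyes, mouth, eyebrows).
# This is NOT the exact PAL loss; it only illustrates the mechanism.
def attribution_alignment_loss(attribution: np.ndarray, region_mask: np.ndarray) -> float:
    """attribution: (H, W) non-negative attribution map.
    region_mask: (H, W) binary mask of privileged regions.
    Higher overlap with the mask -> lower (more negative) loss."""
    attribution = attribution / (attribution.sum() + 1e-8)  # normalize to a distribution
    return float(-(attribution * region_mask).sum())        # negative overlap

attr = np.ones((4, 4))                 # uniform attribution over a 4x4 map
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1                     # central 2x2 "mouth" region
loss = attribution_alignment_loss(attr, mask)  # ~= -0.25: a quarter of the mass overlaps
```

Minimizing this term pushes the model's attribution toward the privileged regions during training.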
arXiv Detail & Related papers (2022-03-24T07:49:33Z)
- Progressive Spatio-Temporal Bilinear Network with Monte Carlo Dropout for Landmark-based Facial Expression Recognition with Uncertainty Estimation [93.73198973454944]
The performance of our method is evaluated on three widely used datasets.
It is comparable to that of video-based state-of-the-art methods while it has much less complexity.
arXiv Detail & Related papers (2021-06-08T13:40:30Z)
- Video-based Facial Expression Recognition using Graph Convolutional Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely-used datasets, CK+, Oulu-CASIA and MMI, and also one challenging wild dataset AFEW8.0.
arXiv Detail & Related papers (2020-10-26T07:31:51Z)
- Towards Deeper Graph Neural Networks [63.46470695525957]
Graph convolutions perform neighborhood aggregation and represent one of the most important graph operations.
Several recent studies attribute the performance deterioration of deeper models to the over-smoothing issue.
We propose Deep Adaptive Graph Neural Network (DAGNN) to adaptively incorporate information from large receptive fields.
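The DAGNN idea of decoupling transformation from propagation and then adaptively weighting multi-hop representations can be sketched as follows. The softmax gate here is a simplified stand-in for the paper's learnable retainment scores; shapes and the toy graph are illustrative.

```python
import numpy as np

# Simplified DAGNN-style propagation: features are propagated k hops over a
# row-normalized adjacency, and the hop depths are combined with adaptive
# weights (learned in the real model; fixed here for illustration).
def dagnn_propagate(adj: np.ndarray, x: np.ndarray, k: int, gate: np.ndarray) -> np.ndarray:
    """adj: (N, N) adjacency; x: (N, C) already-transformed features;
    gate: (k + 1,) per-hop logits. Returns (N, C) combined features."""
    deg = adj.sum(axis=1, keepdims=True)
    a_hat = adj / np.maximum(deg, 1)             # row-normalized adjacency
    hops = [x]
    for _ in range(k):
        hops.append(a_hat @ hops[-1])            # one more propagation step
    weights = np.exp(gate) / np.exp(gate).sum()  # softmax over hop depths
    return sum(w * h for w, h in zip(weights, hops))

# Toy path graph 0-1-2 with one-hot node features.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.eye(3)
out = dagnn_propagate(adj, x, k=2, gate=np.zeros(3))
print(out.shape)  # (3, 3)
```

Because propagation is separated from transformation, the receptive field can grow with k without stacking (and over-smoothing) many nonlinear layers.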
arXiv Detail & Related papers (2020-07-18T01:11:14Z)
- InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs [73.27299786083424]
We propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models.
We first find that GANs learn various semantics in some linear subspaces of the latent space.
We then conduct a detailed study on the correlation between different semantics and manage to better disentangle them via subspace projection.
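InterFaceGAN's core operations are linear: move a latent code along the normal of a semantic boundary to edit that attribute, and project one normal off another to edit conditionally. The toy vectors below are stand-ins, not learned boundaries from a real GAN.

```python
import numpy as np

# Sketch of InterFaceGAN-style latent editing: a linear boundary for a semantic
# (e.g. "smiling") gives a hyperplane normal n; moving a latent code z along n
# edits that semantic. The vectors here are toy stand-ins.
def edit_latent(z: np.ndarray, n: np.ndarray, alpha: float) -> np.ndarray:
    n = n / np.linalg.norm(n)      # unit normal of the semantic boundary
    return z + alpha * n           # step of size alpha along the semantic

def project_out(n1: np.ndarray, n2: np.ndarray) -> np.ndarray:
    """Conditional manipulation: remove the component of n1 along n2 so that
    editing semantic 1 no longer changes semantic 2."""
    n2 = n2 / np.linalg.norm(n2)
    return n1 - (n1 @ n2) * n2

n_smile = np.array([1.0, 1.0, 0.0])   # toy boundary normal for "smiling"
n_age = np.array([0.0, 1.0, 0.0])     # toy boundary normal for "age"
n_smile_only = project_out(n_smile, n_age)  # smiling direction, age held fixed
```

Subspace projection is what lets the paper disentangle correlated semantics such as age and eyeglasses.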
arXiv Detail & Related papers (2020-05-18T18:01:22Z)
- Non-Linearities Improve OrigiNet based on Active Imaging for Micro Expression Recognition [8.112868317921853]
We introduce an active imaging concept to segregate active changes in expressive regions of a video into a single frame.
We propose a shallow CNN network: hybrid local receptive field based augmented learning network (OrigiNet) that efficiently learns significant features of the micro-expressions in a video.
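A loose sketch of the active-imaging idea: collapse the expressive changes of a video into a single frame by accumulating absolute inter-frame differences. OrigiNet's exact formulation may differ; this only illustrates the mechanism.

```python
import numpy as np

# Accumulate absolute frame-to-frame differences so that pixels that change
# during the expression light up in a single "active" image.
def active_image(frames: np.ndarray) -> np.ndarray:
    """frames: (T, H, W) grayscale video. Returns an (H, W) activity map."""
    return np.abs(np.diff(frames, axis=0)).sum(axis=0)

# Toy video: one pixel "activates" midway through the sequence.
video = np.zeros((5, 3, 3))
video[2:, 1, 1] = 1.0
act = active_image(video)
print(act[1, 1])  # 1.0
```

Feeding one such frame to a shallow CNN is far cheaper than processing the whole video, which matches the entry's emphasis on an efficient shallow network.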
arXiv Detail & Related papers (2020-05-16T13:44:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.