Weakly-supervised Micro- and Macro-expression Spotting Based on
Multi-level Consistency
- URL: http://arxiv.org/abs/2305.02734v2
- Date: Mon, 30 Oct 2023 14:07:44 GMT
- Title: Weakly-supervised Micro- and Macro-expression Spotting Based on
Multi-level Consistency
- Authors: Wang-Wang Yu, Kai-Fu Yang, Hong-Mei Yan, Yong-Jie Li
- Abstract summary: Weakly-supervised expression spotting (WES) based on video-level labels can potentially mitigate the complexity of frame-level annotation.
We propose a novel and simple WES framework, MC-WES, using multi-consistency collaborative mechanisms.
We show that MC-WES is comparable to state-of-the-art fully-supervised methods.
- Score: 22.7160073059238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most micro- and macro-expression spotting methods in untrimmed videos suffer
from the burden of video-wise collection and frame-wise annotation.
Weakly-supervised expression spotting (WES) based on video-level labels can
potentially mitigate the complexity of frame-level annotation while achieving
fine-grained frame-level spotting. However, we argue that existing
weakly-supervised methods are based on multiple instance learning (MIL)
involving inter-modality, inter-sample, and inter-task gaps. The inter-sample
gap arises primarily from differences in sample distribution and duration. Therefore, we
propose a novel and simple WES framework, MC-WES, using multi-consistency
collaborative mechanisms that include modal-level saliency, video-level
distribution, label-level duration and segment-level feature consistency
strategies to implement fine frame-level spotting with only video-level labels
to alleviate the above gaps and merge prior knowledge. The modal-level saliency
consistency strategy focuses on capturing key correlations between raw images
and optical flow. The video-level distribution consistency strategy utilizes
the difference in sparsity of the temporal distribution. The label-level duration
consistency strategy exploits the difference in the duration of facial muscle movements.
The segment-level feature consistency strategy emphasizes that features under
the same labels maintain similarity. Experimental results on three challenging
datasets -- CAS(ME)$^2$, CAS(ME)$^3$, and SAMM-LV -- demonstrate that MC-WES is
comparable to state-of-the-art fully-supervised methods.
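The abstract describes its four strategies only at a high level. As a purely illustrative sketch (an assumption, not the authors' implementation), the segment-level feature consistency idea, that segments sharing the same label should keep similar features, can be written as a pairwise cosine penalty:

```python
# Hypothetical sketch of a segment-level feature consistency loss, as
# loosely described in the abstract: pull together embeddings of segments
# that carry the same (pseudo-)label. NOT the paper's actual code.
import numpy as np

def segment_consistency_loss(features, labels):
    """features: (N, D) segment embeddings; labels: (N,) integer labels.
    Returns mean (1 - cosine similarity) over all same-label pairs."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T                                  # (N, N) cosine similarities
    same = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)                            # ignore self-pairs
    n_pairs = max(same.sum(), 1.0)                         # avoid divide-by-zero
    return float(((1.0 - sim) * same).sum() / n_pairs)

# toy usage: 8 random 16-d segment features with binary labels
rng = np.random.default_rng(0)
loss = segment_consistency_loss(rng.normal(size=(8, 16)),
                                rng.integers(0, 2, size=8))
```

The loss is zero when all same-label segments have identical (up to scale) features, which matches the stated goal that "features under the same labels maintain similarity".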
Related papers
- Multimodal Alignment with Cross-Attentive GRUs for Fine-Grained Video Understanding [0.0]
We propose a framework that fuses video, image, and text encodings using GRU-based sequence encoders and cross-modal attention mechanisms. Our results demonstrate that the proposed fusion strategy significantly outperforms unimodal baselines.
arXiv Detail & Related papers (2025-07-04T12:35:52Z) - Revisiting Self-Supervised Heterogeneous Graph Learning from Spectral Clustering Perspective [52.662463893268225]
Self-supervised heterogeneous graph learning (SHGL) has shown promising potential in diverse scenarios.
Existing SHGL methods encounter two significant limitations.
We introduce a novel framework enhanced by rank and dual consistency constraints.
arXiv Detail & Related papers (2024-12-01T09:33:20Z) - Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition [52.89441679581216]
Low-Light Video Enhancement (LLVE) seeks to restore dynamic or static scenes plagued by severe invisibility and noise. We present an innovative video decomposition strategy that incorporates view-independent and view-dependent components. Our framework consistently outperforms existing methods, establishing a new SOTA performance.
arXiv Detail & Related papers (2024-05-24T15:56:40Z) - Weak Supervision with Arbitrary Single Frame for Micro- and Macro-expression Spotting [22.04975008531069]
We propose a point-level weakly-supervised expression spotting framework, where each expression needs to be annotated with only one random frame (i.e., a point).
We show MPLG generates more reliable pseudo labels by merging class-specific probabilities, attention scores, fused features, and point-level labels.
Experiments on the CAS(ME)2, CAS(ME)3, and SAMM-LV datasets demonstrate PWES achieves promising performance comparable to that of recent fully-supervised methods.
arXiv Detail & Related papers (2024-03-21T09:01:21Z) - Exploring Homogeneous and Heterogeneous Consistent Label Associations
for Unsupervised Visible-Infrared Person ReID [62.81466902601807]
Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to retrieve pedestrian images of the same identity from different modalities without annotations.
We introduce a Modality-Unified Label Transfer (MULT) module that simultaneously accounts for both homogeneous and heterogeneous fine-grained instance-level structures.
It models both homogeneous and heterogeneous affinities, leveraging them to define the inconsistency for the pseudo-labels and then minimize it, leading to pseudo-labels that maintain alignment across modalities and consistency within intra-modality structures.
arXiv Detail & Related papers (2024-02-01T15:33:17Z) - SMC-NCA: Semantic-guided Multi-level Contrast for Semi-supervised Temporal Action Segmentation [53.010417880335424]
Semi-supervised temporal action segmentation (SS-TAS) aims to perform frame-wise classification in long untrimmed videos.
Recent studies have shown the potential of contrastive learning in unsupervised representation learning using unlabelled data.
We propose a novel Semantic-guided Multi-level Contrast scheme with a Neighbourhood-Consistency-Aware unit (SMC-NCA) to extract strong frame-wise representations.
arXiv Detail & Related papers (2023-12-19T17:26:44Z) - Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID [56.573905143954015]
We propose a novel bilateral cluster matching-based learning framework to reduce the modality gap by matching cross-modality clusters.
Under such a supervisory signal, a Modality-Specific and Modality-Agnostic (MSMA) contrastive learning framework is proposed to align features jointly at a cluster-level.
Experiments on the public SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-05-22T03:27:46Z) - Weakly-supervised Action Localization via Hierarchical Mining [76.00021423700497]
Weakly-supervised action localization aims to localize and classify action instances in the given videos temporally with only video-level categorical labels.
We propose a hierarchical mining strategy under video-level and snippet-level manners, i.e., hierarchical supervision and hierarchical consistency mining.
We show that HiM-Net outperforms existing methods on THUMOS14 and ActivityNet1.3 datasets with large margins by hierarchically mining the supervision and consistency.
arXiv Detail & Related papers (2022-06-22T12:19:09Z) - Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and
Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS).
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive LR representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
arXiv Detail & Related papers (2022-03-19T09:19:55Z) - HiCLRE: A Hierarchical Contrastive Learning Framework for Distantly
Supervised Relation Extraction [24.853265244512954]
We propose a Hierarchical Contrastive Learning framework for Distantly Supervised Relation Extraction (HiCLRE) to reduce noisy sentences.
Specifically, we propose a three-level hierarchical learning framework to interact with cross levels, generating the de-noising context-aware representations.
Experiments demonstrate that HiCLRE significantly outperforms strong baselines in various mainstream DSRE datasets.
arXiv Detail & Related papers (2022-02-27T12:48:26Z) - Semi-supervised Semantic Segmentation with Directional Context-aware
Consistency [66.49995436833667]
We focus on the semi-supervised segmentation problem where only a small set of labeled data is provided with a much larger collection of totally unlabeled images.
A preferred high-level representation should capture the contextual information while not losing self-awareness.
We present the Directional Contrastive Loss (DC Loss) to accomplish the consistency in a pixel-to-pixel manner.
arXiv Detail & Related papers (2021-06-27T03:42:40Z)
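Several of the related papers above (e.g., the DC Loss, SMC-NCA, and MSMA entries) rely on some form of pairwise feature consistency between two views of the same input. A generic, hypothetical sketch of such a pixel-to-pixel consistency term (an illustration of the shared idea, not any paper's exact loss) is:

```python
# Generic two-view, pixel-to-pixel consistency term: per-pixel features
# extracted from two augmented views of the same image should agree.
# This is an assumed, simplified form, not the Directional Contrastive Loss.
import numpy as np

def pixel_consistency_loss(feat_a, feat_b):
    """feat_a, feat_b: (H, W, D) per-pixel feature maps from two views.
    Returns mean (1 - cosine similarity) over all pixel positions."""
    a = feat_a / np.linalg.norm(feat_a, axis=-1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=-1, keepdims=True)
    cos = (a * b).sum(axis=-1)          # (H, W) per-pixel cosine similarity
    return float((1.0 - cos).mean())

# toy usage: two noisy views of the same 4x4 feature map
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 4, 8))
loss = pixel_consistency_loss(base, base + 0.1 * rng.normal(size=base.shape))
```

Published losses differ in how they weight or gate each pixel pair (e.g., by confidence or direction); this sketch keeps only the shared alignment objective.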
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.