Beyond Euclidean: Dual-Space Representation Learning for Weakly Supervised Video Violence Detection
- URL: http://arxiv.org/abs/2409.19252v1
- Date: Sat, 28 Sep 2024 05:54:20 GMT
- Title: Beyond Euclidean: Dual-Space Representation Learning for Weakly Supervised Video Violence Detection
- Authors: Jiaxu Leng, Zhanjie Wu, Mingpi Tan, Yiran Liu, Ji Gan, Haosheng Chen, Xinbo Gao
- Abstract summary: We develop a novel Dual-Space Representation Learning (DSRL) method for weakly supervised Video Violence Detection (VVD).
Our method captures the visual features of events while also exploring the intrinsic relations between events, thereby enhancing the discriminative capacity of the features.
- Score: 41.37736889402566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While numerous Video Violence Detection (VVD) methods have focused on representation learning in Euclidean space, they struggle to learn sufficiently discriminative features, leading to weaknesses in recognizing normal events that are visually similar to violent events (\emph{i.e.}, ambiguous violence). In contrast, hyperbolic representation learning, renowned for its ability to model hierarchical and complex relationships between events, has the potential to amplify the discrimination between visually similar events. Inspired by these, we develop a novel Dual-Space Representation Learning (DSRL) method for weakly supervised VVD to utilize the strength of both Euclidean and hyperbolic geometries, capturing the visual features of events while also exploring the intrinsic relations between events, thereby enhancing the discriminative capacity of the features. DSRL employs a novel information aggregation strategy to progressively learn event context in hyperbolic spaces, which selects aggregation nodes through layer-sensitive hyperbolic association degrees constrained by hyperbolic Dirichlet energy. Furthermore, DSRL attempts to break the cyber-balkanization of different spaces, utilizing cross-space attention to facilitate information interactions between Euclidean and hyperbolic space to capture better discriminative features for final violence detection. Comprehensive experiments demonstrate the effectiveness of our proposed DSRL.
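To illustrate the hyperbolic half of the dual-space idea described in the abstract, the sketch below lifts Euclidean feature vectors onto the Poincaré ball via the exponential map at the origin and compares points with the geodesic distance. This is a minimal sketch of standard Poincaré-ball operations, not the paper's DSRL implementation; the function names and the curvature parameter `c` are assumptions.

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Mobius addition on the Poincare ball with curvature -c."""
    xy = np.dot(x, y)
    x2 = np.dot(x, x)
    y2 = np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c**2 * x2 * y2
    return num / den

def exp_map_zero(v, c=1.0):
    """Exponential map at the origin: lift a Euclidean (tangent)
    vector onto the Poincare ball."""
    sqrt_c = np.sqrt(c)
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return np.zeros_like(v)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def hyperbolic_distance(x, y, c=1.0):
    """Geodesic distance between two points on the Poincare ball."""
    sqrt_c = np.sqrt(c)
    diff = mobius_add(-x, y, c)
    return (2.0 / sqrt_c) * np.arctanh(sqrt_c * np.linalg.norm(diff))
```

Because the distance grows rapidly as points approach the ball boundary, small Euclidean differences between embeddings can translate into large hyperbolic separations, which is the property the paper leverages to separate ambiguous violent and normal events.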
Related papers
- CODEX: A Cluster-Based Method for Explainable Reinforcement Learning [0.0]
We present a method that incorporates semantic clustering, which can effectively summarize RL agent behavior in the state-action space.
Experiments on the MiniGrid and StarCraft II gaming environments reveal the semantic clusters retain temporal as well as entity information.
arXiv Detail & Related papers (2023-12-07T11:04:37Z)
- Hyperbolic vs Euclidean Embeddings in Few-Shot Learning: Two Sides of the Same Coin [49.12496652756007]
We show that the best few-shot results are attained for hyperbolic embeddings at a common hyperbolic radius.
In contrast to prior benchmark results, we demonstrate that better performance can be achieved by a fixed-radius encoder equipped with the Euclidean metric.
arXiv Detail & Related papers (2023-09-18T14:51:46Z)
- Hyperbolic Face Anti-Spoofing [21.981129022417306]
We propose to learn richer hierarchical and discriminative spoofing cues in hyperbolic space.
For unimodal FAS learning, the feature embeddings are projected into the Poincaré ball, and then the hyperbolic binary logistic regression layer is cascaded for classification.
To alleviate the vanishing gradient problem in hyperbolic space, a new feature clipping method is proposed to enhance the training stability of hyperbolic models.
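The feature-clipping idea mentioned above can be sketched as follows: bounding the Euclidean norm of a feature before projecting it into hyperbolic space keeps embeddings away from the ball boundary, where gradients of hyperbolic operations become vanishingly small. A minimal sketch only; the function name and clipping radius `r_max` are hypothetical, not values from the paper.

```python
import numpy as np

def clip_features(v, r_max=1.0):
    """Clip the Euclidean norm of a feature vector before it is
    projected onto the Poincare ball. r_max is a hypothetical
    clipping radius; vectors inside it pass through unchanged."""
    norm = np.linalg.norm(v)
    if norm <= r_max:
        return v
    # Rescale to the clipping radius, preserving direction.
    return v * (r_max / norm)
```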
arXiv Detail & Related papers (2023-08-17T17:18:21Z)
- Learning Weakly Supervised Audio-Visual Violence Detection in Hyperbolic Space [17.30264225835736]
HyperVD is a novel framework that learns snippet embeddings in hyperbolic space to improve model discrimination.
Our framework comprises a detour fusion module for multimodal fusion.
By learning snippet representations in this space, the framework effectively learns semantic discrepancies between violent and normal events.
arXiv Detail & Related papers (2023-05-30T07:18:56Z)
- Hyperbolic Contrastive Learning [12.170564544949308]
We propose a novel contrastive learning framework to learn semantic relationships in the hyperbolic space.
We show that our proposed method achieves better results on self-supervised pretraining, supervised classification, and higher robust accuracy than baseline methods.
arXiv Detail & Related papers (2023-02-02T20:47:45Z)
- Enhancing Hyperbolic Graph Embeddings via Contrastive Learning [7.901082408569372]
We propose a novel Hyperbolic Graph Contrastive Learning (HGCL) framework which learns node representations through multiple hyperbolic spaces.
Experimental results on multiple real-world datasets demonstrate the superiority of the proposed HGCL.
arXiv Detail & Related papers (2022-01-21T06:10:05Z)
- Dual Contrastive Learning for General Face Forgery Detection [64.41970626226221]
We propose a novel face forgery detection framework, named Dual Contrastive Learning (DCL), which constructs positive and negative paired data.
To explore the essential discrepancies, Intra-Instance Contrastive Learning (Intra-ICL) is introduced to focus on the local content inconsistencies prevalent in the forged faces.
arXiv Detail & Related papers (2021-12-27T05:44:40Z)
- Deep Clustering by Semantic Contrastive Learning [67.28140787010447]
We introduce a novel variant called Semantic Contrastive Learning (SCL).
It explores the characteristics of both conventional contrastive learning and deep clustering.
It can amplify the strengths of contrastive learning and deep clustering in a unified approach.
arXiv Detail & Related papers (2021-03-03T20:20:48Z)
- Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations [183.03278932562438]
This paper presents an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations.
We show that our approach achieves higher efficiency in visual representations and thus delivers a key message to inspire the future research of self-supervised visual representation learning.
arXiv Detail & Related papers (2020-11-19T16:26:25Z)
- Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification [208.1227090864602]
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem.
Existing VI-ReID methods tend to learn global representations, which have limited discriminability and weak robustness to noisy images.
We propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID.
arXiv Detail & Related papers (2020-07-18T03:08:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.