Locality-aware Attention Network with Discriminative Dynamics Learning
for Weakly Supervised Anomaly Detection
- URL: http://arxiv.org/abs/2208.05636v1
- Date: Thu, 11 Aug 2022 04:27:33 GMT
- Authors: Yujiang Pu, Xiaoyu Wu
- Abstract summary: We propose a Discriminative Dynamics Learning (DDL) method with two objective functions, i.e., dynamics ranking loss and dynamics alignment loss.
A Locality-aware Attention Network (LA-Net) is constructed to capture global correlations and re-calibrate the location preference across snippets, followed by a multilayer perceptron with causal convolution to obtain anomaly scores.
- Score: 0.8883733362171035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video anomaly detection has recently been formulated as a multiple
instance learning task under weak supervision, in which each video is treated
as a bag of snippets, each of which is to be determined to contain anomalies or
not. Previous efforts mainly focus on the discrimination of the snippet itself
without modeling the temporal dynamics, which refers to the variation between
adjacent snippets.
Therefore, we propose a Discriminative Dynamics Learning (DDL) method with two
objective functions, i.e., dynamics ranking loss and dynamics alignment loss.
The former aims to enlarge the score dynamics gap between positive and negative
bags while the latter performs temporal alignment of the feature dynamics and
score dynamics within the bag. Moreover, a Locality-aware Attention Network
(LA-Net) is constructed to capture global correlations and re-calibrate the
location preference across snippets, followed by a multilayer perceptron with
causal convolution to obtain anomaly scores. Experimental results show that our
method achieves significant improvements on two challenging benchmarks, i.e.,
UCF-Crime and XD-Violence.
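The two DDL objectives described above can be illustrated with a minimal NumPy sketch. Here, "dynamics" are taken to be absolute differences between adjacent snippet scores (or features), the ranking loss is a hinge over the max dynamics of the positive vs. negative bag, and the alignment loss is a cosine distance between feature dynamics and score dynamics. The function names, the hinge margin, and the cosine formulation are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def score_dynamics(scores):
    # Temporal dynamics of a bag: absolute variation between
    # anomaly scores of adjacent snippets.
    return np.abs(np.diff(scores))

def dynamics_ranking_loss(pos_scores, neg_scores, margin=1.0):
    # Hinge-style ranking loss (assumed form): the largest score
    # dynamics in the positive (anomalous) bag should exceed the
    # largest in the negative (normal) bag by a margin.
    gap = score_dynamics(pos_scores).max() - score_dynamics(neg_scores).max()
    return max(0.0, margin - gap)

def dynamics_alignment_loss(features, scores):
    # Alignment loss (assumed form): encourage feature dynamics
    # (L2 norms of adjacent-snippet feature differences) and score
    # dynamics to co-vary within the bag, via cosine distance.
    fd = np.linalg.norm(np.diff(features, axis=0), axis=1)
    sd = score_dynamics(scores)
    cos = fd @ sd / (np.linalg.norm(fd) * np.linalg.norm(sd) + 1e-8)
    return 1.0 - cos
```

For example, a positive bag with scores `[0.1, 0.9, 0.2]` (max dynamics 0.8) against a negative bag with scores `[0.1, 0.15, 0.1]` (max dynamics 0.05) yields a ranking loss of about 0.25 under the default margin of 1.0.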
Related papers
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM [17.661231232206028]
Simultaneous localization and mapping (SLAM) with implicit neural representations has received extensive attention.
We propose a novel SLAM framework for dynamic environments.
arXiv Detail & Related papers (2024-07-18T09:35:48Z)
- Dynamic Distinction Learning: Adaptive Pseudo Anomalies for Video Anomaly Detection [8.957579200590985]
We introduce Dynamic Distinction Learning (DDL) for Video Anomaly Detection.
DDL combines pseudo-anomalies, dynamic anomaly weighting, and a distinction loss function to improve detection accuracy.
Our approach adapts to the variability of normal and anomalous behaviors without fixed anomaly thresholds.
arXiv Detail & Related papers (2024-04-07T15:06:48Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization [87.47977407022492]
This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in weakly-supervised action localization.
Under a differentiable dynamic programming formulation, two complementary contrastive objectives are designed, including Fine-grained Sequence Distance (FSD) contrasting and Longest Common Subsequence (LCS) contrasting.
Our method achieves state-of-the-art performance on two popular benchmarks.
arXiv Detail & Related papers (2022-03-31T05:13:50Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively to produce a representation that emphasizes the novel information in the frame at the current time-stamp.
SRL sharply outperforms existing state-of-the-art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- Instance-Level Relative Saliency Ranking with Graph Reasoning [126.09138829920627]
We present a novel unified model to segment salient instances and infer relative saliency rank order.
A novel loss function is also proposed to effectively train the saliency ranking branch.
Experimental results demonstrate that our proposed model is more effective than previous methods.
arXiv Detail & Related papers (2021-07-08T13:10:42Z)
- Modulating Localization and Classification for Harmonized Object Detection [40.82723262074911]
We propose a mutual learning framework to modulate the two tasks.
In particular, the two tasks are forced to learn from each other with a novel mutual labeling strategy.
We achieve a significant performance gain over the baseline detectors on the COCO dataset.
arXiv Detail & Related papers (2021-03-16T10:36:02Z)
- Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning [6.523119805288132]
We present a novel technique for self-supervised video representation learning by: (a) decoupling the learning objective into two contrastive subtasks respectively emphasizing spatial and temporal features, and (b) performing it hierarchically to encourage multi-scale understanding.
arXiv Detail & Related papers (2020-11-23T08:05:39Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.