Bayesian Nonparametric Submodular Video Partition for Robust Anomaly
Detection
- URL: http://arxiv.org/abs/2203.12840v1
- Date: Thu, 24 Mar 2022 04:00:49 GMT
- Title: Bayesian Nonparametric Submodular Video Partition for Robust Anomaly
Detection
- Authors: Hitesh Sapkota, Qi Yu
- Abstract summary: Multiple-instance learning (MIL) provides an effective way to tackle the video anomaly detection problem.
We propose a novel Bayesian non-parametric submodular video partition (BN-SVP) to significantly improve MIL model training.
Our theoretical analysis ensures a strong performance guarantee of the proposed algorithm.
- Score: 9.145168943972067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multiple-instance learning (MIL) provides an effective way to tackle the
video anomaly detection problem by modeling it as a weakly supervised problem,
since labels are usually available only at the video level and are missing for
individual frames due to the high cost of labeling. We propose a novel Bayesian
non-parametric submodular video partition (BN-SVP) method to significantly improve
MIL model training, offering a highly reliable solution for robust anomaly
detection in practical settings that include outlier segments or multiple types
of abnormal events. BN-SVP essentially performs dynamic non-parametric
hierarchical clustering with an enhanced self-transition that groups segments
in a video into temporally consistent and semantically coherent hidden states
that can be naturally interpreted as scenes. Each segment is assumed to be
generated through a non-parametric mixture process that allows variations of
segments within the same scenes to accommodate the dynamic and noisy nature of
many real-world surveillance videos. The scene and mixture component assignment
of BN-SVP also induces a pairwise similarity among segments, resulting in
a non-parametric construction of a submodular set function. Integrating this
function with an MIL loss effectively exposes the model to a diverse set of
potentially positive instances to improve its training. A greedy algorithm is
developed to optimize the submodular function and support efficient model
training. Our theoretical analysis provides a strong performance guarantee for
the proposed algorithm. The effectiveness of the proposed approach is
demonstrated on multiple real-world video anomaly datasets with robust
detection performance.
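To make the selection step concrete, the sketch below illustrates the kind of greedy subset selection described in the abstract. It is a minimal illustration under stated assumptions rather than the authors' implementation: it uses a facility-location-style monotone submodular function built from a hypothetical pairwise segment-similarity matrix (standing in for the similarity induced by BN-SVP's scene and mixture-component assignments), and all function and variable names are illustrative.

```python
import numpy as np

def facility_location_gain(similarity, selected, candidate):
    # Marginal gain of adding `candidate` under the monotone submodular
    # facility-location function F(S) = sum_i max_{j in S} similarity[i, j].
    if not selected:
        return float(similarity[:, candidate].sum())
    current_cover = similarity[:, selected].max(axis=1)
    new_cover = np.maximum(current_cover, similarity[:, candidate])
    return float((new_cover - current_cover).sum())

def greedy_diverse_segments(similarity, budget):
    # Standard greedy maximization under a cardinality constraint; for a
    # monotone submodular F this enjoys the classic (1 - 1/e) guarantee.
    n = similarity.shape[0]
    selected = []
    for _ in range(budget):
        candidates = [c for c in range(n) if c not in selected]
        gains = [facility_location_gain(similarity, selected, c) for c in candidates]
        selected.append(candidates[int(np.argmax(gains))])
    return selected

# Toy example: 6 video segments with a synthetic similarity matrix.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))
sim = np.exp(-np.linalg.norm(features[:, None] - features[None, :], axis=-1))
print(greedy_diverse_segments(sim, budget=3))
```

In the MIL setting described above, selecting such a diverse subset from a positive bag, rather than only the single top-scoring segment, is what exposes the model to multiple potentially positive instances during training.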
Related papers
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Video Semantic Segmentation with Inter-Frame Feature Fusion and Inner-Frame Feature Refinement [39.06589186472675]
We propose a spatial-temporal fusion (STF) module to model dense pairwise relationships among multi-frame features.
Besides, we propose a novel memory-augmented refinement (MAR) module to tackle difficult predictions among semantic boundaries.
arXiv Detail & Related papers (2023-01-10T07:57:05Z)
- Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection [43.004613173363566]
A novel Motion Embedder (ME) is proposed to provide a pose motion representation from a probabilistic perspective.
A novel task-specific Spatial-Temporal Transformer (STT) is deployed for self-supervised pose sequence reconstruction.
MoPRL achieves state-of-the-art performance with an average improvement of 4.7% AUC on several challenging datasets.
arXiv Detail & Related papers (2021-12-07T11:52:25Z)
- Dense Unsupervised Learning for Video Segmentation [49.46930315961636]
We present a novel approach to unsupervised learning for video object segmentation (VOS).
Unlike previous work, our formulation allows learning dense feature representations directly in a fully convolutional regime.
Our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.
arXiv Detail & Related papers (2021-11-11T15:15:11Z)
- Reliable Shot Identification for Complex Event Detection via Visual-Semantic Embedding [72.9370352430965]
We propose a visual-semantic guided loss method for event detection in videos.
Motivated by curriculum learning, we introduce a negative elastic regularization term to start training the classifier with instances of high reliability.
An alternative optimization algorithm is developed to solve the proposed challenging non-net regularization problem.
arXiv Detail & Related papers (2021-10-12T11:46:56Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate the spatial-temporal kernels of dynamic-scale to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
- A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches equilibrium distribution of adversarial examples.
Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
- Unsupervised Learning Consensus Model for Dynamic Texture Videos Segmentation [12.462608802359936]
We present ULCM, an effective unsupervised learning consensus model for the segmentation of dynamic texture videos.
In the proposed model, the set of values of the requantized local binary pattern (LBP) histogram around the pixel to be classified is used as the feature representation.
Experiments conducted on the challenging SynthDB dataset show that ULCM is significantly faster, easier to code, simpler, and has few parameters.
arXiv Detail & Related papers (2020-06-29T16:40:59Z)
- Disentangling Multiple Features in Video Sequences using Gaussian Processes in Variational Autoencoders [6.461473289206789]
We introduce MGP-VAE, a variational autoencoder which uses Gaussian processes (GP) to model the latent space for the unsupervised learning of disentangled representations in video sequences.
We use fractional Brownian motions (fBM) and Brownian bridges (BB) to enforce an inter-frame correlation structure in each independent channel, and show that varying this structure enables one to capture different factors of variation in the data.
arXiv Detail & Related papers (2020-01-08T08:08:01Z)
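As a side note on the last related entry, the inter-frame correlation structures it mentions can be written down directly. The snippet below is a small, hedged sketch (not the MGP-VAE code) of the standard fractional Brownian motion and Brownian bridge covariance functions that such Gaussian-process priors place over frame timestamps; the function names are illustrative.

```python
import numpy as np

def fbm_covariance(times, hurst=0.7):
    # Fractional Brownian motion covariance:
    # K(s, t) = 0.5 * (s^{2H} + t^{2H} - |s - t|^{2H}), with Hurst index H.
    s = np.asarray(times, dtype=float)[:, None]
    t = np.asarray(times, dtype=float)[None, :]
    h2 = 2.0 * hurst
    return 0.5 * (s ** h2 + t ** h2 - np.abs(s - t) ** h2)

def brownian_bridge_covariance(times):
    # Brownian bridge covariance on [0, 1]: K(s, t) = min(s, t) - s * t,
    # which pins the process to zero at both endpoints.
    s = np.asarray(times, dtype=float)[:, None]
    t = np.asarray(times, dtype=float)[None, :]
    return np.minimum(s, t) - s * t

# Frame timestamps normalized to (0, 1); each independent latent channel
# would use one such kernel as its GP prior across frames.
frames = np.linspace(0.05, 0.95, 8)
print(fbm_covariance(frames).shape, brownian_bridge_covariance(frames).shape)
```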