Related papers: ProDisc-VAD: An Efficient System for Weakly-Supervised Anomaly Detection in Video Surveillance Applications

ProDisc-VAD: An Efficient System for Weakly-Supervised Anomaly Detection in Video Surveillance Applications

URL: http://arxiv.org/abs/2505.02179v3
Date: Thu, 17 Jul 2025 08:31:43 GMT
Title: ProDisc-VAD: An Efficient System for Weakly-Supervised Anomaly Detection in Video Surveillance Applications
Authors: Tao Zhu, Qi Yu, Xinru Dong, Shiyu Li, Yue Liu, Jinlong Jiang, Lei Shu,
Abstract summary: We propose ProDisc-VAD, an efficient framework tackling this via two synergistic components.<n>ProDisc-VAD achieves strong AUCs (97.98% ShanghaiTech, 87.12% UCF-Crime) using only 0.4M parameters, over 800x fewer than recent ViT-based methods like VadCLIP.
Score: 25.39536098488568
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Weakly-supervised video anomaly detection (WS-VAD) using Multiple Instance Learning (MIL) suffers from label ambiguity, hindering discriminative feature learning. We propose ProDisc-VAD, an efficient framework tackling this via two synergistic components. The Prototype Interaction Layer (PIL) provides controlled normality modeling using a small set of learnable prototypes, establishing a robust baseline without being overwhelmed by dominant normal data. The Pseudo-Instance Discriminative Enhancement (PIDE) loss boosts separability by applying targeted contrastive learning exclusively to the most reliable extreme-scoring instances (highest/lowest scores). ProDisc-VAD achieves strong AUCs (97.98% ShanghaiTech, 87.12% UCF-Crime) using only 0.4M parameters, over 800x fewer than recent ViT-based methods like VadCLIP. Code is available at https://github.com/modadundun/ProDisc-VAD.

Related papers

ProtoDCS: Towards Robust and Efficient Open-Set Test-Time Adaptation for Vision-Language Models [32.840734752367275]
Prototype-based Double-Check Separation (ProtoDCS) is a robust framework for OSTTA.<n>It separates csID and csOOD samples, enabling safe and efficient adaptation of Vision-Language Models to csID data.<n>ProtoDCS significantly boosts both known-class accuracy and OOD detection metrics.
arXiv Detail & Related papers (2026-02-27T03:39:02Z)
PRISM: Performer RS-IMLE for Single-pass Multisensory Imitation Learning [51.24484551729328]
We introduce PRISM, a single-pass policy based on a batch-global rejection-sampling variant of IMLE.<n> PRISM couples a temporal multisensory encoder with a linear-attention generator using a Performer architecture.<n>We demonstrate the efficacy of PRISM on a diverse real-world hardware suite, including loco-manipulation using a Unitree Go2 with a 7-DoF arm D1 and tabletop manipulation with a UR5 manipulator.
arXiv Detail & Related papers (2026-02-02T17:57:37Z)
Cyberattack Detection in Virtualized Microgrids Using LightGBM and Knowledge-Distilled Classifiers [0.0]
A complete virtual microgrid was designed and implemented in MG/Simulink.<n>A structured cyberattack framework was developed using MGLib to inject adversarial signals into secondary control pathways.<n>The results confirm that lightweight machine learning based intrusion detection methods can provide fast, accurate, and efficient cyberattack detection.
arXiv Detail & Related papers (2026-01-07T01:23:13Z)
Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector [97.92369017531038]
We build a new laRge-scale Adervsarial images dataset with Diverse hArmful Responses (RADAR) We then develop a novel iN-time Embedding-based AdveRSarial Image DEtection (NEARSIDE) method, which exploits a single vector that distilled from the hidden states of Visual Language Models (VLMs) to achieve the detection of adversarial images against benign ones in the input.
arXiv Detail & Related papers (2024-10-30T10:33:10Z)
A Lightweight Dual-Branch System for Weakly-Supervised Video Anomaly Detection on Consumer Edge Devices [1.8274323268621635]
Rule-based Video Anomaly Detection (RuleVAD) is a novel, lightweight system engineered for high-efficiency and low-complexity threat detection on consumer hardware.<n>An implicit branch uses visual features for rapid, coarse-grained binary classification, efficiently filtering out normal activity to avoid unnecessary processing.<n>A multimodal explicit branch takes over, applying data mining to generate interpretable, text-based association rules from the scene.
arXiv Detail & Related papers (2024-10-29T12:22:07Z)
OpenVLA: An Open-Source Vision-Language-Action Model [131.74098076670103]
We introduce OpenVLA, an open-source VLA trained on a diverse collection of 970k real-world robot demonstrations. OpenVLA shows strong results for generalist manipulation, outperforming closed models such as RT-2-X (55B) by 16.5% in absolute task success rate. We release model checkpoints, fine-tuning notebooks, and our PyTorch with built-in support for training VLAs at scale on Open X-Embodiment datasets.
arXiv Detail & Related papers (2024-06-13T15:46:55Z)
Boost UAV-based Ojbect Detection via Scale-Invariant Feature Disentanglement and Adversarial Learning [18.11107031800982]
We propose to improve single-stage inference accuracy through learning scale-invariant features.<n>Our approach can effectively improve model accuracy and achieve state-of-the-art (SoTA) performance on two datasets.
arXiv Detail & Related papers (2024-05-24T11:40:22Z)
A Lightweight Multi-Attack CAN Intrusion Detection System on Hybrid FPGAs [13.581341206178525]
Intrusion detection and mitigation approaches have shown promising results in detecting multiple attack vectors in Controller Area Network (CAN) We present a lightweight multi-attack quantised machine learning model that is deployed using Xilinx's Deep Learning Processing Unit IP on a Zynq Ultrascale+ (XCZU3EG) FPGA. The model detects denial of service and fuzzing attacks with an accuracy of above 99 % and a false positive rate of 0.07%, which are comparable to the state-of-the-art techniques in the literature.
arXiv Detail & Related papers (2024-01-19T13:39:05Z)
A Mixture of Exemplars Approach for Efficient Out-of-Distribution Detection with Foundation Models [0.0]
This paper presents an efficient approach to tackling OOD detection that is designed to maximise the benefit of training with a high quality, frozen, pretrained foundation model.<n>MoLAR provides strong OOD performance when only comparing the similarity of OOD examples to the exemplars, a small set of images chosen to be representative of the dataset.
arXiv Detail & Related papers (2023-11-28T06:12:28Z)
VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection [58.47940430618352]
We propose VadCLIP, a new paradigm for weakly supervised video anomaly detection (WSVAD) VadCLIP makes full use of fine-grained associations between vision and language on the strength of CLIP. We conduct extensive experiments on two commonly-used benchmarks, demonstrating that VadCLIP achieves the best performance on both coarse-grained and fine-grained WSVAD.
arXiv Detail & Related papers (2023-08-22T14:58:36Z)
Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF) It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model. We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
Anomaly Detection via Multi-Scale Contrasted Memory [3.0170109896527086]
We introduce a new two-stage anomaly detector which memorizes during training multi-scale normal prototypes to compute an anomaly deviation score. Our model highly improves the state-of-the-art performance on a wide range of object, style and local anomalies with up to 35% error relative improvement on CIFAR-10.
arXiv Detail & Related papers (2022-11-16T16:58:04Z)
Towards Better Object Detection in Scale Variation with Adaptive Feature Selection [3.5352273012717044]
We propose a novel adaptive feature selection module (AFSM) to automatically learn the way to fuse multi-level representations in the channel dimension. It significantly improves the performance of the detectors that have a feature pyramid structure. A class-aware sampling mechanism (CASM) is proposed to tackle the class imbalance problem.
arXiv Detail & Related papers (2020-12-06T13:41:20Z)
Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present a prohibitive-level action recognition model that is trained with only video-frame labels. Our method per person detectors have been trained on large image datasets within Multiple Instance Learning framework. We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples. We propose a modular acceleration system, called SUOD, to address it.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.