Related papers: PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer

PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer

URL: http://arxiv.org/abs/2509.26386v2
Date: Tue, 28 Oct 2025 14:48:45 GMT
Title: PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer
Authors: Zhiwei Yang, Chen Gao, Mike Zheng Shou,
Abstract summary: Video anomaly detection (VAD) is a critical yet challenging task due to the complex and diverse nature of real-world scenarios.<n>Previous methods rely on domain-specific training data and manual adjustments when applying to new scenarios and unseen anomaly types.<n>In this work, we propose PANDA, an agentic AI engineer based on MLLMs.
Score: 54.06481630066739
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video anomaly detection (VAD) is a critical yet challenging task due to the complex and diverse nature of real-world scenarios. Previous methods typically rely on domain-specific training data and manual adjustments when applying to new scenarios and unseen anomaly types, suffering from high labor costs and limited generalization. Therefore, we aim to achieve generalist VAD, \ie, automatically handle any scene and any anomaly types without training data or human involvement. In this work, we propose PANDA, an agentic AI engineer based on MLLMs. Specifically, we achieve PANDA by comprehensively devising four key capabilities: (1) self-adaptive scene-aware strategy planning, (2) goal-driven heuristic reasoning, (3) tool-augmented self-reflection, and (4) self-improving chain-of-memory. Concretely, we develop a self-adaptive scene-aware RAG mechanism, enabling PANDA to retrieve anomaly-specific knowledge for anomaly detection strategy planning. Next, we introduce a latent anomaly-guided heuristic prompt strategy to enhance reasoning precision. Furthermore, PANDA employs a progressive reflection mechanism alongside a suite of context-aware tools to iteratively refine decision-making in complex scenarios. Finally, a chain-of-memory mechanism enables PANDA to leverage historical experiences for continual performance improvement. Extensive experiments demonstrate that PANDA achieves state-of-the-art performance in multi-scenario, open-set, and complex scenario settings without training and manual involvement, validating its generalizable and robust anomaly detection capability. Code is released at https://github.com/showlab/PANDA.

Related papers

AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation [24.199522837278128]
We present a new notion of task-agnostic action paradigm that decouples action execution from task-specific conditioning.<n>ATARA is a scalable self-supervised framework that accelerates collection by over $ 30times $ compared to human teleoperation.<n>We propose AnyPos, an inverse dynamics model equipped with Arm-Decoupled Estimation and a Direction-Aware Decoder.
arXiv Detail & Related papers (2025-07-17T03:48:57Z)
MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection [30.470777079947958]
Video Anomaly Detection (VAD) methods based on reconstruction or prediction face two critical challenges.<n>Strong generalization capability often results in accurate reconstruction or prediction of abnormal events.<n>reliance only on low-level appearance and motion cues limits their ability to identify high-level semantic in abnormal events from complex scenes.
arXiv Detail & Related papers (2025-06-03T07:14:57Z)
AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis [52.261173507177396]
We introduce AssistPDA, the first online video anomaly surveillance assistant (VAPDA) that unifies anomaly prediction, detection, and analysis (VAPDA) within a single framework.<n> AssistPDA enables real-time inference on streaming videos while supporting interactive user engagement.<n>We also introduce a novel event-level anomaly prediction task, enabling proactive anomaly forecasting before anomalies fully unfold.
arXiv Detail & Related papers (2025-03-27T18:30:47Z)
EIAD: Explainable Industrial Anomaly Detection Via Multi-Modal Large Language Models [23.898938659720503]
Industrial Anomaly Detection (IAD) is critical to ensure product quality during manufacturing.<n>We propose a novel approach that introduces a dedicated multi-modal defect localization module to decouple the dialog functionality from the core feature extraction.<n>We also contribute to the first multi-modal industrial anomaly detection training dataset, named Defect Detection Question Answering (DDQA)
arXiv Detail & Related papers (2025-03-18T11:33:29Z)
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM [2.5988879420706095]
Video anomaly detection (VAD) is crucial for video analysis and surveillance in computer vision.<n>Existing VAD models rely on learned normal patterns, which makes them difficult to apply to diverse environments.<n>This study proposes customizable video anomaly detection (C-VAD) technique and the AnyAnomaly model.
arXiv Detail & Related papers (2025-03-06T14:52:34Z)
Adaptive Meta-Learning for Robust Deepfake Detection: A Multi-Agent Framework to Data Drift and Model Generalization [6.589206192038365]
This paper proposes an adversarial meta-learning algorithm using task-specific adaptive sample synthesis and consistency regularization. It boosts both robustness and generalization of the model. Experimental results demonstrate the model's consistent performance across various datasets, outperforming the models in comparison.
arXiv Detail & Related papers (2024-11-12T19:55:07Z)
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios. We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment. We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent. We show robust performance on the Real-Word RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
Learning Task Automata for Reinforcement Learning using Hidden Markov Models [37.69303106863453]
This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state task automata' We learn a product MDP, a model composed of the specification's automaton and the environment's MDP, by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models. Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy.
arXiv Detail & Related papers (2022-08-25T02:58:23Z)
MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection [76.80153360498797]
We develop a multiple instance self-training framework (MIST) to efficiently refine task-specific discriminative representations. MIST is composed of 1) a multiple instance pseudo label generator, which adapts a sparse continuous sampling strategy to produce more reliable clip-level pseudo labels, and 2) a self-guided attention boosted feature encoder. Our method performs comparably to or even better than existing supervised and weakly supervised methods, specifically obtaining a frame-level AUC 94.83% on ShanghaiTech.
arXiv Detail & Related papers (2021-04-04T15:47:14Z)
Anomaly Detection Based on Selection and Weighting in Latent Space [73.01328671569759]
We propose a novel selection-and-weighting-based anomaly detection framework called SWAD. Experiments on both benchmark and real-world datasets have shown the effectiveness and superiority of SWAD.
arXiv Detail & Related papers (2021-03-08T10:56:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.