Fugu-MT Paper Translation (Summary): Motion Semantics Guided Normalizing Flow for Privacy-Preserving Video Anomaly Detection

Paper summary: Motion Semantics Guided Normalizing Flow for Privacy-Preserving Video Anomaly Detection

arxiv url: http://arxiv.org/abs/2603.26745v1
Date: Mon, 23 Mar 2026 08:45:44 GMT
Status: Translation complete
System update date: 2026-03-31 23:18:44.565044
Title: Motion Semantics Guided Normalizing Flow for Privacy-Preserving Video Anomaly Detection
Title (reference translation): Motion Semantics Guided Normalizing Flow for Privacy-Preserving Video Anomaly Detection
Authors: Yang Liu, Boan Chen, Yuanyuan Meng, Jing Liu, Zhengliang Guo, Wei Zhou, Peng Sun, Hong Chen
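Abstract summary: Video anomaly detection is a critical task in multimedia systems for intelligent surveillance and forensic analysis. This paper proposes Motion Semantics Guided Normalizing Flow (MSG-Flow), which decomposes skeleton-based VAD into hierarchical motion semantics modeling. MSG-Flow achieves state-of-the-art performance on HR-ShanghaiTech and HR-UBnormal with 88.1% and 75.8% AUC, respectively.
Reference score (independently computed attention score): 21.81092485652255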
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As embodied perception systems increasingly bridge digital and physical realms in interactive multimedia applications, the need for privacy-preserving approaches to understand human activities in physical environments has become paramount. Video anomaly detection is a critical task in such embodied multimedia systems for intelligent surveillance and forensic analysis. Skeleton-based approaches have emerged as a privacy-preserving alternative that processes physical world information through abstract human pose representations while discarding sensitive visual attributes such as identity and facial features. However, existing skeleton-based methods predominantly model continuous motion trajectories in a monolithic manner, failing to capture the hierarchical nature of human activities composed of discrete semantic primitives and fine-grained kinematic details, which leads to reduced discriminability when anomalies manifest at different abstraction levels. In this regard, we propose Motion Semantics Guided Normalizing Flow (MSG-Flow) that decomposes skeleton-based VAD into hierarchical motion semantics modeling. It employs a vector-quantized variational autoencoder to discretize continuous motion into interpretable primitives, an autoregressive Transformer to model semantic-level temporal dependencies, and a conditional normalizing flow to capture detail-level pose variations. Extensive experiments on benchmarks (HR-ShanghaiTech and HR-UBnormal) demonstrate that MSG-Flow achieves state-of-the-art performance with 88.1% and 75.8% AUC, respectively.
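Abstract (reference translation): As perception systems that bridge the digital and physical realms become increasingly widespread in interactive multimedia applications, the need for privacy-preserving approaches to understanding human activities in physical environments has become paramount. Video anomaly detection is a critical task in embodied multimedia systems for intelligent surveillance and forensic analysis. Skeleton-based approaches have emerged as a privacy-preserving alternative that processes physical-world information through abstract human pose representations while discarding sensitive visual attributes such as identity and facial features. However, existing skeleton-based methods mainly model continuous motion trajectories in a monolithic manner and fail to capture the hierarchical nature of human activities, which consist of discrete semantic primitives and fine-grained kinematic details; as a result, discriminability drops when anomalies appear at different abstraction levels. This work therefore proposes Motion Semantics Guided Normalizing Flow (MSG-Flow), which decomposes skeleton-based VAD into hierarchical motion semantics modeling. It uses a vector-quantized variational autoencoder to discretize continuous motion into interpretable primitives, an autoregressive Transformer to model semantic-level temporal dependencies, and a conditional normalizing flow to capture detail-level pose variations. Extensive experiments on benchmarks (HR-ShanghaiTech and HR-UBnormal) show that MSG-Flow achieves state-of-the-art performance with 88.1% and 75.8% AUC, respectively.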
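The abstract names two building blocks that are easy to illustrate in isolation: vector quantization (mapping continuous motion features to discrete primitives) and a conditional normalizing flow (scoring pose details given a primitive). The following is a minimal sketch of those two ideas only, not the authors' MSG-Flow implementation. The function names, the toy 2-D features, the two-entry codebook, and the per-token flow parameters are all assumptions made for illustration; the autoregressive Transformer stage is omitted, and a single affine layer stands in for a full flow.

```python
import math


def quantize(vectors, codebook):
    """Map each continuous vector to the index of its nearest codebook entry
    (squared Euclidean distance), as in the lookup step of a VQ-VAE."""
    tokens = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def conditional_affine_nll(x, shift, log_scale):
    """Negative log-likelihood of x under a one-layer conditional affine flow.

    The flow maps x to z = (x - shift) * exp(-log_scale) and uses a standard
    normal base density; shift and log_scale are supplied by the condition.
    """
    nll = 0.0
    for xi, mu, ls in zip(x, shift, log_scale):
        z = (xi - mu) * math.exp(-ls)
        # -log N(z; 0, 1) plus the change-of-variables term (+log_scale).
        nll += 0.5 * z * z + 0.5 * math.log(2 * math.pi) + ls
    return nll


# Toy 2-D "motion features"; every value here is made up for illustration.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.1]]
tokens = quantize(features, codebook)

# Hypothetical flow parameters per token: (shift, log_scale).
flow_params = {
    0: ([0.0, 0.0], [0.0, 0.0]),
    1: ([1.0, 1.0], [0.0, 0.0]),
}


def anomaly_score(x, token):
    """Higher score means x is less likely given its motion primitive."""
    shift, log_scale = flow_params[token]
    return conditional_affine_nll(x, shift, log_scale)


if __name__ == "__main__":
    for x, t in zip(features, tokens):
        print(t, round(anomaly_score(x, t), 4))
```

In this sketch a pose that sits far from the mean associated with its primitive receives a larger negative log-likelihood, which is the general sense in which a flow's likelihood can serve as an anomaly score.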


Related papers