Fugu-MT Paper Translation (Abstract): An Analysis Focused on Womens Safety: Can VAD Models Be Enhanced by a Multi-modal Dataset?

Paper abstract: An Analysis Focused on Womens Safety: Can VAD Models Be Enhanced by a Multi-modal Dataset?

arxiv url: http://arxiv.org/abs/2605.25806v1
Date: Mon, 25 May 2026 12:59:45 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:20.055043
Title: An Analysis Focused on Womens Safety: Can VAD Models Be Enhanced by a Multi-modal Dataset?
Title (reference translation): An Analysis Focused on Women's Safety: Can VAD Models Be Extended by a Multi-modal Dataset?
Authors: Sangeeta, Maddikuntla Sai Prajwal, Debi Prosad Dogra, Kamalakar Vijay Thakare, Hyungjoo Jung, Ig-Jae Kim, Heeseung Choi
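Abstract summary: ExtrAnom is a new multi-modal benchmark containing 1001 videos with textual descriptions, 500 normal and 501 anomalous. It covers anomalous events such as stalking (3.9%), chain snatching (17.6%), kidnapping (7.3%), assassinations (2.3%), and harassment (18.9%), alongside normal videos (50%). Each video is supplemented with 4 textual annotations, one human-generated and three LLM-generated, enabling cross-modal and VLM-based validation.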
Reference score (independently computed attention score): 15.899967533390841
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Women's safety and security are paramount for a modern society. Crimes against women occur in daylight as well as in low-light conditions. Often, such events are captured by real-world surveillance cameras that operate at lower resolutions. Despite substantial progress in CV-related research, video anomaly detection (VAD) focused on women's safety has not yet been adequately addressed. Existing video anomaly datasets contain well-lit, high-resolution, close-shot videos, and fail to represent women-centric anomalies such as chain snatching, stalking, inappropriate touch, and other subtle forms of crime against women. To address these problems, we propose the ExtrAnom dataset, a new multi-modal benchmark containing 1001 videos with textual descriptions, 500 normal and 501 anomalous, classified into 5 different types of women-centric crimes. The anomalous videos comprise low-light (8%), low-resolution (13%), and long-shot (15%) footage, along with daylight (64%) footage. The dataset covers anomalous events such as stalking (3.9%), chain snatching (17.6%), kidnapping (7.3%), assassinations (2.3%), and harassment (18.9%), alongside normal videos (50%). Each video is supplemented with 4 textual annotations, one human-generated and three LLM-generated, enabling cross-modal and VLM-based validation. The aim of creating a women-centric dataset is to accurately detect women-centric anomaly patterns that can be observed visually. The dataset also supports VLMs in generating accurate video-level descriptions. ExtrAnom has been benchmarked against popular unimodal and multi-modal VAD datasets (e.g., XD-Violence, UCF-Crime, and UCA) and SOTA methods. Experiments reveal that the existing datasets are insufficient to train models for detecting women-centric anomalies.
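Abstract (reference translation): Women's safety and security are of paramount importance to modern society. Crimes against women occur in daylight as well as in low light. Such events are often captured by real-world surveillance cameras operating at low resolution. Despite progress in CV-related research, video anomaly detection (VAD) focused on women's safety has not yet been adequately addressed. Existing video anomaly datasets contain well-lit, high-resolution, close-shot videos and cannot represent women-centric anomalies such as chain snatching, stalking, and inappropriate touch. To address these problems, the ExtrAnom dataset is a new multi-modal benchmark containing 1001 videos with textual descriptions, 500 normal and 501 anomalous, classified into 5 types of women-centric crimes. The dataset consists of low-light (8%), low-resolution (13%), long-shot (15%), and daylight (64%) anomalous videos. It also covers anomalous events such as stalking (3.9%), chain snatching (17.6%), kidnapping (7.3%), assassinations (2.3%), and harassment (18.9%), alongside normal videos (50%). Each video is supplemented with 4 textual annotations, one human-generated and three LLM-generated, enabling cross-modal and VLM-based validation. The purpose of creating a women-centric dataset is to accurately detect women-centric anomaly patterns that can be observed visually. The dataset complements VLMs in generating accurate video-level descriptions. ExtrAnom has been benchmarked against popular unimodal and multi-modal VAD datasets (e.g., XD-Violence, UCF-Crime, and UCA) and SOTA methods. The experiments show that existing datasets are insufficient for training models to detect women-centric anomalies.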
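
As a quick sanity check on the composition figures quoted in the abstract, the sketch below sums the stated category and recording-condition percentages and converts the category shares into approximate video counts. The percentages and the total of 1001 videos come from the abstract; the variable and function names are illustrative, and the per-category counts are rounded estimates, not numbers reported by the authors (the five anomalous estimates add up to 500, one short of the stated 501, because the published percentages are themselves rounded).

```python
# Figures below are copied from the abstract; everything else is illustrative.
TOTAL_VIDEOS = 1001

# Share of each category, in percent of all videos.
category_share_pct = {
    "stalking": 3.9,
    "chain snatching": 17.6,
    "kidnapping": 7.3,
    "assassinations": 2.3,
    "harassment": 18.9,
    "normal": 50.0,
}

# Share of each recording condition, in percent (as stated for anomalous videos).
condition_share_pct = {
    "low-light": 8,
    "low-resolution": 13,
    "long-shot": 15,
    "daylight": 64,
}


def approx_counts(share_pct, total):
    """Convert percentage shares into rounded, approximate video counts."""
    return {name: round(total * pct / 100) for name, pct in share_pct.items()}


# Both sets of percentages should account for 100% of their population.
category_total_pct = round(sum(category_share_pct.values()), 1)
condition_total_pct = sum(condition_share_pct.values())

# Estimated counts per category (assumption: shares are relative to all 1001 videos).
estimated_counts = approx_counts(category_share_pct, TOTAL_VIDEOS)

if __name__ == "__main__":
    print("category shares sum to", category_total_pct, "%")
    print("condition shares sum to", condition_total_pct, "%")
    for name, count in estimated_counts.items():
        print(f"{name}: ~{count} videos")
```

Both sets of shares sum to 100%, so the quoted breakdown is internally consistent at the stated precision.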

Related papers