Fugu-MT Paper Translation (Summary): Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search

Paper summary: Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search

arxiv url: http://arxiv.org/abs/2604.23282v1
Date: Sat, 25 Apr 2026 12:53:15 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.249627
Title: Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search
Authors: Zequn Xie, Guijin Luo, Chuxin Wang, Sihang Cai, Tao Jin, Zhou Zhao, Yixuan Tang,
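Abstract summary: Text-based person anomaly search retrieves specific behavioral events from surveillance archives using natural-language queries. Recent pose-aware methods face a fundamental Pose-Semantic Gap: semantically different actions can share similar skeletal geometries. This paper proposes the SSDC (Structure-Semantic Decoupled Cascade) framework, which decouples retrieval into two stages.
Reference score (in-house attention score): 45.34874989015716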
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-based person anomaly search retrieves specific behavioral events from surveillance archives using natural-language queries. Although recent pose-aware methods align geometric structures well, they face a fundamental Pose-Semantic Gap: semantically different actions can share similar skeletal geometries. While Multimodal Large Language Models (MLLMs) can reduce this ambiguity, using them for large-scale retrieval is computationally prohibitive. We propose the Structure-Semantic Decoupled Cascade (SSDC) framework, which decouples retrieval into two stages: (1) Structure-Aware Coarse Retrieval, where a lightweight model quickly filters candidates by skeletal similarity; and (2) Detective Squad Interaction, a multi-agent semantic verification module. The squad consists of a Detective for fast binary filtering, an Analyst for evidence extraction, and a Writer for semantic synthesis. Finally, we re-rank candidates by fusing the synthesized captions with structural priors. Experiments on the PAB benchmark show that SSDC achieves state-of-the-art performance by balancing efficiency and semantic reasoning.
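The two-stage cascade described in the abstract can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: pose embeddings are plain vectors compared by cosine similarity, the Detective, Analyst, and Writer are stub callables standing in for MLLM agents, candidates the Detective rejects are simply dropped, and the final re-ranking is a hypothetical linear fusion weighted by `alpha`. All names (`Candidate`, `coarse_retrieve`, `cascade_search`, `jaccard`, `alpha`, `demo`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    clip_id: str
    pose_embedding: Sequence[float]  # hypothetical skeleton feature vector
    structure_score: float = 0.0     # stage-1 skeletal similarity to the query
    caption: str = ""                # text produced by the (stub) Writer


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def coarse_retrieve(query_pose, gallery: List[Candidate], k: int) -> List[Candidate]:
    """Stage 1 (assumed form): score every clip by skeletal similarity, keep top-k."""
    for c in gallery:
        c.structure_score = cosine(query_pose, c.pose_embedding)
    return sorted(gallery, key=lambda c: c.structure_score, reverse=True)[:k]


def cascade_search(query_text, query_pose, gallery, detective, analyst, writer,
                   text_sim, k=3, alpha=0.5):
    """Stage 2 runs the expensive agents only on the k survivors of stage 1."""
    verified = []
    for c in coarse_retrieve(query_pose, gallery, k):
        if not detective(query_text, c):      # fast binary gate; rejects are dropped
            continue
        evidence = analyst(query_text, c)     # evidence extraction
        c.caption = writer(evidence)          # semantic synthesis
        verified.append(c)

    # Hypothetical re-ranking: linear fusion of caption/query similarity
    # with the structural prior from stage 1.
    def fused(c):
        return alpha * text_sim(query_text, c.caption) + (1 - alpha) * c.structure_score

    return sorted(verified, key=fused, reverse=True)


def jaccard(a: str, b: str) -> float:
    """Toy text similarity: word-set overlap."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def demo() -> List[str]:
    gallery = [
        Candidate("a", [1.0, 0.0]),
        Candidate("b", [0.9, 0.1]),
        Candidate("c", [0.0, 1.0]),
        Candidate("d", [0.7, 0.7]),
    ]
    # Stub "agents"; a real system would call an MLLM at these points.
    descriptions = {"a": "person stretching arms", "b": "person waving",
                    "d": "person falling down"}
    results = cascade_search(
        query_text="person falling down",
        query_pose=[1.0, 0.0],
        gallery=gallery,
        detective=lambda q, c: c.clip_id != "b",
        analyst=lambda q, c: descriptions[c.clip_id],
        writer=lambda evidence: evidence,
        text_sim=jaccard,
    )
    return [c.clip_id for c in results]


if __name__ == "__main__":
    print(demo())  # ['d', 'a']
```

In the toy run, clip `a` has the closest skeleton to the query but its caption does not match, so the fused score promotes `d`; this mirrors the Pose-Semantic Gap the paper targets. The efficiency argument shows up as the agents being called on only the `k` shortlisted clips rather than the whole gallery.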


Related papers