Fugu-MT Paper Translation (Overview): Causal Bootstrapped Alignment for Unsupervised Video-Based Visible-Infrared Person Re-Identification

Paper overview: Causal Bootstrapped Alignment for Unsupervised Video-Based Visible-Infrared Person Re-Identification

arxiv url: http://arxiv.org/abs/2604.15631v1
Date: Fri, 17 Apr 2026 02:15:59 GMT
Status: Translation complete
Last updated in system: 2026-04-20 22:00:19.706762
Title: Causal Bootstrapped Alignment for Unsupervised Video-Based Visible-Infrared Person Re-Identification
Authors: Shuang Li, Jiaxu Leng, Changjiang Kuang, Mingpi Tan, Yu Yuan, Xinbo Gao
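Abstract summary: VVI-ReID is a critical technique for all-day surveillance, where temporal information provides additional cues beyond static images. Existing approaches rely heavily on fully supervised learning with expensive cross-modality annotations, which limits scalability. This work therefore proposes a Causal Bootstrapped Alignment framework that explicitly exploits video priors.
Reference score (attention score computed in-house): 52.784239635604735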
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: VVI-ReID is a critical technique for all-day surveillance, where temporal information provides additional cues beyond static images. However, existing approaches rely heavily on fully supervised learning with expensive cross-modality annotations, limiting scalability. To address this issue, we investigate Unsupervised Learning for VVI-ReID (USL-VVI-ReID), which learns identity-discriminative representations directly from unlabeled video tracklets. Directly extending image-based USL-VI-ReID methods to this setting with generic pretrained encoders leads to suboptimal performance. Such encoders suffer from weak identity discrimination and strong modality bias, resulting in severe intra-modality identity confusion and pronounced clustering granularity imbalance between visible and infrared modalities. These issues jointly degrade pseudo-label reliability and hinder effective cross-modality alignment. To address these challenges, we propose a Causal Bootstrapped Alignment (CBA) framework that explicitly exploits inherent video priors. First, we introduce Causal Intervention Warm-up (CIW), which performs sequence-level causal interventions by leveraging temporal identity consistency and cross-modality identity consistency to suppress modality- and motion-induced spurious correlations while preserving identity-relevant semantics, yielding cleaner representations for unsupervised clustering. Second, we propose Prototype-Guided Uncertainty Refinement (PGUR), which employs a coarse-to-fine alignment strategy to resolve cross-modality granularity mismatch, reorganizing under-clustered infrared representations under the guidance of reliable visible prototypes with uncertainty-aware supervision. Extensive experiments on the HITSZ-VCM and BUPTCampus benchmarks demonstrate that CBA significantly outperforms existing USL-VI-ReID methods when extended to the USL-VVI-ReID setting.
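The abstract describes Prototype-Guided Uncertainty Refinement (PGUR) only at a high level: infrared features are reorganized under the guidance of reliable visible prototypes, with uncertainty-aware supervision. As a rough illustration of that general idea, the Python sketch below computes one prototype per visible pseudo-label cluster, assigns each infrared tracklet feature to its most similar visible prototype, and down-weights ambiguous assignments. It is not the authors' implementation. The function names `prototypes` and `refine_infrared`, the softmax temperature of 0.1, and the entropy-based confidence weight are all hypothetical choices, and neither the coarse-to-fine schedule nor the Causal Intervention Warm-up (CIW) stage is modeled.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a feature vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def prototypes(features: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Mean feature per pseudo-label cluster, re-normalized (one prototype per cluster)."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, x in enumerate(f):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: l2_normalize([x / counts[y] for x in acc]) for y, acc in sums.items()}


def refine_infrared(
    ir_features: Sequence[Vector],
    vis_protos: Dict[int, Vector],
    temperature: float = 0.1,  # hypothetical value, not taken from the paper
) -> List[Tuple[int, float]]:
    """Assign each infrared feature to a visible prototype with a confidence weight.

    Hypothetical stand-in for uncertainty-aware supervision: a softmax over cosine
    similarities, with weight = 1 - normalized entropy, so samples that sit between
    several visible clusters contribute less to the training signal.
    """
    ids = sorted(vis_protos)
    max_entropy = math.log(len(ids)) if len(ids) > 1 else 1.0
    out: List[Tuple[int, float]] = []
    for f in ir_features:
        unit = l2_normalize(f)
        logits = [sum(a * b for a, b in zip(unit, vis_protos[k])) / temperature for k in ids]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]  # numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        best = max(range(len(ids)), key=lambda i: probs[i])
        out.append((ids[best], 1.0 - entropy / max_entropy))
    return out


# Toy example: two visible clusters and two infrared tracklet features.
vis_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
vis_labels = [0, 0, 1, 1]
protos = prototypes(vis_feats, vis_labels)
# First infrared feature lies near cluster 0 (weight close to 1);
# the second is equidistant from both clusters (weight close to 0).
result = refine_infrared([[0.95, 0.05], [0.7, 0.7]], protos)
print(result)
```

Using entropy as the uncertainty signal keeps the sketch dependency-free; the paper may use a different uncertainty measure or loss weighting.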


Related papers