Fugu-MT paper translation (abstract): Temporal Prototyping and Hierarchical Alignment for Unsupervised Video-based Visible-Infrared Person Re-Identification

Paper summary: Temporal Prototyping and Hierarchical Alignment for Unsupervised Video-based Visible-Infrared Person Re-Identification

arxiv url: http://arxiv.org/abs/2604.21324v1
Date: Thu, 23 Apr 2026 06:26:07 GMT
Status: translation complete
Last updated in system: 2026-04-24 14:40:06.333985
Title: Temporal Prototyping and Hierarchical Alignment for Unsupervised Video-based Visible-Infrared Person Re-Identification
Title (reference translation): Temporal prototyping and hierarchical alignment for unsupervised video-based visible-infrared person re-identification
Authors: Zhiyong Li, Wei Jiang, Haojie Liu, Mingyu Wang, Wanchong Xu, Weijie Mao
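Abstract summary: Visible-infrared person re-identification (VI-ReID) enables cross-modality identity matching for all-day surveillance. Video-based VI-ReID has recently emerged, exploiting temporal dynamics to improve robustness. The authors propose HiTPro, a prototype-driven framework for unsupervised video-based VI-ReID.
Reference score (attention measure computed independently by this site): 8.813015264058574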
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visible-infrared person re-identification (VI-ReID) enables cross-modality identity matching for all-day surveillance, yet existing methods predominantly focus on the image level or rely heavily on costly identity annotations. While video-based VI-ReID has recently emerged to exploit temporal dynamics for improved robustness, existing studies remain limited to supervised settings. Crucially, the unsupervised video VI-ReID problem, where models must learn from RGB and infrared tracklets without identity labels, remains largely unexplored despite its practical importance in real-world deployment. To bridge this gap, we propose HiTPro (Hierarchical Temporal Prototyping), a prototype-driven framework for unsupervised video-based VI-ReID that avoids explicit hard pseudo-label assignment. HiTPro begins with an efficient Temporal-aware Feature Encoder that first extracts discriminative frame-level features and then aggregates them into a robust tracklet-level representation. Building upon these features, HiTPro constructs reliable intra-camera prototypes via Intra-Camera Tracklet Prototyping, which aggregates features from temporally partitioned sub-tracklets. Through Hierarchical Cross-Prototype Alignment, we perform a two-stage positive mining process that progresses from within-modality associations to cross-modality matching, enhanced by a Dynamic Threshold Strategy and Soft Weight Assignment. Finally, Hierarchical Contrastive Learning progressively optimizes feature-prototype alignment across three levels: intra-camera discrimination, cross-camera same-modality consistency, and cross-modality invariance. Extensive experiments on HITSZ-VCM and BUPTCampus demonstrate that HiTPro achieves state-of-the-art performance under fully unsupervised settings, significantly outperforming adapted baselines and establishing a strong baseline for future research.
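Abstract (reference translation): Visible-infrared person re-identification (VI-ReID) enables cross-modality identity matching for all-day surveillance, but existing methods mainly focus on the image level or depend heavily on expensive identity annotations. Video-based VI-ReID has recently emerged to improve robustness by exploiting temporal dynamics, but existing studies are limited to supervised settings. Importantly, the unsupervised video VI-ReID problem, in which a model must learn from RGB and infrared tracklets without identity labels, remains largely unexplored despite its practical importance in real deployments. To fill this gap, the authors propose HiTPro (Hierarchical Temporal Prototyping), a prototype-driven framework for unsupervised video-based VI-ReID that does not assign explicit hard pseudo-labels. HiTPro starts with an efficient Temporal-aware Feature Encoder that first extracts discriminative frame-level features and then aggregates them into a robust tracklet-level representation. Based on these features, HiTPro first builds reliable intra-camera prototypes through Intra-Camera Tracklet Prototyping, aggregating the features of temporally partitioned sub-tracklets. Through Hierarchical Cross-Prototype Alignment, it carries out a two-stage positive mining process that moves from within-modality associations to cross-modality matching, strengthened by a Dynamic Threshold Strategy and Soft Weight Assignment. Finally, Hierarchical Contrastive Learning progressively optimizes feature-prototype alignment across three levels: intra-camera discrimination, cross-camera same-modality consistency, and cross-modality invariance. Extensive experiments on HITSZ-VCM and BUPTCampus show that HiTPro achieves state-of-the-art performance under fully unsupervised settings, substantially outperforms adapted baselines, and establishes a strong baseline for future research.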
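
Illustrative sketch (not the authors' implementation): the abstract names two steps without giving formulas, namely building a prototype from temporally partitioned sub-tracklets and mining cross-modality positives with a dynamic threshold and soft weights. The NumPy code below shows one plausible reading of those two steps. Everything specific here is an assumption made for illustration: the function names `tracklet_prototype` and `soft_positive_weights`, mean pooling as the aggregation, a per-row quantile as the "dynamic" threshold, and a temperature-scaled softmax as the soft weighting.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def tracklet_prototype(frame_feats, num_parts=4):
    """Build one prototype from a tracklet's frame features of shape (T, D).

    Assumption: the tracklet is cut into contiguous temporal chunks
    (sub-tracklets), each chunk is mean-pooled, and the chunk means are
    averaged. The paper may aggregate differently.
    """
    parts = np.array_split(frame_feats, num_parts, axis=0)
    # Skip empty chunks, which occur when num_parts exceeds T.
    sub_means = np.stack([p.mean(axis=0) for p in parts if len(p) > 0])
    return l2_normalize(sub_means.mean(axis=0))


def soft_positive_weights(anchors, candidates, quantile=0.5, temperature=0.1):
    """Soft positive weights between two sets of unit-norm prototypes.

    Assumption: the threshold is a per-anchor quantile of that anchor's
    similarities (so it adapts to each row), and candidates above it are
    weighted by a softmax instead of receiving a hard 0/1 label.
    """
    sim = anchors @ candidates.T                         # (N, M) cosine similarity
    thresh = np.quantile(sim, quantile, axis=1, keepdims=True)
    mask = sim >= thresh                                 # row maximum always passes
    logits = np.where(mask, sim / temperature, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(logits)                             # masked entries become 0
    return weights / weights.sum(axis=1, keepdims=True)


# Toy data: 5 "RGB" tracklets and 6 "infrared" tracklets with 8-dim features.
rng = np.random.default_rng(0)
rgb = np.stack([tracklet_prototype(rng.normal(size=(12, 8))) for _ in range(5)])
ir = np.stack([tracklet_prototype(rng.normal(size=(10, 8))) for _ in range(6)])
weights = soft_positive_weights(rgb, ir)
print(weights.shape)  # (5, 6)
```

Each row of `weights` sums to 1 and is zero for candidates below that anchor's threshold, so a contrastive loss could use the rows as soft targets rather than committing to a single pseudo-label. Whether HiTPro's alignment stage works this way is not stated in the abstract.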


Related papers