Fugu-MT Paper Translation (Summary): DINOv2 Driven Gait Representation Learning for Video-Based Visible-Infrared Person Re-identification

Paper summary: DINOv2 Driven Gait Representation Learning for Video-Based Visible-Infrared Person Re-identification

arxiv url: http://arxiv.org/abs/2511.04281v1
Date: Thu, 06 Nov 2025 11:21:13 GMT
Status: Translation complete
Last updated in system: 2025-11-07 20:17:53.40213
Title: DINOv2 Driven Gait Representation Learning for Video-Based Visible-Infrared Person Re-identification
Title (reference translation): DINOv2-Driven Gait Representation Learning for Video-based Visible-Infrared Person Re-identification
Authors: Yujie Yang, Shuang Li, Jun Ye, Neng Dong, Fan Li, Huafeng Li
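Abstract summary: Video-based Visible-Infrared person re-identification (VVI-ReID) aims to retrieve the same pedestrian across visible and infrared modalities. The authors propose a gait representation learning framework that leverages the rich visual priors of DINOv2 to learn gait features complementary to appearance. Specifically, they propose a Semantic-Aware Silhouette and Gait Learning (SASGL) model.
Reference score (independently calculated attention score): 30.593882551803855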
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video-based Visible-Infrared person re-identification (VVI-ReID) aims to retrieve the same pedestrian across visible and infrared modalities from video sequences. Existing methods tend to exploit modality-invariant visual features but largely overlook gait features, which are not only modality-invariant but also rich in temporal dynamics, thus limiting their ability to model the spatiotemporal consistency essential for cross-modal video matching. To address these challenges, we propose a DINOv2-Driven Gait Representation Learning (DinoGRL) framework that leverages the rich visual priors of DINOv2 to learn gait features complementary to appearance cues, facilitating robust sequence-level representations for cross-modal retrieval. Specifically, we introduce a Semantic-Aware Silhouette and Gait Learning (SASGL) model, which generates and enhances silhouette representations with general-purpose semantic priors from DINOv2 and jointly optimizes them with the ReID objective to achieve semantically enriched and task-adaptive gait feature learning. Furthermore, we develop a Progressive Bidirectional Multi-Granularity Enhancement (PBMGE) module, which progressively refines feature representations by enabling bidirectional interactions between gait and appearance streams across multiple spatial granularities, fully leveraging their complementarity to enhance global representations with rich local details and produce highly discriminative features. Extensive experiments on HITSZ-VCM and BUPT datasets demonstrate the superiority of our approach, significantly outperforming existing state-of-the-art methods.
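Abstract (reference translation): Video-based Visible-Infrared person re-identification (VVI-ReID) aims to retrieve the same pedestrian across visible and infrared modalities from video sequences. Existing methods tend to exploit modality-invariant visual features but largely overlook gait features, which are not only modality-invariant but also rich in temporal dynamics; this limits their ability to model the spatiotemporal consistency essential for cross-modal video matching. To address these challenges, we propose the DINOv2-Driven Gait Representation Learning (DinoGRL) framework, which draws on the rich visual priors of DINOv2. Specifically, we introduce a Semantic-Aware Silhouette and Gait Learning (SASGL) model that uses general-purpose semantic priors from DINOv2 and optimizes the resulting silhouette representations jointly with the ReID objective, achieving semantically enriched, task-adaptive gait feature learning. We also develop a Progressive Bidirectional Multi-Granularity Enhancement (PBMGE) module, which progressively refines feature representations by enabling bidirectional interaction between the gait and appearance streams across multiple spatial granularities, fully exploiting their complementarity to enhance global representations with rich local detail and yield highly discriminative features. Extensive experiments on the HITSZ-VCM and BUPT datasets demonstrate the superiority of our approach, which significantly outperforms existing state-of-the-art methods.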

Related papers