Fugu-MT paper translation (abstract): FitPro: A Zero-Shot Framework for Interactive Text-based Pedestrian Retrieval in Open World

Paper overview: FitPro: A Zero-Shot Framework for Interactive Text-based Pedestrian Retrieval in Open World

arxiv url: http://arxiv.org/abs/2509.16674v1
Date: Sat, 20 Sep 2025 12:55:18 GMT
Status: Translation complete
Last updated in system: 2025-09-23 18:58:15.916901
Title: FitPro: A Zero-Shot Framework for Interactive Text-based Pedestrian Retrieval in Open World
Title (reference translation): FitPro: A zero-shot framework for interactive text-based pedestrian retrieval in the open world
Authors: Zengli Luo, Canlong Zhang, Xiaochun Lu, Zhixin Li
Abstract summary: FitPro is an open-world interactive TPR framework with enhanced semantic understanding and cross-scene adaptability. FitPro has three innovative components: Feature Contrastive Decoding (FCD), Incremental Semantic Mining (ISM), and Query-aware Hierarchical Retrieval (QHR).
Reference score (independently computed attention score): 13.089848592467675
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-based Pedestrian Retrieval (TPR) aims to retrieve specific target pedestrians in visual scenes according to natural language descriptions. Although existing methods have achieved progress under constrained settings, interactive retrieval in the open-world scenario still suffers from limited model generalization and insufficient semantic understanding. To address these challenges, we propose FitPro, an open-world interactive zero-shot TPR framework with enhanced semantic comprehension and cross-scene adaptability. FitPro has three innovative components: Feature Contrastive Decoding (FCD), Incremental Semantic Mining (ISM), and Query-aware Hierarchical Retrieval (QHR). The FCD integrates prompt-guided contrastive decoding to generate high-quality structured pedestrian descriptions from denoised images, effectively alleviating semantic drift in zero-shot scenarios. The ISM constructs holistic pedestrian representations from multi-view observations to achieve global semantic modeling in multi-turn interactions, thereby improving robustness against viewpoint shifts and fine-grained variations in descriptions. The QHR dynamically optimizes the retrieval pipeline according to query types, enabling efficient adaptation to multi-modal and multi-view inputs. Extensive experiments on five public datasets and two evaluation protocols demonstrate that FitPro significantly overcomes the generalization limitations and semantic modeling constraints of existing methods in interactive retrieval, paving the way for practical deployment. The code and data will be released at https://github.com/lilo4096/FitPro-Interactive-Person-Retrieval.
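Abstract (reference translation): Text-based Pedestrian Retrieval (TPR) aims to find a specific pedestrian in a visual scene from a natural-language description. Existing methods have made progress under constrained settings, but interactive retrieval in open-world scenarios still suffers from limited model generalization and insufficient semantic understanding. To address these challenges, we propose FitPro, an open-world interactive zero-shot TPR framework with enhanced semantic understanding and cross-scene adaptability. FitPro has three innovative components: Feature Contrastive Decoding (FCD), Incremental Semantic Mining (ISM), and Query-aware Hierarchical Retrieval (QHR). FCD integrates prompt-guided contrastive decoding to generate high-quality structured pedestrian descriptions from denoised images, effectively mitigating semantic drift in zero-shot scenarios. ISM builds holistic pedestrian representations from multi-view observations to achieve global semantic modeling across multi-turn interactions, improving robustness to viewpoint shifts and to fine-grained variations in descriptions. QHR dynamically optimizes the retrieval pipeline according to query type, enabling efficient adaptation to multi-modal and multi-view inputs. Extensive experiments on five public datasets and two evaluation protocols show that FitPro significantly overcomes the generalization limitations and semantic modeling constraints of existing methods in interactive retrieval, paving the way for practical deployment. The code and data will be released at https://github.com/lilo4096/FitPro-Interactive-Person-Retrieval.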
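
The abstract names prompt-guided contrastive decoding as the core of FCD but does not give its formulation. As a rough illustration only, the sketch below shows one common generic form of contrastive decoding: token logits from the main input are amplified against logits from a contrast input (for example, a noisy or evidence-free branch), with a plausibility cutoff so that unlikely tokens are never promoted. The function names, the weights `alpha` and `beta`, and the toy vocabulary are all hypothetical and are not taken from the FitPro paper or code.

```python
import math


def contrastive_scores(logits_main, logits_contrast, alpha=1.0, beta=0.1):
    """Generic contrastive decoding scores (illustrative, not FitPro's FCD).

    score = (1 + alpha) * main - alpha * contrast, restricted to tokens whose
    main-branch probability is at least beta times the top probability.
    """
    # Softmax over the main branch, used only for the plausibility cutoff.
    top = max(logits_main)
    exps = [math.exp(x - top) for x in logits_main]
    total = sum(exps)
    probs = [e / total for e in exps]
    cutoff = beta * max(probs)

    scores = []
    for lm, lc, p in zip(logits_main, logits_contrast, probs):
        if p < cutoff:
            # Implausible under the main branch: never selected.
            scores.append(float("-inf"))
        else:
            scores.append((1 + alpha) * lm - alpha * lc)
    return scores


def pick_token(vocab, logits_main, logits_contrast, alpha=1.0, beta=0.1):
    """Greedy choice of the token with the highest contrastive score."""
    scores = contrastive_scores(logits_main, logits_contrast, alpha, beta)
    best = max(range(len(vocab)), key=scores.__getitem__)
    return vocab[best]


# Toy example with made-up numbers: "red" scores the same with or without
# visual evidence (a language prior), while "blue" drops in the contrast branch.
vocab = ["red", "blue", "jacket"]
logits_main = [2.0, 1.8, 0.1]      # hypothetical logits given the clean image
logits_contrast = [2.0, 0.5, 0.1]  # hypothetical logits from the contrast branch
print(pick_token(vocab, logits_main, logits_contrast))  # prints "blue"
```

In this toy case plain greedy decoding on the main logits would choose "red", whereas the contrastive score favors "blue", the token whose support depends on the main input. This is the general intuition behind using contrastive decoding to reduce prior-driven drift; how FitPro combines it with prompts and denoised images is described in the paper itself.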
