Fugu-MT Paper Translation (Abstract): CONQUER: Context-Aware Representation with Query Enhancement for Text-Based Person Search

Paper overview: CONQUER: Context-Aware Representation with Query Enhancement for Text-Based Person Search

arxiv url: http://arxiv.org/abs/2601.18625v1
Date: Mon, 26 Jan 2026 16:01:33 GMT
Status: Translation complete
Last updated in system: 2026-01-27 15:23:08.918974
Title: CONQUER: Context-Aware Representation with Query Enhancement for Text-Based Person Search
Title (reference translation): CONQUER: Context-aware representation via query enhancement for text-based person search
Authors: Zequn Xie
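Abstract summary: Text-Based Person Search (TBPS) aims to retrieve pedestrian images from large galleries using natural language descriptions. We introduce CONQUER, a two-stage framework designed to address these challenges by strengthening cross-modal alignment during training and adaptively refining queries at inference. Experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReid show that CONQUER consistently outperforms strong baselines in both Rank-1 accuracy and mAP.
Reference score (independently computed attention score): 0.0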
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-Based Person Search (TBPS) aims to retrieve pedestrian images from large galleries using natural language descriptions. This task, essential for public safety applications, is hindered by cross-modal discrepancies and ambiguous user queries. We introduce CONQUER, a two-stage framework designed to address these challenges by enhancing cross-modal alignment during training and adaptively refining queries at inference. During training, CONQUER employs multi-granularity encoding, complementary pair mining, and context-guided optimal matching based on Optimal Transport to learn robust embeddings. At inference, a plug-and-play query enhancement module refines vague or incomplete queries via anchor selection and attribute-driven enrichment, without requiring retraining of the backbone. Extensive experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReid demonstrate that CONQUER consistently outperforms strong baselines in both Rank-1 accuracy and mAP, yielding notable improvements in cross-domain and incomplete-query scenarios. These results highlight CONQUER as a practical and effective solution for real-world TBPS deployment. Source code is available at https://github.com/zqxie77/CONQUER.
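Abstract (reference translation): Text-Based Person Search (TBPS) aims to retrieve pedestrian images from large galleries using natural language descriptions. This task, which is essential for public safety, is hindered by cross-modal discrepancies and ambiguous user queries. We introduce CONQUER, a two-stage framework designed to address these challenges by strengthening cross-modal alignment during training and adaptively refining queries at inference. During training, CONQUER learns robust embeddings using multi-granularity encoding, complementary pair mining, and context-guided optimal matching based on Optimal Transport. At inference, a plug-and-play query enhancement module refines vague or incomplete queries through anchor selection and attribute-driven enrichment, without requiring the backbone to be retrained. Extensive experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReid show that CONQUER consistently outperforms strong baselines in both Rank-1 accuracy and mAP, with notable improvements in cross-domain and incomplete-query scenarios. These results indicate that CONQUER is a practical and effective solution for real-world TBPS deployment. Source code is available at https://github.com/zqxie77/CONQUER.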
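
The abstract reports results as Rank-1 accuracy and mAP. As a rough illustration of how these two retrieval metrics are commonly computed from a query-to-gallery similarity matrix, here is a minimal sketch. It is not taken from the CONQUER repository: the function name `rank1_and_map`, the toy data, and the choice to score a query with no relevant gallery image as AP = 0 are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def rank1_and_map(
    similarity: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
) -> Tuple[float, float]:
    """Return (Rank-1 accuracy, mAP) for a text-to-image retrieval run.

    similarity[i][j] is the score of text query i against gallery image j;
    a gallery image is relevant when its identity equals the query identity.
    """
    rank1_hits = 0
    ap_sum = 0.0
    for scores, qid in zip(similarity, query_ids):
        # Rank gallery images from most to least similar.
        order = sorted(range(len(gallery_ids)), key=lambda j: scores[j], reverse=True)
        relevant = [gallery_ids[j] == qid for j in order]

        # Rank-1: is the top-ranked image the right person?
        rank1_hits += int(relevant[0])

        # Average precision: mean of precision@k over the ranks k of relevant images.
        found = 0
        precisions = []
        for k, is_rel in enumerate(relevant, start=1):
            if is_rel:
                found += 1
                precisions.append(found / k)
        # Assumption: a query with no relevant gallery image contributes AP = 0.
        ap_sum += sum(precisions) / len(precisions) if precisions else 0.0

    n = len(query_ids)
    return rank1_hits / n, ap_sum / n


# Toy data (hypothetical): two queries for identity 1, three gallery images.
toy_similarity = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.7]]
rank1, mean_ap = rank1_and_map(toy_similarity, query_ids=[1, 1], gallery_ids=[1, 2, 1])
print(f"Rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")  # Rank-1 = 0.500, mAP = 0.792
```

In the toy run, the first query ranks both relevant images on top (AP = 1), while the second ranks them second and third (AP = 7/12), so Rank-1 is 1/2 and mAP is 19/24.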

Related papers