Fugu-MT Paper Translation (Abstract): Learning to Align Generative Appearance Priors for Fine-grained Image Retrieval

Paper overview: Learning to Align Generative Appearance Priors for Fine-grained Image Retrieval

arxiv url: http://arxiv.org/abs/2605.09859v1
Date: Mon, 11 May 2026 01:35:26 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.458396
Title: Learning to Align Generative Appearance Priors for Fine-grained Image Retrieval
Title (reference translation): Learning generative appearance priors for fine-grained image retrieval
Authors: Shijie Wang, Yadan Luo, Zijian Wang, Xin Yu, Zi Huang
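Abstract summary: GAPan is an alignment network that reformulates the learning objective from category prediction toward appearance modeling. GAPan achieves state-of-the-art performance on both widely used fine-grained and coarse-grained benchmarks.
Reference score (independently computed attention measure): 54.09324791167742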
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fine-grained image retrieval (FGIR) typically relies on supervision from seen categories to learn discriminative embeddings for retrieving unseen categories. However, such supervision often biases retrieval models toward the semantics of seen categories rather than the underlying appearance characteristics that generalize across categories, thereby limiting retrieval performance on unseen categories. To tackle this, we propose GAPan, a Generative Appearance Prior alignment network that reformulates the learning objective from category prediction toward appearance modeling. Technically, GAPan treats retrieval features with an invertible density model based on normalizing flows. In the forward direction, the flow maps all instance features into a latent density space, where each seen category is modeled by a class-conditional Gaussian prior and optimized via exact likelihood estimation. This formulation preserves richer appearance details by leveraging the invertible property of the flows. In the reverse direction, samples from the high-density regions of these learned priors are mapped back to the feature space to produce appearance-aware anchors that reflect intra-category variation. These anchors supervise a prior-driven alignment objective that aligns retrieval embeddings with category-specific appearance distributions, thereby improving generalization to unseen categories. Evaluations demonstrate that our GAPan achieves state-of-the-art performance on both widely-used fine- and coarse-grained benchmarks.
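Abstract (reference translation): Fine-grained image retrieval (FGIR) usually relies on supervision from seen categories to learn discriminative embeddings for retrieving unseen categories. Such supervision, however, tends to bias retrieval models toward the semantics of the seen categories rather than the underlying appearance characteristics that generalize across categories, which limits retrieval performance on unseen categories. This work proposes GAPan, which shifts the learning objective from category prediction to appearance modeling. Technically, GAPan handles retrieval features with an invertible density model based on normalizing flows. In the forward direction, the flow maps all instance features into a latent density space, where each seen category is modeled by a class-conditional Gaussian prior and optimized by exact likelihood estimation. This formulation preserves richer appearance detail by exploiting the invertibility of the flow. In the reverse direction, samples from the high-density regions of the learned priors are mapped back to the feature space to generate appearance-aware anchors that reflect intra-category variation. These anchors supervise a prior-driven alignment objective that aligns retrieval embeddings with category-specific appearance distributions, improving generalization to unseen categories. The evaluations show that GAPan achieves state-of-the-art performance on both widely used fine-grained and coarse-grained benchmarks.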
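
The pipeline described in the abstract (forward flow to a latent space, class-conditional Gaussian prior with exact likelihood, reverse sampling of anchors, alignment of embeddings to anchors) can be illustrated with a toy NumPy sketch. This is not the authors' implementation. The element-wise affine map stands in for a real normalizing flow, and every name here (`AffineFlow`, `class_log_likelihood`, `sample_anchors`, `alignment_loss`, `temperature`) is hypothetical. In particular, the nearest-anchor squared distance is only an assumed form of the alignment objective, and sampling with `temperature < 1` is an assumed way to stay in high-density regions of the prior; the abstract specifies neither.

```python
import numpy as np


class AffineFlow:
    """Toy invertible map z = (x - shift) * exp(-log_scale).

    Assumption: a stand-in for a real normalizing flow, not the paper's model.
    """

    def __init__(self, dim, rng):
        self.shift = rng.normal(size=dim)
        self.log_scale = 0.1 * rng.normal(size=dim)

    def forward(self, x):
        z = (x - self.shift) * np.exp(-self.log_scale)
        log_det = -np.sum(self.log_scale)  # log|det dz/dx|, identical for all samples
        return z, log_det

    def inverse(self, z):
        return z * np.exp(self.log_scale) + self.shift


def class_log_likelihood(flow, x, mean, log_std):
    """Exact log p(x | class) via change of variables, diagonal Gaussian prior."""
    z, log_det = flow.forward(x)
    std = np.exp(log_std)
    log_prior = -0.5 * np.sum(
        ((z - mean) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi), axis=-1
    )
    return log_prior + log_det


def sample_anchors(flow, mean, log_std, n, rng, temperature=0.5):
    """Sample latents near the prior mode and map them back to feature space."""
    noise = rng.normal(size=(n, mean.shape[0]))
    z = mean + temperature * np.exp(log_std) * noise
    return flow.inverse(z)


def alignment_loss(embeddings, anchors):
    """Assumed objective: mean squared distance to the nearest same-class anchor."""
    diff = embeddings[:, None, :] - anchors[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    return float(np.mean(np.min(d2, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flow = AffineFlow(dim=4, rng=rng)
    mean, log_std = np.zeros(4), np.zeros(4)   # prior parameters for one seen class
    features = rng.normal(size=(8, 4))         # placeholder retrieval features
    print(class_log_likelihood(flow, features, mean, log_std).shape)
    anchors = sample_anchors(flow, mean, log_std, n=16, rng=rng)
    print(alignment_loss(features, anchors))
```

In a trainable version, the flow and prior parameters would be fit by maximizing `class_log_likelihood` on seen-category features, and the retrieval backbone would then be trained to reduce `alignment_loss` against anchors of the matching class.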
