Fugu-MT Paper Translation (Abstract): Prompt Tuning for CLIP on the Pretrained Manifold

Paper summary: Prompt Tuning for CLIP on the Pretrained Manifold

arxiv url: http://arxiv.org/abs/2602.19198v1
Date: Sun, 22 Feb 2026 13:58:41 GMT
Status: Translation complete
Last updated in system: 2026-02-24 17:42:02.52757
Title: Prompt Tuning for CLIP on the Pretrained Manifold
Authors: Xi Yang, Yuanrong Xu, Weigang Zhang, Guangming Lu, David Zhang, Jie Wen
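Abstract summary: We propose ManiPT, a framework that performs prompt tuning on the pretrained manifold. ManiPT introduces cosine consistency constraints in both the text and image modalities. It also introduces a structural bias that enforces incremental corrections, guiding adaptation along transferable directions.
Reference score (independently computed attention score): 53.797958617168966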
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Prompt tuning introduces learnable prompt vectors that adapt pretrained vision-language models to downstream tasks in a parameter-efficient manner. However, under limited supervision, prompt tuning alters pretrained representations and drives downstream features away from the pretrained manifold toward directions that are unfavorable for transfer. This drift degrades generalization. To address this limitation, we propose ManiPT, a framework that performs prompt tuning on the pretrained manifold. ManiPT introduces cosine consistency constraints in both the text and image modalities to confine the learned representations within the pretrained geometric neighborhood. Furthermore, we introduce a structural bias that enforces incremental corrections, guiding the adaptation along transferable directions to mitigate reliance on shortcut learning. From a theoretical perspective, ManiPT alleviates overfitting tendencies under limited data. Our experiments cover four downstream settings: unseen-class generalization, few-shot classification, cross-dataset transfer, and domain generalization. Across these settings, ManiPT achieves higher average performance than baseline methods. Notably, ManiPT provides an explicit perspective on how prompt tuning overfits under limited supervision.
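As an illustration of the cosine consistency idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the constraint can be written as the mean of (1 - cosine similarity) between prompt-tuned features and the corresponding frozen pretrained features, added to the task loss for both modalities. The function names, the `weight` parameter, and this exact loss form are hypothetical; the abstract does not specify them.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_consistency_loss(tuned, pretrained):
    """Mean of (1 - cosine similarity) over paired feature vectors.

    Assumed form of the constraint: the value is 0 when every tuned
    feature points in the same direction as its frozen counterpart,
    and grows as the tuned features drift away from them.
    """
    pairs = list(zip(tuned, pretrained))
    return sum(1.0 - cosine_similarity(t, p) for t, p in pairs) / len(pairs)


def total_loss(task_loss, text_tuned, text_frozen,
               image_tuned, image_frozen, weight=1.0):
    """Task loss plus the consistency penalty on both modalities.

    `weight` is a hypothetical trade-off coefficient, not a value
    taken from the paper.
    """
    penalty = (cosine_consistency_loss(text_tuned, text_frozen)
               + cosine_consistency_loss(image_tuned, image_frozen))
    return task_loss + weight * penalty
```

Under this assumed form, a tuned feature that is only rescaled incurs no penalty, while one rotated to be orthogonal to its pretrained counterpart incurs a penalty of 1.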

Related papers