Fugu-MT Paper Translation (Abstract): Enhancing Visual Prompting through Expanded Transformation Space and Overfitting Mitigation

Paper overview: Enhancing Visual Prompting through Expanded Transformation Space and Overfitting Mitigation

arxiv url: http://arxiv.org/abs/2510.07823v1
Date: Thu, 09 Oct 2025 06:08:15 GMT
Status: Translation complete
Last updated in system: 2025-10-10 17:54:14.899624
Title: Enhancing Visual Prompting through Expanded Transformation Space and Overfitting Mitigation
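Title (reference translation): Enhancing Visual Prompting through an Expanded Transformation Space and Overfitting Mitigation
Authors: Shohei Enomoto
Abstract summary: Visual prompting (VP) is a promising parameter-efficient fine-tuning approach for adapting pre-trained vision models to downstream tasks. This paper proposes ACAVP (Affine, Color, and Additive Visual Prompting). ACAVP achieves state-of-the-art accuracy among VP methods, surpasses linear probing in average accuracy, and shows superior robustness to distribution shifts.
Reference score (independently computed attention score): 0.9137554315375919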
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual prompting (VP) has emerged as a promising parameter-efficient fine-tuning approach for adapting pre-trained vision models to downstream tasks without modifying model parameters. Despite offering advantages like negligible computational overhead and compatibility with black-box models, conventional VP methods typically achieve lower accuracy than other adaptation approaches. Our analysis reveals two critical limitations: the restricted expressivity of simple additive transformation and a tendency toward overfitting when the parameter count increases. To address these challenges, we propose ACAVP (Affine, Color, and Additive Visual Prompting), which enhances VP's expressive power by introducing complementary transformation operations: affine transformation for creating task-specific prompt regions while preserving original image information, and color transformation for emphasizing task-relevant visual features. Additionally, we identify that overfitting is a critical issue in VP training and introduce TrivialAugment as an effective data augmentation, which not only benefits our approach but also significantly improves existing VP methods, with performance gains of up to 12 percentage points on certain datasets. This demonstrates that appropriate data augmentation is universally beneficial for VP training. Extensive experiments across twelve diverse image classification datasets with two different model architectures demonstrate that ACAVP achieves state-of-the-art accuracy among VP methods, surpasses linear probing in average accuracy, and exhibits superior robustness to distribution shifts, all while maintaining minimal computational overhead during inference.
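Abstract (reference translation): Visual prompting (VP) has emerged as a promising parameter-efficient fine-tuning approach for adapting pre-trained vision models to downstream tasks without modifying model parameters. Although it offers advantages such as negligible computational overhead and compatibility with black-box models, conventional VP methods generally achieve lower accuracy than other adaptation approaches. The analysis identifies two limitations: the restricted expressivity of simple additive transformation and a tendency to overfit as the parameter count increases. To address these challenges, the paper proposes ACAVP (Affine, Color, and Additive Visual Prompting), which uses affine transformation to create task-specific prompt regions while preserving original image information, and color transformation to emphasize task-relevant visual features. In addition, overfitting is identified as a critical issue in VP training, and TrivialAugment is introduced as an effective data augmentation; it also significantly improves existing VP methods, with gains of up to 12 percentage points on certain datasets. This shows that appropriate data augmentation is universally beneficial for VP training. Extensive experiments on twelve diverse image classification datasets with two different model architectures show that ACAVP achieves state-of-the-art accuracy among VP methods, surpasses linear probing in average accuracy, and exhibits superior robustness to distribution shifts, while maintaining minimal computational overhead during inference.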
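
Illustrative sketch (not from the paper): the abstract names three complementary operations, namely an affine transformation that frees up a prompt region, a color transformation, and an additive prompt. The pure-Python code below shows one way such operations could be composed on a tiny image. The function names, the per-channel gain/bias form of the color transformation, the shrink-and-centre form of the affine transformation, and the order of operations are all assumptions made for illustration; in actual VP training these parameters would be learned rather than fixed by hand.

```python
# Hypothetical sketch of an affine + color + additive visual prompt.
# Names, parameterization, and operation order are assumptions, not the
# paper's implementation. Pixels are (r, g, b) floats in [0, 1].

def clamp(v):
    """Keep a channel value inside the valid [0, 1] range."""
    return max(0.0, min(1.0, v))

def color_transform(image, gain, bias):
    """Per-channel linear color change: gain * value + bias (assumed form)."""
    return [[tuple(clamp(g * c + b) for c, g, b in zip(px, gain, bias))
             for px in row] for row in image]

def affine_shrink(image, scale, canvas_size):
    """Nearest-neighbour downscale, centred on a zero canvas.

    The zero border around the shrunken image is the region left free
    for the prompt, so the original content is kept at reduced size.
    """
    h, w = len(image), len(image[0])
    new_h = max(1, min(canvas_size, int(h * scale)))
    new_w = max(1, min(canvas_size, int(w * scale)))
    top = (canvas_size - new_h) // 2
    left = (canvas_size - new_w) // 2
    canvas = [[(0.0, 0.0, 0.0)] * canvas_size for _ in range(canvas_size)]
    for y in range(new_h):
        for x in range(new_w):
            src_y = min(h - 1, int(y / scale))
            src_x = min(w - 1, int(x / scale))
            canvas[top + y][left + x] = image[src_y][src_x]
    return canvas

def add_prompt(image, delta):
    """Add a prompt of the same size as the canvas, pixel by pixel."""
    return [[tuple(clamp(c + d) for c, d in zip(px, dpx))
             for px, dpx in zip(row, drow)]
            for row, drow in zip(image, delta)]

def apply_visual_prompt(image, scale, gain, bias, delta):
    """Compose color -> affine -> additive (this ordering is an assumption)."""
    canvas_size = len(delta)
    recolored = color_transform(image, gain, bias)
    placed = affine_shrink(recolored, scale, canvas_size)
    return add_prompt(placed, delta)

# Usage: a 2x2 grey image placed on a 4x4 canvas with a uniform prompt.
image = [[(0.5, 0.5, 0.5)] * 2 for _ in range(2)]
delta = [[(0.1, 0.1, 0.1)] * 4 for _ in range(4)]
prompted = apply_visual_prompt(image, 0.5, (1.0, 1.0, 2.0), (0.0, 0.0, 0.0), delta)
```

In this sketch the additive prompt is applied over the whole canvas for simplicity; restricting it to the border left by the affine step would be an equally plausible reading of the abstract.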
