Fugu-MT Paper Translation (Abstract): Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks

Paper summary: Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks

arxiv url: http://arxiv.org/abs/2310.03843v1
Date: Thu, 5 Oct 2023 19:00:49 GMT
Status: Translation complete
Last updated in system: 2023-10-12 19:00:06.020325
Title: Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks
Title (reference translation): Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks
Authors: Xu Luo, Difan Zou, Lianli Gao, Zenglin Xu, Jingkuan Song
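Abstract summary: Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data. We show that, for linear probing, the pretrained features can be extremely redundant when downstream data is scarce.
Reference score (independently computed attention score): 120.23328563831704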
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data, that is, training a linear classifier upon frozen features extracted from the pretrained model. As there may exist significant gaps between pretraining and downstream datasets, one may ask whether all dimensions of the pretrained features are useful for a given downstream task. We show that, for linear probing, the pretrained features can be extremely redundant when the downstream data is scarce, or few-shot. For some cases such as 5-way 1-shot tasks, using only 1% of the most important feature dimensions is able to recover the performance achieved by using the full representation. Interestingly, most dimensions are redundant only under few-shot settings and gradually become useful when the number of shots increases, suggesting that feature redundancy may be the key to characterizing the "few-shot" nature of few-shot transfer problems. We give a theoretical understanding of this phenomenon and show how dimensions with high variance and small distance between class centroids can serve as confounding factors that severely disturb classification results under few-shot settings. As an attempt at solving this problem, we find that the redundant features are difficult to identify accurately with a small number of training samples, but we can instead adjust feature magnitude with a soft mask based on estimated feature importance. We show that this method can generally improve few-shot transfer performance across various pretrained models and downstream datasets.
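Abstract (reference translation): Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data, that is, training a linear classifier on frozen features extracted from the pretrained model. Because significant gaps may exist between pretraining and downstream datasets, one may ask whether all dimensions of the pretrained features are useful for a given downstream task. We show that, for linear probing, the pretrained features can be extremely redundant when the downstream data is scarce, or few-shot. In some cases, such as 5-way 1-shot tasks, using only 1% of the most important feature dimensions can recover the performance achieved with the full representation. Interestingly, most dimensions are redundant only under few-shot settings and gradually become useful as the number of shots increases. We give a theoretical understanding of this phenomenon and show how dimensions with high variance and small distance between class centroids can act as confounding factors that severely disturb classification results under few-shot settings. To address this problem, we find that redundant features are difficult to identify accurately with a small number of training samples, but that feature magnitude can instead be adjusted with a soft mask based on estimated feature importance. We show that this method can improve few-shot transfer performance across various pretrained models and downstream datasets.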
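
The soft-mask idea in the abstract can be illustrated with a small synthetic experiment. The abstract does not say how feature importance is estimated or how the mask is formed, so everything below is an assumption made for illustration: importance is taken to be a Fisher-style ratio (spread of class centroids divided by within-class variance), the mask is that score rescaled to at most 1, a nearest-centroid classifier stands in for the linear probe, and the data are synthetic Gaussians rather than real pretrained features. The function names `fisher_scores`, `soft_mask`, and `nearest_centroid_predict` are hypothetical and are not the authors' method.

```python
import numpy as np

def fisher_scores(features, labels, eps=1e-8):
    """Assumed importance score: centroid spread / within-class variance, per dimension."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    within = np.mean([features[labels == c].var(axis=0) for c in classes], axis=0)
    between = centroids.var(axis=0)
    return between / (within + eps)

def soft_mask(scores, temperature=1.0):
    """Assumed soft mask: rescale scores so the largest is about 1, then sharpen."""
    weights = scores / (scores.max() + 1e-12)
    return weights ** temperature

def nearest_centroid_predict(support_x, support_y, query_x):
    """Stand-in classifier (not a linear probe): assign each query to the closest class mean."""
    classes = np.unique(support_y)
    centroids = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    dists = ((query_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]

# Synthetic 5-way 5-shot task: 4 informative dimensions, 60 high-variance noise dimensions.
rng = np.random.default_rng(0)
n_way, n_shot, n_query, dim, n_informative = 5, 5, 20, 64, 4

def sample_task(n_per_class):
    xs, ys = [], []
    for c in range(n_way):
        x = rng.normal(0.0, 3.0, size=(n_per_class, dim))  # confounding dimensions
        # Informative dimensions: well-separated class means, small variance.
        x[:, :n_informative] = rng.normal(4.0 * c, 0.5, size=(n_per_class, n_informative))
        xs.append(x)
        ys.append(np.full(n_per_class, c))
    return np.concatenate(xs), np.concatenate(ys)

support_x, support_y = sample_task(n_shot)
query_x, query_y = sample_task(n_query)

# The mask is estimated from the support set only, then applied to support and query.
mask = soft_mask(fisher_scores(support_x, support_y))
pred_plain = nearest_centroid_predict(support_x, support_y, query_x)
pred_masked = nearest_centroid_predict(support_x * mask, support_y, query_x * mask)

acc_plain = float((pred_plain == query_y).mean())
acc_masked = float((pred_masked == query_y).mean())
print(f"accuracy without mask: {acc_plain:.2f}, with soft mask: {acc_masked:.2f}")
```

In this toy setup the high-variance dimensions with identical class means play the role of the confounding factors described in the abstract. Raising the hypothetical `temperature` parameter above 1 pushes the mask toward hard selection of a few dimensions, while values below 1 keep more of the original representation.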

Related papers

These Are Not All the Features You Are Looking For: A Fundamental Bottleneck in Supervised Pretraining [10.749875317643031]
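Transfer learning is a cornerstone of modern machine learning, promising a way to adapt models pretrained on broad data to new tasks with minimal new data. This work evaluates model transfer from a pretraining mixture to each of its tasks, assessing whether pretrained features can match the performance of task-specific direct training. We identify a fundamental limitation in deep learning models: when a network encodes similar features during training, it fails to learn new ones.
Paper reference translation (metadata) (2025-06-23T01:04:29Z)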
On the Connection between Pre-training Data Diversity and Fine-tuning Robustness [66.30369048726145]
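The main factor influencing downstream effective robustness is data quantity. We demonstrate our findings on pre-training distributions drawn from various natural and synthetic data sources.
Paper reference translation (metadata) (2023-07-24T05:36:19Z)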
Task-Robust Pre-Training for Worst-Case Downstream Adaptation [62.05108162160981]
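Pre-training has achieved great success when transferred to downstream tasks. This paper considers pre-training a model that guarantees uniformly good performance across downstream tasks.
Paper reference translation (metadata) (2023-06-21T07:43:23Z)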
Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
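Transfer learning is a powerful tool that enables model training with limited amounts of data. The simplest transfer learning protocol freezes the feature-extractor layers of a network pretrained on a data-rich source task. This protocol is often suboptimal, and the largest performance gain may be achieved when only a small portion of the pretrained network is kept frozen.
Paper reference translation (metadata) (2023-03-02T17:32:11Z)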
On Measuring the Intrinsic Few-Shot Hardness of Datasets [49.37562545777455]
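We show that few-shot hardness is intrinsic to datasets for a given pretrained model. We then propose a simple and lightweight metric, "Spread", that captures the intuition of when few-shot learning is possible. Our metric better accounts for few-shot hardness than existing notions of hardness and is 8-100x faster to compute.
Paper reference translation (metadata) (2022-11-16T18:53:52Z)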
Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing [76.78772372631623]
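A common practice in self-supervised pre-training is to use as much data as possible. For a specific downstream task, however, including irrelevant data in pre-training may degrade downstream performance. Using a different customized dataset in pre-training for each downstream task is burdensome and infeasible.
Paper reference translation (metadata) (2022-05-26T10:49:43Z)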
Revisiting the Updates of a Pre-trained Model for Few-shot Learning [11.871523410051527]
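We compare two popular update methods, fine-tuning and linear probing. We find that fine-tuning outperforms linear probing as the number of samples increases.
Paper reference translation (metadata) (2022-05-13T08:47:06Z)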
GDC- Generalized Distribution Calibration for Few-Shot Learning [5.076419064097734]
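Few-shot learning is an important problem in machine learning because assembling large labeled datasets takes considerable time and effort. Most few-shot learning algorithms suffer from one of two limitations. We propose a generalized sampling method that estimates the few-shot distribution for classification as a weighted random variable of all large classes.
Paper reference translation (metadata) (2022-04-11T16:22:53Z)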

The related papers list is generated automatically from the titles and abstracts of papers on this site.

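The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).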