Fugu-MT Paper Translation (Summary): Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model

Paper summary: Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model

arxiv url: http://arxiv.org/abs/2310.17653v2
Date: Mon, 26 Feb 2024 18:58:43 GMT
Status: Translation complete
Last updated in system: 2024-02-28 21:57:15.325011
Title: Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
Title (reference translation): Fantastic Gains and Where to Find Them: The Existence and Prospect of General Knowledge Transfer between Pretrained Models
Authors: Karsten Roth, Lukas Thede, Almut Sophia Koepke, Oriol Vinyals, Olivier Hénaff, Zeynep Akata
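Abstract summary: For any pairing of pretrained models, one model extracts significant data context that is unavailable in the other. The paper investigates whether such "complementary" knowledge can be transferred from one model to another without performance degradation.
Reference score (independently computed attention score): 74.62272538148245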
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Training deep networks requires various design decisions regarding for instance their architecture, data augmentation, or optimization. In this work, we find these training variations to result in networks learning unique feature sets from the data. Using public model libraries comprising thousands of models trained on canonical datasets like ImageNet, we observe that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other -- independent of overall performance. Given any arbitrary pairing of pretrained models and no external rankings (such as separate test sets, e.g. due to data privacy), we investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation -- a task made particularly difficult as additional knowledge can be contained in stronger, equiperformant or weaker models. Yet facilitating robust transfer in scenarios agnostic to pretrained model pairings would unlock auxiliary gains and knowledge fusion from any model repository without restrictions on model and problem specifics - including from weaker, lower-performance models. This work therefore provides an initial, in-depth exploration on the viability of such general-purpose knowledge transfer. Across large-scale experiments, we first reveal the shortcomings of standard knowledge distillation techniques, and then propose a much more general extension through data partitioning for successful transfer between nearly all pretrained models, which we show can also be done unsupervised. Finally, we assess both the scalability and impact of fundamental model properties on successful model-agnostic knowledge transfer.
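Abstract (reference translation): Training deep networks requires various design decisions, such as architecture, data augmentation, and optimization. In this work, we show that these training variations result in networks that learn unique feature sets from the data. Using public model libraries comprising thousands of models trained on canonical datasets like ImageNet, we observe that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other -- independent of overall performance. Given any arbitrary pairing of pretrained models and no external rankings (such as separate test sets, e.g. due to data privacy), we investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation -- a task made particularly difficult as additional knowledge can be contained in stronger, equiperformant or weaker models. Yet facilitating robust transfer in scenarios agnostic to pretrained model pairings would unlock auxiliary gains and knowledge fusion from any model repository without restrictions on model and problem specifics. This work therefore provides an initial, in-depth exploration of the viability of such general-purpose knowledge transfer. Across large-scale experiments, we first reveal the shortcomings of standard knowledge distillation techniques, and then propose a more general extension through data partitioning. Finally, we assess both the scalability and the impact of fundamental model properties on model-agnostic knowledge transfer.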
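The abstract names knowledge distillation extended "through data partitioning" but gives no formula. The following is only a minimal sketch of one way such a loss could look, not the authors' implementation. The partition rule (compare the teacher's top softmax probability with that of a frozen copy of the initial student) and all function names are assumptions introduced here for illustration. Because the rule uses model confidence rather than labels, the sketch needs no supervision.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def partitioned_distillation_loss(student_logits, teacher_logits,
                                  frozen_student_logits, temperature=1.0):
    """Mean per-sample KL loss with a confidence-based data partition.

    ASSUMED rule (hypothetical, not taken from the abstract): a sample is
    distilled from the teacher when the teacher is more confident than the
    frozen initial student; otherwise the student is pulled back toward its
    own frozen predictions so existing knowledge is retained.
    Returns (mean loss, number of samples assigned to the teacher).
    """
    total = 0.0
    n_from_teacher = 0
    for s, t, f in zip(student_logits, teacher_logits, frozen_student_logits):
        p_student = softmax(s, temperature)
        p_teacher = softmax(t, temperature)
        p_frozen = softmax(f, temperature)
        if max(p_teacher) > max(p_frozen):
            target = p_teacher  # transfer partition
            n_from_teacher += 1
        else:
            target = p_frozen  # retention partition
        total += kl_divergence(target, p_student)
    return total / len(student_logits), n_from_teacher


# Toy batch: two samples, two classes. The student starts at its frozen copy.
frozen = [[2.0, 0.0], [0.0, 0.5]]
student = [row[:] for row in frozen]
teacher = [[0.0, 5.0], [0.1, 0.0]]
loss, n_teacher = partitioned_distillation_loss(student, teacher, frozen)
```

In this toy batch the teacher is more confident only on the first sample, so only that sample contributes a nonzero distillation term. The second sample falls into the retention partition and adds no loss while the student still matches its frozen copy.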

Related papers

Using External knowledge to Enhanced PLM for Semantic Matching [38.125341836302525]
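This paper uses external knowledge to enhance a pretrained semantic relevance discrimination model. Experimental results on 10 public datasets show that the method achieves consistent performance improvements.
Paper reference translation (metadata) (2025-05-10T11:33:48Z)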
The Extrapolation Power of Implicit Models [2.3526338188342653]
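Implicit models are put to the test across various extrapolation scenarios: out-of-distribution, geographical, and temporal shifts. The experiments consistently demonstrate a significant performance advantage for implicit models.
Paper reference translation (metadata) (2024-07-19T16:01:37Z)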
Complementary Learning for Real-World Model Failure Detection [15.779651238128562]
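Model errors are detected using learned characteristics from different training paradigms. The approach is demonstrated by learning semantic and predictive motion labels in point clouds in a supervised and self-supervised manner. The authors perform a large-scale qualitative analysis and present LidarCODA, the first dataset with labeled anomalies in lidar point clouds.
Paper reference translation (metadata) (2024-07-19T13:36:35Z)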
Encapsulating Knowledge in One Prompt [56.31088116526825]
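KiOP encapsulates knowledge from various models into a single prompt without altering the original models or requiring access to the training data. In terms of practicality, this paradigm demonstrates the effectiveness of Visual Prompt in data-inaccessible contexts. Experiments across various datasets and models demonstrate the effectiveness of the proposed KiOP knowledge transfer paradigm.
Paper reference translation (metadata) (2024-07-16T16:35:23Z)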
Pre-trained Recommender Systems: A Causal Debiasing Perspective [19.712997823535066]
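This work develops a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains. Experiments suggest that the proposed model can substantially improve recommendation performance in zero-shot and few-shot learning settings.
Paper reference translation (metadata) (2023-10-30T03:37:32Z)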
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
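Fine-tuning pretrained language models has become the dominant paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. The authors propose a dataless knowledge fusion method that merges models in their parameter space.
Paper reference translation (metadata) (2022-12-19T20:46:43Z)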
Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers [66.36045164286854]
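The authors analyze existing bias features and demonstrate that no single model works best in all cases. Choosing an appropriate bias model yields better robustness than baselines with more sophisticated model designs.
Paper reference translation (metadata) (2022-10-28T17:52:10Z)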
Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
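The paper considers making predictions on new test data with no opportunity to learn from a labeled training set. Instead, it assumes access to a set of expert models and their predictions, along with limited information about the datasets used to train them.
Paper reference translation (metadata) (2022-10-11T10:20:31Z)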
Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models [89.44031286278347]
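This paper proposes the Hub-Pathway framework, which enables knowledge transfer from a model hub. The proposed framework can be trained end-to-end with the target task-specific loss. Experimental results on computer vision and reinforcement learning tasks show that the framework achieves state-of-the-art performance.
Paper reference translation (metadata) (2022-06-08T08:00:12Z)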
Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
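The paper measures the impact of three different adaptation methods on the generalization and accuracy of models. Experiments with two models show that fine-tuning performs best by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers. Alternative adaptation methods such as prefix-tuning achieve comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
Paper reference translation (metadata) (2021-09-07T03:13:06Z)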
End-to-End Weak Supervision [15.125993628007972]
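The paper proposes an end-to-end approach for directly learning the downstream model. It shows improved performance over prior work in terms of end-model performance on downstream test sets.
Paper reference translation (metadata) (2021-07-05T19:10:11Z)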

The related papers list is generated automatically from the titles and abstracts of papers on this site.
