Fugu-MT 論文翻訳(概要): Semiparametric Language Models Are Scalable Continual Learners

論文の概要: Semiparametric Language Models Are Scalable Continual Learners

arxiv url: http://arxiv.org/abs/2303.01421v1
Date: Thu, 2 Mar 2023 17:15:02 GMT
ステータス: 翻訳完了
システム内更新日: 2023-03-03 13:14:40.551317
Title: Semiparametric Language Models Are Scalable Continual Learners
Title（参考訳）: 半パラメトリック言語モデルはスケーラブルな連続学習者である
Authors: Guangyue Peng, Tao Ge, Si-Qing Chen, Furu Wei, Houfeng Wang
Abstract要約: セミパラメトリック言語モデル(LM)は、新しいテキストデータから継続的に学習する上で有望であることを示す。 Selective Memorization(SeMem)と呼ばれるシンプルで直感的なアプローチを提案する。 SeMemは、モデルが苦労する可能性のある難しいサンプルのみを記憶している。
参考スコア（独自算出の注目度）: 83.74414880208334
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Semiparametric language models (LMs) have shown promise in continuously learning from new text data by combining a parameterized neural LM with a growable non-parametric memory for memorizing new content. However, conventional semiparametric LMs will finally become prohibitive for computing and storing if they are applied to continual learning over streaming data, because the non-parametric memory grows linearly with the amount of data they learn from over time. To address the issue of scalability, we present a simple and intuitive approach called Selective Memorization (SeMem), which only memorizes difficult samples that the model is likely to struggle with. We demonstrate that SeMem improves the scalability of semiparametric LMs for continual learning over streaming data in two ways: (1) data-wise scalability: as the model becomes stronger through continual learning, it will encounter fewer difficult cases that need to be memorized, causing the growth of the non-parametric memory to slow down over time rather than growing at a linear rate with the size of training data; (2) model-wise scalability: SeMem allows a larger model to memorize fewer samples than its smaller counterpart because it is rarer for a larger model to encounter incomprehensible cases, resulting in a non-parametric memory that does not scale linearly with model size. We conduct extensive experiments in language modeling and downstream tasks to test SeMem's results, showing SeMem enables a semiparametric LM to be a scalable continual learner with little forgetting.
Abstract（参考訳）: 半パラメトリック言語モデル(LM)は、パラメータ化されたニューラルLMと成長可能な非パラメトリックメモリを組み合わせて新しいコンテンツを記憶することで、新しいテキストデータから継続的に学習することを示す。しかし,非パラメトリックメモリは時間とともに学習するデータ量とともに線形に成長するので,ストリーミングデータによる連続的な学習に適用された場合,従来のセミパラメトリックLMは計算と記憶が禁じられるようになる。スケーラビリティの問題に対処するため、我々はsemem(selective memorization)と呼ばれるシンプルで直感的なアプローチを提示します。 We demonstrate that SeMem improves the scalability of semiparametric LMs for continual learning over streaming data in two ways: (1) data-wise scalability: as the model becomes stronger through continual learning, it will encounter fewer difficult cases that need to be memorized, causing the growth of the non-parametric memory to slow down over time rather than growing at a linear rate with the size of training data; (2) model-wise scalability: SeMem allows a larger model to memorize fewer samples than its smaller counterpart because it is rarer for a larger model to encounter incomprehensible cases, resulting in a non-parametric memory that does not scale linearly with model size. 本稿では,SeMemの結果をテストするために,言語モデリングと下流タスクの広範な実験を行い,セミパラメトリックLMを,ほとんど忘れずに拡張性のある連続学習者として実現できることを実証した。

関連論文リスト

LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
LLM(Large Language Models)をスクラッチからトレーニングするには膨大な計算資源が必要であるため、非常に高価である。モデルスケーリングアップは、より小さなモデルのパラメータを活用してより大きなモデルを作成することで、有望なソリューションを提供する。深度スケールアップのための新しい学習方法である textbfLESA を提案する。
論文参考訳（メタデータ） (2025-02-19T14:58:48Z)
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
本稿では,リソース制限シナリオに対するSHERLと呼ばれる革新的なMETL戦略を提案する。初期経路では、中間出力は反冗長動作によって統合される。遅延ルートでは、最小限の遅延事前トレーニングされたレイヤを利用することで、メモリオーバーヘッドのピーク需要を軽減できる。
論文参考訳（メタデータ） (2024-07-10T10:22:35Z)
CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory [38.429707659685974]
大規模言語モデル(LLM)は、メモリとランタイムのコストが高いため、長い入力シーケンスを扱うのに苦労する。本稿では,事前学習した(凍結した)注意に基づくLCMに再学習せずに結合可能な連想記憶モジュールを提案する。 CAMELoTと呼ばれるこのアーキテクチャは、128トークンの小さなコンテキストウィンドウでも優れたパフォーマンスを示している。
論文参考訳（メタデータ） (2024-02-21T01:00:17Z)
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
分散を低減した行列生成のために, WTA-CRS と呼ばれる新しい非バイアス推定系を提案する。我々の研究は、チューニング変換器の文脈において、提案した推定器が既存のものよりも低い分散を示すという理論的および実験的証拠を提供する。
論文参考訳（メタデータ） (2023-05-24T15:52:08Z)
Can recurrent neural networks learn process model structure? [0.2580765958706854]
本稿では,適合度,精度,一般化のために,変分に基づく再サンプリングとカスタムメトリクスを組み合わせた評価フレームワークを提案する。 LSTMは、単純化されたプロセスデータであっても、プロセスモデル構造を学ぶのに苦労する可能性があることを確認します。また,トレーニング中にLSTMで見られる情報量が減少すると,一般化や精度の低下が生じた。
論文参考訳（メタデータ） (2022-12-13T08:40:01Z)
A Memory Transformer Network for Incremental Learning [64.0410375349852]
本研究では,モデルが学習する時間とともに,新しいデータクラスが観察される学習環境であるクラスインクリメンタルラーニングについて検討する。素直な問題定式化にもかかわらず、クラス増分学習への分類モデルの素直な適用は、これまで見られたクラスの「破滅的な忘れ込み」をもたらす。これは、過去のデータのサブセットをメモリバンクに保存し、将来のタスクをトレーニングする際の忘れの防止にそれを活用することで、破滅的な忘れの問題を克服するものだ。
論文参考訳（メタデータ） (2022-10-10T08:27:28Z)
Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
本稿では,加速度センサデータに基づくジェスチャー認識と画像分類の2つの実例として,最先端の4つのアルゴリズムを比較した。以上の結果から,これらのシステムの信頼性と小型メモリMCUへのデプロイの可能性が確認された。
論文参考訳（メタデータ） (2022-09-01T17:05:20Z)
Quantifying Memorization Across Neural Language Models [61.58529162310382]
大規模言語モデル(LM)は、トレーニングデータの一部を記憶するために示され、適切に誘導されると、記憶されたデータを冗長に出力する。これは、暗記がプライバシーを侵害し(ユーザーデータをエクスポーティングする)、実用性を低下させ(繰り返し覚えやすいテキストは、しばしば品質が低い)、公平性を損なうため、望ましくない。本稿では、LMが記憶されたトレーニングデータを出力する度合いを定量化する3つの対数線形関係について述べる。
論文参考訳（メタデータ） (2022-02-15T18:48:31Z)
Evolving Metric Learning for Incremental and Decremental Features [45.696514400861275]
インクリメンタルおよびデクリメンタル機能のための新しいオンラインEvolving Metric Learningモデルを開発した。我々のモデルはスムーズなワッサーシュタイン距離を組み込むことで、インスタンスと特徴の進化を同時に扱うことができる。ワンショットケースでの課題に対処するだけでなく、モデルをマルチショットシナリオに拡張します。
論文参考訳（メタデータ） (2020-06-27T10:29:38Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。