Fugu-MT Paper Translation (Summary): A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models

arxiv url: http://arxiv.org/abs/2210.05211v1
Date: Tue, 11 Oct 2022 07:26:34 GMT
Status: Translation complete
Last updated in system: 2022-10-12 15:35:52.426897
Title: A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models
Authors: Yuanxin Liu, Fandong Meng, Zheng Lin, Jiangnan Li, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
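Abstract summary: Large-scale pre-trained language models (PLMs) are inefficient in terms of memory footprint and computation. PLMs also tend to rely on dataset bias and struggle to generalize to out-of-distribution (OOD) data. Recent studies have shown that dense PLMs can be replaced with sparse subnetworks without hurting performance.
Reference score (site-computed attention score): 53.87983344862402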
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the remarkable success of pre-trained language models (PLMs), they still face two challenges: First, large-scale PLMs are inefficient in terms of memory footprint and computation. Second, on the downstream tasks, PLMs tend to rely on the dataset bias and struggle to generalize to out-of-distribution (OOD) data. In response to the efficiency problem, recent studies show that dense PLMs can be replaced with sparse subnetworks without hurting the performance. Such subnetworks can be found in three scenarios: 1) the fine-tuned PLMs, 2) the raw PLMs, which are then fine-tuned in isolation, and even inside 3) PLMs without any parameter fine-tuning. However, these results are only obtained in the in-distribution (ID) setting. In this paper, we extend the study on PLM subnetworks to the OOD setting, investigating whether sparsity and robustness to dataset bias can be achieved simultaneously. To this end, we conduct extensive experiments with the pre-trained BERT model on three natural language understanding (NLU) tasks. Our results demonstrate that sparse and robust subnetworks (SRNets) can consistently be found in BERT, across the aforementioned three scenarios, using different training and compression methods. Furthermore, we explore the upper bound of SRNets using the OOD information and show that there exist sparse and almost unbiased BERT subnetworks. Finally, we present 1) an analytical study that provides insights on how to promote the efficiency of the SRNet searching process and 2) a solution to improve subnetworks' performance at high sparsity. The code is available at https://github.com/llyx97/sparse-and-robust-PLM.
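To make the idea of a "sparse subnetwork" concrete: a subnetwork is the original model with a binary mask applied to its weights, and sparsity is the fraction of weights the mask zeroes out. The sketch below assumes global unstructured magnitude pruning, which is only one common way to obtain such a mask; the abstract does not say this is the authors' method (their actual training and compression methods are in the linked repository). The function names `magnitude_masks`, `apply_masks`, and `sparsity_of` and the toy two-layer weights are hypothetical stand-ins for a BERT model's parameters.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]
Masks = Dict[str, List[int]]


def magnitude_masks(weights: Weights, sparsity: float) -> Masks:
    """Hypothetical helper: global unstructured magnitude pruning.

    Ranks every weight in the model by |w| (globally, not per layer) and
    sets the mask to 0 for the smallest `sparsity` fraction of them.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    flat = [(abs(w), name, i) for name, ws in weights.items() for i, w in enumerate(ws)]
    n_prune = int(round(sparsity * len(flat)))  # number of weights to zero out
    flat.sort(key=lambda t: t[0])  # smallest magnitudes first
    masks = {name: [1] * len(ws) for name, ws in weights.items()}
    for _, name, i in flat[:n_prune]:
        masks[name][i] = 0
    return masks


def apply_masks(weights: Weights, masks: Masks) -> Weights:
    """The subnetwork keeps original values where mask == 1 and zeros elsewhere."""
    return {name: [w * m for w, m in zip(ws, masks[name])] for name, ws in weights.items()}


def sparsity_of(masks: Masks) -> float:
    """Fraction of weights removed by the masks."""
    total = sum(len(m) for m in masks.values())
    zeros = sum(m.count(0) for m in masks.values())
    return zeros / total


# Toy example: two "layers" with four weights each, pruned to 50% sparsity.
toy = {"layer1": [0.9, -0.05, 0.3, -0.7], "layer2": [0.01, 0.5, -0.2, 0.08]}
masks = magnitude_masks(toy, sparsity=0.5)
print(masks)  # {'layer1': [1, 0, 1, 1], 'layer2': [0, 1, 0, 0]}
print(sparsity_of(masks))  # 0.5
```

Because the ranking is global, layers can end up with different per-layer sparsity (here `layer1` keeps three weights and `layer2` only one). Under this sketch's assumptions, scenario 1 in the abstract would compute the mask from fine-tuned weights, while scenario 2 would compute it from the raw pre-trained weights and then fine-tune the masked subnetwork in isolation; scenario 3 (no parameter fine-tuning) typically learns the mask itself rather than deriving it from magnitudes.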

Related papers

e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs [49.01449646799905]
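We show that existing reasoning models do not extrapolate well. The e3 recipe produces the best-known 1.7B model according to AIME'25 and HMMT'25 scores. The e3-1.7B model not only attains a high pass@1 score but also improves pass@k over the base model.
Paper reference translation (metadata) (2025-06-10T17:52:42Z)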
Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning [61.99353167168545]
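We show that fine-tuning on LLM-generated data improves target-task performance and reduces degradation on non-target tasks. This is the first work to provide an empirical explanation, based on reduced token perplexity, for mitigating catastrophic forgetting in LLMs after fine-tuning.
Paper reference translation (metadata) (2025-01-24T08:18:56Z)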
Sample-Efficient Alignment for LLMs [29.477421976548015]
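We study methods for aligning large language models (LLMs) with human preferences. We introduce a unified algorithm based on Thompson sampling and highlight its application in two distinct LLM alignment scenarios. The results show that SEA achieves highly sample-efficient alignment with oracle preferences and outperforms recent active exploration methods for LLMs.
Paper reference translation (metadata) (2024-11-03T09:18:28Z)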
Making Pre-trained Language Models both Task-solvers and Self-calibrators [52.98858650625623]
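Pre-trained language models (PLMs) serve as the backbone of various real-world systems. Previous work has shown that this problem can be mitigated by introducing an extra calibration task. We propose LM-TOAST, a training algorithm to address the challenges.
Paper reference translation (metadata) (2023-07-21T02:51:41Z)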
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations [111.88727295707454]
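This paper revisits research on out-of-distribution (OOD) robustness in NLP. We propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts. We conduct experiments on pre-trained language models to analyze and evaluate OOD robustness.
Paper reference translation (metadata) (2023-06-07T17:47:03Z)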
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction [56.790794611002106]
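Large language models (LLMs) have shown remarkable results on various natural language processing (NLP) tasks through in-context learning. We propose a simple but effective in-context learning framework called ICL-D3IE. Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
Paper reference translation (metadata) (2023-03-09T06:24:50Z)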
Boosting Low-Data Instance Segmentation by Unsupervised Pre-training with Saliency Prompt [103.58323875748427]
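This work offers a novel unsupervised pre-training solution for low-data regimes. Inspired by the recent success of prompting techniques, we introduce a new pre-training method that boosts QEIS models. Experimental results show that the method significantly improves several QEIS models on three datasets.
Paper reference translation (metadata) (2023-02-02T15:49:03Z)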
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering [25.540831728925557]
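This paper investigates whether vision-language pre-trained models can be compressed and debiased simultaneously by searching for sparse and robust subnetworks. The results show that sparse and robust subnetworks exist and are competitive with the debiased full model.
Paper reference translation (metadata) (2022-10-26T08:25:03Z)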
Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training [55.43088293183165]
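Recent studies have shown that pre-trained language models (PLMs) such as BERT contain matching subnetworks whose transfer learning performance is similar to that of the original PLM. In this paper, we show that BERT subnetworks have even more potential than these studies have shown. We train binary masks over the model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetworks.
Paper reference translation (metadata) (2022-04-24T08:42:47Z)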

The related paper list is generated automatically from the titles and abstracts of papers on this site.

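Information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.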