Fugu-MT Paper Translation (Abstract): On the generalization of language models from in-context learning and finetuning: a controlled study

Paper summary: On the generalization of language models from in-context learning and finetuning: a controlled study

arxiv url: http://arxiv.org/abs/2505.00661v1
Date: Thu, 01 May 2025 17:02:27 GMT
Status: Translation complete
Last updated in system: 2025-05-02 19:15:55.378033
Title: On the generalization of language models from in-context learning and finetuning: a controlled study
Title (reference translation): On the generalization of language models through in-context learning and fine-tuning: a controlled study
Authors: Andrew K. Lampinen, Arslan Chaudhry, Stephanie C. Y. Chan, Cody Wild, Diane Wan, Alex Ku, Jörg Bornschein, Razvan Pascanu, Murray Shanahan, James L. McClelland
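Abstract summary: We show that in-context learning in language models exhibits different inductive biases and can, in some cases, generalize better. We propose a method that improves generalization from fine-tuning by adding in-context inferences to the fine-tuning data. These results have implications for understanding the inductive biases of different modes of learning in language models.
Reference score (independently computed attention score): 36.384796130439035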
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models exhibit exciting capabilities, yet can show surprisingly narrow generalization from finetuning -- from failing to generalize to simple reversals of relations they are trained on, to missing logical deductions that can be made from trained information. These failures to generalize from fine-tuning can hinder practical application of these models. However, language models' in-context learning shows different inductive biases, and can generalize better in some of these cases. Here, we explore these differences in generalization between in-context- and fine-tuning-based learning. To do so, we constructed several novel datasets to evaluate and improve models' ability to generalize from finetuning data. The datasets are constructed to isolate the knowledge in the dataset from that in pretraining, to create clean tests of generalization. We expose pretrained large models to controlled subsets of the information in these datasets -- either in context, or through fine-tuning -- and evaluate their performance on test sets that require various types of generalization. We find overall that in data-matched settings, in-context learning can generalize more flexibly than fine-tuning (though we also find some qualifications of prior findings, such as cases when fine-tuning can generalize to reversals embedded in a larger structure of knowledge). We build on these findings to propose a method to enable improved generalization from fine-tuning: adding in-context inferences to finetuning data. We show that this method improves generalization across various splits of our datasets and other benchmarks. Our results have implications for understanding the inductive biases of different modes of learning in language models, and practically improving their performance.
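Abstract (reference translation): Large language models show exciting capabilities, yet they can generalize surprisingly narrowly from fine-tuning, ranging from failures to generalize to simple reversals of the relations they were trained on to missing logical deductions that follow from the trained information. These failures of generalization from fine-tuning can hinder the practical application of these models. In-context learning in language models, however, exhibits different inductive biases and can generalize better in some of these cases. In this paper, we examine these differences in generalization between in-context learning and fine-tuning-based learning. To that end, we constructed several new datasets to evaluate and improve models' ability to generalize from fine-tuning data. The datasets are built to separate the knowledge they contain from knowledge acquired in pretraining, which yields clean tests of generalization. We expose large pretrained models to controlled subsets of the information in these datasets, either in context or through fine-tuning, and evaluate their performance on test sets that require various kinds of generalization. Overall, in data-matched settings, in-context learning can generalize more flexibly than fine-tuning (although we also find some qualifications of prior findings, such as cases where fine-tuning can generalize to reversals embedded in a larger structure of knowledge). Building on these results, we propose a method that improves generalization from fine-tuning by adding in-context inferences to the fine-tuning data. We show that this method improves generalization across various splits of our datasets and on other benchmarks. Our results have implications for understanding the inductive biases of different modes of learning in language models and for improving their performance in practice.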
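
The abstract's proposal, adding in-context inferences to fine-tuning data, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `augment_finetuning_data` and `toy_reversal_inferences` are hypothetical names, the example sentences are invented, and the rule-based reversal function merely stands in for prompting a language model with the training documents in context and collecting the inferences it writes out.

```python
from typing import Callable, Iterable


def toy_reversal_inferences(doc: str) -> list[str]:
    """Hypothetical stand-in for a model call that produces in-context inferences.

    Only handles the toy pattern "<A> is the parent of <B>." and returns the
    reversed relation; any other document yields no inferences.
    """
    marker = " is the parent of "
    if marker not in doc:
        return []
    a, b = doc.rstrip(".").split(marker, 1)
    return [f"{b} is the child of {a}."]


def augment_finetuning_data(
    docs: Iterable[str],
    infer_in_context: Callable[[str], list[str]],
) -> list[str]:
    """Return each document followed by its generated inferences.

    Duplicates are dropped and the original order is preserved.
    """
    augmented: list[str] = []
    seen: set[str] = set()
    for doc in docs:
        for text in [doc, *infer_in_context(doc)]:
            if text not in seen:
                seen.add(text)
                augmented.append(text)
    return augmented


# Invented toy documents; a real dataset would be far larger.
train_docs = ["Femp is the parent of Glon.", "Glon lives in Zarn."]
augmented_docs = augment_finetuning_data(train_docs, toy_reversal_inferences)
```

In this sketch the augmented list would then be used as the fine-tuning corpus in place of `train_docs`; passing the inference step as a callable keeps the augmentation logic independent of whichever model or prompt produces the inferences.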

List of related papers