Fugu-MT Paper Translation (Overview): Investigation into In-Context Learning Capabilities of Transformers

Paper overview: Investigation into In-Context Learning Capabilities of Transformers

arxiv url: http://arxiv.org/abs/2604.25858v1
Date: Tue, 28 Apr 2026 16:57:55 GMT
Status: Translation complete
Last updated in system: 2026-04-29 16:49:17.964212
Title: Investigation into In-Context Learning Capabilities of Transformers
Title (reference translation): A study of the in-context learning capabilities of transformers
Authors: Rushil Chandrupatla, Leo Bangayan, Sebastian Leng, Arya Mazumdar,
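Abstract summary: Transformers have demonstrated a strong capability for in-context learning (ICL). This work presents a systematic study of in-context learning on binary classification tasks. The results highlight the importance of dimensionality, signal strength, and contextual information in determining when in-context learning succeeds and when it fails.
Reference score (in-house attention score): 14.14937947207076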
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformers have demonstrated a strong ability for in-context learning (ICL), enabling models to solve previously unseen tasks using only example input-output pairs provided at inference time. While prior theoretical work has established conditions under which transformers can perform linear classification in-context, the empirical scaling behavior governing when this mechanism succeeds remains insufficiently characterized. In this paper, we conduct a systematic empirical study of in-context learning for Gaussian-mixture binary classification tasks. Building on the theoretical framework of Frei and Vardi (2024), we analyze how in-context test accuracy depends on three fundamental factors: the input dimension, the number of in-context examples, and the number of pre-training tasks. Using a controlled synthetic setup and a linear in-context classifier formulation, we isolate the geometric conditions under which models successfully infer task structure from context alone. We additionally investigate the emergence of benign overfitting, where models memorize noisy in-context labels while still achieving strong generalization performance on clean test data. Through extensive sweeps across dimensionality, sequence length, task diversity, and signal-to-noise regimes, we identify the parameter regions in which this phenomenon arises and characterize how it depends on data geometry and training exposure. Our results provide a comprehensive empirical map of scaling behavior in in-context classification, highlighting the critical role of dimensionality, signal strength, and contextual information in determining when in-context learning succeeds and when it fails.
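A minimal sketch of the kind of setup the abstract describes, offered only as something concrete to experiment with. It rests on assumptions that are not taken from the paper: each task draws a mean vector μ uniformly from the sphere of radius `mu_norm` in R^d, each example is x = yμ + z with y uniform on {-1, +1} and z ~ N(0, I_d), and the in-context classifier is the simplified linear form ŷ = sign(x_qᵀ W w) with w = (1/N) Σ_i y_i x_i and W fixed to the identity rather than learned. Because W is hand-set, the number of pre-training tasks plays no role here; only the input dimension d and the context length N vary. All function names are hypothetical.

```python
import math
import random


def sample_task(d, mu_norm, rng):
    """Assumed task prior: mu is a uniformly random direction in R^d scaled to mu_norm."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    scale = mu_norm / math.sqrt(sum(t * t for t in v))
    return [scale * t for t in v]


def sample_examples(mu, n, rng):
    """Draw n labelled points x = y * mu + z, y uniform in {-1, +1}, z ~ N(0, I_d)."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.choice((-1, 1))
        xs.append([y * m + rng.gauss(0.0, 1.0) for m in mu])
        ys.append(y)
    return xs, ys


def context_vector(xs, ys):
    """w = (1/N) * sum_i y_i x_i, the label-weighted mean of the context examples."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    for x, y in zip(xs, ys):
        for j in range(d):
            w[j] += y * x[j]
    return [wj / n for wj in w]


def predict(w, x):
    """Linear in-context classifier sign(x^T W w) with W = I (an assumption, not trained)."""
    return 1 if sum(a * b for a, b in zip(w, x)) >= 0 else -1


def icl_accuracy(d, n_ctx, mu_norm, n_tasks=50, n_query=20, seed=0):
    """Average clean query accuracy over n_tasks freshly sampled tasks."""
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(n_tasks):
        mu = sample_task(d, mu_norm, rng)
        w = context_vector(*sample_examples(mu, n_ctx, rng))
        qx, qy = sample_examples(mu, n_query, rng)
        for x, y in zip(qx, qy):
            correct += predict(w, x) == y
            total += 1
    return correct / total


if __name__ == "__main__":
    for d, n_ctx in [(10, 10), (10, 100), (200, 10), (200, 100)]:
        print(f"d={d:4d}  N={n_ctx:4d}  acc={icl_accuracy(d, n_ctx, mu_norm=2.0):.3f}")
```

Under these assumptions, writing w = μ + g with g roughly N(0, I_d / N), the query margin y_q x_qᵀ w has mean ‖μ‖² and variance about ‖μ‖² + ‖μ‖²/N + d/N. Accuracy in this toy model should therefore fall as d grows at fixed N and recover as N grows. This is a back-of-the-envelope heuristic for the sketch, not a result reported in the paper.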
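Abstract (reference translation): Transformers have shown a strong capacity for in-context learning (ICL), allowing models to solve previously unseen tasks using only example input-output pairs provided at inference time. Prior theoretical work has established conditions under which transformers can perform linear classification in context, but the empirical scaling behavior that determines when this mechanism succeeds remains insufficiently characterized. In this paper, we conduct a systematic study of in-context learning on Gaussian-mixture binary classification tasks. Building on the theoretical framework of Frei and Vardi (2024), we analyze how in-context test accuracy depends on three fundamental factors: the input dimension, the number of in-context examples, and the number of pre-training tasks. Using a controlled synthetic setup and a linear in-context classifier formulation, we isolate the geometric conditions under which a model can infer task structure from context alone. We further examine the emergence of benign overfitting, in which models memorize noisy in-context labels while maintaining strong generalization on clean test data. Through sweeps over dimension, sequence length, task diversity, and signal-to-noise regimes, we identify the parameter regions where this phenomenon occurs and characterize how it depends on data geometry and training exposure. Our results highlight the important roles of dimensionality, signal strength, and contextual information in determining when in-context learning succeeds and when it fails, and provide a comprehensive map of scaling behavior in in-context classification.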
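The benign-overfitting regime mentioned in the abstract can be probed with a hand-set linear in-context classifier. This is a toy sketch under assumptions not taken from the paper: data follow the Gaussian mixture x = yμ + z, each context label is flipped independently with probability `flip_prob` (query labels stay clean), and the predictor uses w = (1/N) Σ_i ỹ_i x_i built from the noisy labels ỹ_i, with W = I standing in for the trained transformer analyzed by Frei and Vardi (2024). It reports two numbers: the fraction of flipped context points whose noisy label the classifier reproduces (memorization) and the accuracy on clean queries. Function names and parameter values are illustrative assumptions.

```python
import math
import random


def sample_task(d, mu_norm, rng):
    """Assumed task prior: uniformly random direction in R^d scaled to mu_norm."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    s = mu_norm / math.sqrt(sum(t * t for t in v))
    return [s * t for t in v]


def sample_point(mu, rng):
    """One clean example x = y * mu + z with y uniform in {-1, +1}."""
    y = rng.choice((-1, 1))
    return [y * m + rng.gauss(0.0, 1.0) for m in mu], y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def benign_overfit_stats(d, n_ctx, mu_norm, flip_prob, n_tasks=30, n_query=30, seed=0):
    """Return (fraction of flipped context labels reproduced, clean query accuracy)."""
    rng = random.Random(seed)
    memorized = flipped = correct = tested = 0
    for _ in range(n_tasks):
        mu = sample_task(d, mu_norm, rng)
        ctx = []
        for _ in range(n_ctx):
            x, y = sample_point(mu, rng)
            is_flipped = rng.random() < flip_prob
            ctx.append((x, -y if is_flipped else y, is_flipped))
        # w = (1/N) * sum_i y_obs_i x_i, built from the noisy context labels
        w = [0.0] * d
        for x, y_obs, _ in ctx:
            for j in range(d):
                w[j] += y_obs * x[j] / n_ctx
        # Memorization: does the classifier reproduce each flipped (wrong) label?
        for x, y_obs, is_flipped in ctx:
            if is_flipped:
                flipped += 1
                memorized += (1 if dot(w, x) >= 0 else -1) == y_obs
        # Generalization: accuracy on fresh queries with clean labels
        for _ in range(n_query):
            x, y = sample_point(mu, rng)
            correct += (1 if dot(w, x) >= 0 else -1) == y
            tested += 1
    return memorized / max(flipped, 1), correct / tested


if __name__ == "__main__":
    for d in (10, 1000):
        mem, acc = benign_overfit_stats(d, n_ctx=20, mu_norm=5.0, flip_prob=0.15)
        print(f"d={d:5d}  memorized flipped labels={mem:.2f}  clean test acc={acc:.3f}")
```

For a flipped point x_k, N·ỹ_k x_kᵀ w = ‖x_k‖² + ỹ_k Σ_{i≠k} ỹ_i x_iᵀ x_k. When d is large relative to N‖μ‖², the self term ‖x_k‖² ≈ d + ‖μ‖² dominates and the noisy label is reproduced; with d = 10 the shared mean term wins instead, so flipped labels are not fit. Clean query accuracy stays high in both settings of this toy model, which is the qualitative signature of benign overfitting.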


Related papers