Fugu-MT Paper Translation (Summary): In-Context Language Learning: Architectures and Algorithms

Paper summary: In-Context Language Learning: Architectures and Algorithms

arxiv url: http://arxiv.org/abs/2401.12973v2
Date: Tue, 30 Jan 2024 18:59:34 GMT
Status: Translation complete
Last updated (in system): 2024-01-31 18:09:10.403654
Title: In-Context Language Learning: Architectures and Algorithms
Title (reference translation): In-Context Language Learning: Architectures and Algorithms
Authors: Ekin Akyürek, Bailin Wang, Yoon Kim, Jacob Andreas
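Abstract summary: We study ICL through the lens of a new family of model problems we term in-context language learning (ICLL). We evaluate a diverse set of neural sequence models on regular ICLL tasks.
Reference score (independently computed attention score): 73.93205821154605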
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large-scale neural language models exhibit a remarkable capacity for in-context learning (ICL): they can infer novel functions from datasets provided as input. Most of our current understanding of when and how ICL arises comes from LMs trained on extremely simple learning problems like linear regression and associative recall. There remains a significant gap between these model problems and the "real" ICL exhibited by LMs trained on large text corpora, which involves not just retrieval and function approximation but free-form generation of language and other structured outputs. In this paper, we study ICL through the lens of a new family of model problems we term in-context language learning (ICLL). In ICLL, LMs are presented with a set of strings from a formal language, and must generate additional strings from the same language. We focus on in-context learning of regular languages generated by random finite automata. We evaluate a diverse set of neural sequence models (including several RNNs, Transformers, and state-space model variants) on regular ICLL tasks, aiming to answer three questions: (1) Which model classes are empirically capable of ICLL? (2) What algorithmic solutions do successful models implement to perform ICLL? (3) What architectural changes can improve ICLL in less performant models? We first show that Transformers significantly outperform neural sequence models with recurrent or convolutional representations on ICLL tasks. Next, we provide evidence that their ability to do so relies on specialized "n-gram heads" (higher-order variants of induction heads) that compute input-conditional next-token distributions. Finally, we show that hard-wiring these heads into neural models improves performance not just on ICLL but also on natural language modeling, improving the perplexity of 340M-parameter models by up to 1.14 points (6.7%) on the SlimPajama dataset.
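The ICLL setup described in the abstract can be sketched as follows. A model sees several strings from one regular language, generated by a random finite automaton, and must produce more strings from that language. This is a minimal illustration under stated assumptions, not the authors' data pipeline. The class and function names, the alphabet, the number of states, the fixed number of outgoing edges per state, treating every state as accepting, the string-length range, and the ` | ` separator are all hypothetical choices.

```python
import random
from dataclasses import dataclass


@dataclass
class RandomDFA:
    """Hypothetical container: transitions[state][symbol] -> next state."""
    start: int
    transitions: dict[int, dict[str, int]]


def sample_dfa(rng: random.Random, alphabet: list[str],
               num_states: int, edges_per_state: int) -> RandomDFA:
    """Sample a DFA in which every state has `edges_per_state` outgoing
    transitions on distinct symbols (assumption: fixed out-degree)."""
    transitions = {}
    for state in range(num_states):
        symbols = rng.sample(alphabet, edges_per_state)
        transitions[state] = {s: rng.randrange(num_states) for s in symbols}
    return RandomDFA(start=0, transitions=transitions)


def sample_string(rng: random.Random, dfa: RandomDFA,
                  min_len: int, max_len: int) -> str:
    """Random walk from the start state. Assumption: every state is
    accepting, so any walk that follows existing edges is in the language."""
    state, symbols = dfa.start, []
    for _ in range(rng.randint(min_len, max_len)):
        symbol = rng.choice(sorted(dfa.transitions[state]))
        symbols.append(symbol)
        state = dfa.transitions[state][symbol]
    return "".join(symbols)


def accepts(dfa: RandomDFA, string: str) -> bool:
    """Membership test under the same all-states-accepting assumption."""
    state = dfa.start
    for symbol in string:
        if symbol not in dfa.transitions[state]:
            return False  # missing transition = implicit reject state
        state = dfa.transitions[state][symbol]
    return True


def make_icll_prompt(rng: random.Random, dfa: RandomDFA,
                     num_examples: int, sep: str = " | ") -> str:
    """Join several sampled strings (length 2-8, hypothetical) into one prompt."""
    return sep.join(sample_string(rng, dfa, 2, 8) for _ in range(num_examples))


if __name__ == "__main__":
    rng = random.Random(0)
    dfa = sample_dfa(rng, list("abcdef"), num_states=5, edges_per_state=2)
    print(make_icll_prompt(rng, dfa, num_examples=4))
```

A model performing ICLL on such a prompt would be expected to continue it with another string that the same automaton accepts. `accepts` gives a simple way to score that continuation under these assumptions.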
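Abstract (reference translation): Large-scale neural language models show a remarkable capacity for in-context learning (ICL): they can infer novel functions from datasets provided as input. Most of our current understanding of ICL comes from LMs trained on extremely simple learning problems such as linear regression and associative recall. A large gap remains between these model problems and the "real" ICL exhibited by LMs trained on large text corpora. In this paper, we study ICL through the lens of a new family of model problems we term in-context language learning (ICLL). In ICLL, LMs are presented with a set of strings from a formal language and must generate additional strings from the same language. We focus on in-context learning of regular languages generated by random finite automata. We evaluate a variety of neural sequence models (including several RNNs, Transformers, and state-space models) on regular ICLL tasks, aiming to answer three questions: (1) Which model classes are empirically capable of ICLL? (2) What algorithmic solutions do successful models implement to perform ICLL? (3) What architectural changes can improve ICLL in less performant models? We first show that Transformers significantly outperform neural sequence models with recurrent or convolutional representations on ICLL tasks. Next, we provide evidence that this ability relies on specialized "n-gram heads" (higher-order variants of induction heads) that compute input-conditional next-token distributions. Finally, we show that hard-wiring these heads into neural models improves performance not only on ICLL but also on natural language modeling, improving the perplexity of 340M-parameter models by up to 1.14 points (6.7%) on the SlimPajama dataset.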
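The abstract attributes Transformers' ICLL ability to "n-gram heads" that compute input-conditional next-token distributions. The function below is a rough illustration of that statistic only. It does not model the attention mechanism itself or the authors' hard-wired layer. It computes a count-based n-gram next-token distribution from the prompt alone. The function name is hypothetical, and the sketch assumes hashable tokens and unsmoothed maximum-likelihood counts.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def in_context_ngram_distribution(tokens: Sequence[Hashable], n: int) -> dict:
    """Maximum-likelihood estimate of P(next token | last n-1 tokens),
    using counts gathered only from `tokens` itself (no training data,
    no smoothing). Returns {} if the current (n-1)-token suffix never
    occurred earlier with a continuation."""
    if n < 1:
        raise ValueError("n must be >= 1")
    k = n - 1
    if len(tokens) < k:
        return {}
    suffix = tuple(tokens[len(tokens) - k:])
    counts = Counter()
    # For each earlier position i, compare the k tokens before i with the
    # current suffix; on a match, tokens[i] is an observed continuation.
    for i in range(k, len(tokens)):
        if tuple(tokens[i - k:i]) == suffix:
            counts[tokens[i]] += 1
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}


if __name__ == "__main__":
    # After "... a b", this context has only ever continued "a b" with "c".
    print(in_context_ngram_distribution(list("abcabcab"), n=3))  # {'c': 1.0}
```

Raising `n` conditions on more of the preceding symbols, but it leaves fewer matching positions in a short prompt. This sketch does not back off to lower orders when no match is found, which a practical estimator might do.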


Related papers