Fugu-MT Paper Translation (Abstract): Learning Transformer Programs

Paper overview: Learning Transformer Programs

arxiv url: http://arxiv.org/abs/2306.01128v2
Date: Tue, 31 Oct 2023 00:47:31 GMT
Status: Translation complete
Last updated in system: 2023-11-01 23:53:15.118216
Title: Learning Transformer Programs
Authors: Dan Friedman, Alexander Wettig, Danqi Chen
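Abstract summary: We introduce a procedure for training Transformers that are mechanistically interpretable by design. Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization. Transformer Programs automatically find reasonable solutions and perform on par with standard Transformers of comparable size.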
Reference score (independently computed attention score): 78.9509560355733
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent research in mechanistic interpretability has attempted to reverse-engineer Transformer models by carefully inspecting network weights and activations. However, these approaches require considerable manual effort and still fall short of providing complete, faithful descriptions of the underlying algorithms. In this work, we introduce a procedure for training Transformers that are mechanistically interpretable by design. We build on RASP [Weiss et al., 2021], a programming language that can be compiled into Transformer weights. Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization and then automatically converted into a discrete, human-readable program. We refer to these models as Transformer Programs. To validate our approach, we learn Transformer Programs for a variety of problems, including an in-context learning task, a suite of algorithmic problems (e.g. sorting, recognizing Dyck languages), and NLP tasks including named entity recognition and text classification. The Transformer Programs can automatically find reasonable solutions, performing on par with standard Transformers of comparable size; and, more importantly, they are easy to interpret. To demonstrate these advantages, we convert Transformers into Python programs and use off-the-shelf code analysis tools to debug model errors and identify the "circuits" used to solve different sub-problems. We hope that Transformer Programs open a new path toward the goal of intrinsically interpretable machine learning.
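To make the idea of a "discrete, human-readable program" more concrete, the following is a minimal Python sketch of RASP-style `select` and `aggregate` primitives. It is not code from the paper: the function names, the `shift_right` example, and the first-match rule used for hard attention are all assumptions chosen for illustration.

```python
from typing import Callable, List, Optional, Sequence


def select(
    keys: Sequence[int],
    queries: Sequence[int],
    predicate: Callable[[int, int], bool],
) -> List[List[bool]]:
    """Build a boolean attention pattern.

    Row i corresponds to query i; entry j is True when
    predicate(keys[j], queries[i]) holds.
    """
    return [[predicate(k, q) for k in keys] for q in queries]


def aggregate(
    selector: List[List[bool]],
    values: Sequence[str],
    default: Optional[str] = None,
) -> List[Optional[str]]:
    """Hard-attention read (illustrative assumption).

    Each query copies the value at the first selected key,
    or `default` when no key is selected.
    """
    out: List[Optional[str]] = []
    for row in selector:
        picked = [v for chosen, v in zip(row, values) if chosen]
        out.append(picked[0] if picked else default)
    return out


def shift_right(tokens: Sequence[str]) -> List[Optional[str]]:
    """Hypothetical one-head program: every position reads the previous token."""
    positions = list(range(len(tokens)))
    previous = select(positions, positions, lambda k, q: k == q - 1)
    return aggregate(previous, tokens, default="<pad>")


if __name__ == "__main__":
    print(shift_right(["a", "b", "c"]))
```

A program written this way is ordinary Python, so it can be stepped through with a debugger or inspected with standard code analysis tools, which is the kind of workflow the abstract describes.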
