Fugu-MT Paper Translation (Abstract): CoFrGeNet: Continued Fraction Architectures for Language Generation

Paper overview: CoFrGeNet: Continued Fraction Architectures for Language Generation

arxiv url: http://arxiv.org/abs/2601.21766v1
Date: Thu, 29 Jan 2026 14:16:39 GMT
Status: Translation complete
Last updated in system: 2026-01-30 16:22:49.885131
Title: CoFrGeNet: Continued Fraction Architectures for Language Generation
Authors: Amit Dhurandhar, Vijil Chenthamarakshan, Dennis Wei, Tejaswini Pedapati, Karthikeyan Natesan Ramamurthy, Rahul Nair,
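Abstract summary: We introduce a new function class for generative modeling, inspired by continued fractions. Based on this function class, we design novel architectural components that can replace Multi-head Attention and Feed-Forward Networks in Transformer blocks. Our components are a plug-in replacement requiring little change in training or inference procedures.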
Reference score (independently computed attention score): 36.20981075573288
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformers are arguably the preferred architecture for language generation. In this paper, inspired by continued fractions, we introduce a new function class for generative modeling. The architecture family implementing this function class is named CoFrGeNets - Continued Fraction Generative Networks. We design novel architectural components based on this function class that can replace Multi-head Attention and Feed-Forward Networks in Transformer blocks while requiring far fewer parameters. We derive custom gradient formulations to optimize the proposed components more accurately and efficiently than using standard PyTorch-based gradients. Our components are a plug-in replacement requiring little change in the training or inference procedures already in place for Transformer-based models, which makes our approach easy to incorporate into large industrial workflows. We experiment on two very different Transformer architectures, GPT2-xl (1.5B) and Llama3 (3.2B): we pre-train the former on OpenWebText and GneissWeb, and the latter on the docling data mix, which consists of nine different datasets. Results show that on downstream classification, Q&A, reasoning, and text understanding tasks, our models are competitive with, and sometimes even superior to, the original models, with $\frac{2}{3}$ to $\frac{1}{2}$ the parameters and shorter pre-training time. We believe that future implementations customized to hardware will further bring out the true potential of our architectures.
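As background for the term used in the abstract, a finite continued fraction has the form a0 + 1/(a1 + 1/(a2 + ...)). The sketch below evaluates such an expression from the innermost term outward. It is a generic illustration of continued fractions only: the function name `eval_continued_fraction` is hypothetical, and the abstract does not say how CoFrGeNet parameterizes its components, so nothing here should be read as the paper's method.

```python
from fractions import Fraction


def eval_continued_fraction(coeffs):
    """Evaluate a0 + 1/(a1 + 1/(a2 + ...)) for a finite list of coefficients.

    Hypothetical helper for illustration; not taken from the paper.
    Raises ZeroDivisionError if an intermediate denominator is zero.
    """
    if not coeffs:
        raise ValueError("need at least one coefficient")
    # Start from the innermost (last) coefficient and work outward.
    value = Fraction(coeffs[-1])
    for a in reversed(coeffs[:-1]):
        value = a + 1 / value
    return value


# The coefficients [3, 7, 15, 1] give the classic rational approximation of pi.
pi_approx = eval_continued_fraction([3, 7, 15, 1])
print(pi_approx)  # 355/113
```

Exact rational arithmetic via `fractions.Fraction` is used here so the nested divisions introduce no rounding error; a learned model would presumably work in floating point instead, which is where numerical care around small denominators would matter.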

Related papers