Fugu-MT paper translation (abstract): BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion

Paper overview: BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion

arxiv url: http://arxiv.org/abs/2605.11577v1
Date: Tue, 12 May 2026 06:02:59 GMT
Status: translation complete
Last updated in system: 2026-05-13 21:48:56.616082
Title: BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion
Authors: Shaobin Zhuang, Yuang Ai, Jiaming Han, Xiaohui Li, Huaibo Huang, Xiangyu Yue, Xuefeng Hu, Kun Xu, Yali Wang, Hao Chen
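Abstract summary: BitLM is a language model that represents each token as a fixed-length binary code. By replacing the large-vocabulary softmax with bitwise denoising, BitLM reframes token generation as iterative commitment in a compact binary space.
Reference score (attention score computed independently by this site): 47.110252169791075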
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autoregressive language models generate text one token at a time, yet natural language is inherently structured in multi-token units, including phrases, n-grams, and collocations that carry meaning jointly. This one-token bottleneck limits both the expressiveness of the model during pre-training and its throughput at inference time. Existing remedies such as speculative decoding or diffusion-based language models either leave the underlying bottleneck intact or sacrifice the causal structure essential to language modeling. We propose BitLM, a language model that represents each token as a fixed-length binary code and employs a lightweight diffusion head to denoise multiple tokens in parallel within each block. Crucially, BitLM preserves left-to-right causal attention across blocks while making joint lexical decisions within each block, combining the reliability of autoregressive modeling with the parallelism of iterative refinement. By replacing the large-vocabulary softmax with bitwise denoising, BitLM reframes token generation as iterative commitment in a compact binary space, enabling more efficient pre-training and substantially faster inference without altering the causal foundation that makes language models effective. Our results demonstrate that the one-token-at-a-time paradigm is not a fundamental requirement but an interface choice, and that changing it can yield a stronger and faster language model. We hope BitLM points toward a promising direction for next-generation language model architectures.
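The abstract's idea of representing each token as a fixed-length binary code can be illustrated with a minimal sketch. The abstract does not say how BitLM assigns codes, so the plain base-2 encoding of token ids used here is an assumption, and the function names (`bits_needed`, `token_to_bits`, `bits_to_token`) and the vocabulary size of 50,000 are hypothetical choices for illustration only. The sketch shows only the encoding side, not the diffusion head or the denoising procedure.

```python
from typing import List


def bits_needed(vocab_size: int) -> int:
    """Smallest code length that gives every token id a distinct binary code."""
    return max(1, (vocab_size - 1).bit_length())


def token_to_bits(token_id: int, num_bits: int) -> List[int]:
    """Encode a token id as a fixed-length list of bits, most significant first.

    Assumption: plain base-2 encoding of the id; the paper may use another code.
    """
    if not 0 <= token_id < (1 << num_bits):
        raise ValueError("token_id does not fit in num_bits")
    return [(token_id >> i) & 1 for i in reversed(range(num_bits))]


def bits_to_token(bits: List[int]) -> int:
    """Decode a list of bits (most significant first) back to a token id."""
    token_id = 0
    for bit in bits:
        token_id = (token_id << 1) | bit
    return token_id


# Hypothetical vocabulary size, chosen only for illustration.
VOCAB_SIZE = 50_000
NUM_BITS = bits_needed(VOCAB_SIZE)

# A block of several token ids becomes a small grid of bits
# (one row per token), instead of one large softmax per token.
block = [17, 4242, 49_999]
encoded_block = [token_to_bits(t, NUM_BITS) for t in block]
decoded_block = [bits_to_token(row) for row in encoded_block]
```

Under this assumed encoding, the per-token output shrinks from one score per vocabulary entry to `NUM_BITS` binary decisions, which is the sense in which the binary space is compact.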


Related papers