Fugu-MT Paper Translation (Summary): HoloByte: Continuous Hyperspherical Distillation for Tokenizer-Free Modeling

Paper summary: HoloByte: Continuous Hyperspherical Distillation for Tokenizer-Free Modeling

arxiv url: http://arxiv.org/abs/2603.16917v1
Date: Tue, 10 Mar 2026 20:35:36 GMT
Status: Translation complete
Last updated in system: 2026-03-19 18:32:57.25288
Title: HoloByte: Continuous Hyperspherical Distillation for Tokenizer-Free Modeling
Title (reference translation): HoloByte: Continuous Hyperspherical Distillation for Tokenizer-Free Modeling
Authors: Vladimer Khasia
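Abstract summary: We introduce HoloByte, a strictly tokenizer-free framework that uses Continuous Hyperspherical Distillation. HoloByte partitions discrete byte sequences into fixed-capacity chunks and projects them into a continuous, strictly bounded hyperspherical manifold. These results establish Continuous Hyperspherical Distillation as a mathematically rigorous and computationally tractable foundation for vocabulary-invariant sequence modeling.
Reference score (independently computed attention score): 0.0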
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sequence modeling universally relies on discrete subword tokenization to circumvent the $\mathcal{O}(N^2)$ computational intractability of native byte-level attention. However, this heuristic quantization imposes artificial morphological boundaries, enforces vocabulary dependence, and fractures the continuity of the optimization landscape. To resolve this dichotomy, we introduce HoloByte: a strictly tokenizer-free framework utilizing Continuous Hyperspherical Distillation. HoloByte partitions discrete byte sequences into fixed-capacity chunks and projects them into a continuous, strictly bounded hyperspherical manifold via an invertible, dimension-preserving orthogonal rotation operator. This spatial superposition allows a macroscopic transformer to operate exclusively on compressed continuous representations, formally reducing the exact attention time complexity from $\mathcal{O}(N^2D)$ to $\mathcal{O}\left( \frac{N^2}{W^2}D + ND^2 \right)$. A localized causal micro-decoder subsequently unbinds these representations to compute exact byte-level distributions. To govern this continuous trajectory, we propose a dual-objective formulation incorporating a mathematically precise Holographic Latent Mean Squared Error, which strictly bounds the gradient and guarantees asymptotic stability. Theoretically, we derive the minimal embedding dimension $D = \Omega(W \ln |\mathcal{V}|)$ required to ensure error-free discrete recovery from the continuous manifold. Empirically, under strictly matched parameter constraints, HoloByte systematically outperforms a comparable discrete Byte-Pair Encoding (BPE) baseline. These results establish Continuous Hyperspherical Distillation as a mathematically rigorous and computationally tractable foundation for vocabulary-invariant sequence modeling. The code is available at https://github.com/VladimerKhasia/HoloByte
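Abstract (reference translation): Sequence modeling relies on discrete subword tokenization to avoid the $\mathcal{O}(N^2)$ computational intractability of native byte-level attention. This heuristic quantization, however, imposes artificial morphological boundaries, enforces vocabulary dependence, and breaks the continuity of the optimization landscape. To resolve this dichotomy, the paper introduces HoloByte, a strictly tokenizer-free framework that uses Continuous Hyperspherical Distillation. HoloByte partitions discrete byte sequences into fixed-capacity chunks and projects them into a continuous, strictly bounded hyperspherical manifold through an invertible, dimension-preserving orthogonal rotation operator. This spatial superposition lets a macroscopic transformer operate only on compressed continuous representations, reducing the exact attention time complexity from $\mathcal{O}(N^2D)$ to $\mathcal{O}\left( \frac{N^2}{W^2}D + ND^2 \right)$. A localized causal micro-decoder then unbinds these representations to compute exact byte-level distributions. To govern this continuous trajectory, the paper proposes a dual-objective formulation that incorporates a mathematically precise Holographic Latent Mean Squared Error, which strictly bounds the gradient and guarantees asymptotic stability. Theoretically, it derives the minimal embedding dimension $D = \Omega(W \ln |\mathcal{V}|)$ required to ensure error-free discrete recovery from the continuous manifold. Empirically, under strictly matched parameter constraints, HoloByte systematically outperforms a comparable discrete Byte-Pair Encoding (BPE) baseline. These results establish Continuous Hyperspherical Distillation as a mathematically rigorous and computationally tractable foundation for vocabulary-invariant sequence modeling. The code is available at https://github.com/VladimerKhasia/HoloByte.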
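
The two complexity expressions in the abstract can be compared numerically. The following is a minimal sketch, not code from the HoloByte repository: the function names, the sample values of N, W and D, and the assumed byte vocabulary size of 256 are all illustrative assumptions, and constant factors hidden by the $\mathcal{O}$ and $\Omega$ notation are ignored.

```python
import math


def byte_attention_cost(n: int, d: int) -> int:
    """Leading-order cost of full byte-level attention: N^2 * D."""
    return n * n * d


def chunked_cost(n: int, w: int, d: int) -> float:
    """Leading-order cost quoted in the abstract: (N^2 / W^2) * D + N * D^2."""
    return (n * n / (w * w)) * d + n * d * d


def dimension_scale(w: int, vocab_size: int = 256) -> float:
    """The W * ln|V| term inside the Omega bound; the constant factor is unknown.

    vocab_size=256 assumes a raw byte vocabulary, which is an assumption here.
    """
    return w * math.log(vocab_size)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    n, w, d = 8192, 8, 512
    full = byte_attention_cost(n, d)
    chunked = chunked_cost(n, w, d)
    print(f"byte-level attention cost : {full:.3e}")
    print(f"chunked cost              : {chunked:.3e}")
    print(f"ratio (full / chunked)    : {full / chunked:.2f}")
    print(f"W * ln|V| for W={w}        : {dimension_scale(w):.2f}")
```

With these sample values the $ND^2$ term is larger than the $\frac{N^2}{W^2}D$ term, so the overall saving is smaller than the factor $W^2$ that the attention term alone would suggest. The balance shifts toward the attention term as N grows relative to D.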
