Fugu-MT Paper Translation (Abstract): Micro-Diffusion Compression - Binary Tree Tweedie Denoising for Online Probability Estimation

Paper overview: Micro-Diffusion Compression - Binary Tree Tweedie Denoising for Online Probability Estimation

arxiv url: http://arxiv.org/abs/2603.08771v2
Date: Wed, 11 Mar 2026 16:12:52 GMT
Status: Translation complete
Last updated in system: 2026-03-12 14:12:44.247015
Title: Micro-Diffusion Compression - Binary Tree Tweedie Denoising for Online Probability Estimation
Authors: Roberto Tacconelli
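Abstract summary: We introduce a micro-diffusion denoising layer for improving probability estimates produced by adaptive statistical models. Midicoth combines five fully online components: an adaptive PPM model, a long-range match model, a trie-based word model, a high-order context model, and the micro-diffusion denoiser applied as the final stage.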
Reference score (independently computed attention measure): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present Midicoth, a lossless compression system that introduces a micro-diffusion denoising layer for improving probability estimates produced by adaptive statistical models. In compressors such as Prediction by Partial Matching (PPM), probability estimates are smoothed by a prior to handle sparse observations. When contexts have been seen only a few times, this prior dominates the prediction and produces distributions that are significantly flatter than the true source distribution, leading to compression inefficiency. Midicoth addresses this limitation by treating prior smoothing as a shrinkage process and applying a reverse denoising step that corrects predicted probabilities using empirical calibration statistics. To make this correction data-efficient, the method decomposes each byte prediction into a hierarchy of binary decisions along a bitwise tree. This converts a single 256-way calibration problem into a sequence of binary calibration tasks, enabling reliable estimation of correction terms from relatively small numbers of observations. The denoising process is applied in multiple successive steps, allowing each stage to refine residual prediction errors left by the previous one. The micro-diffusion layer operates as a lightweight post-blend calibration stage applied after all model predictions have been combined, allowing it to correct systematic biases in the final probability distribution. Midicoth combines five fully online components: an adaptive PPM model, a long-range match model, a trie-based word model, a high-order context model, and the micro-diffusion denoiser applied as the final stage.
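The bitwise-tree decomposition and per-node calibration described in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names, the `BinCalibrator` class, the 16-bin layout, the `prior_weight` damping, and the additive form of the correction are all assumptions made for illustration. The correction here is a simple binned residual average standing in for the paper's denoising step.

```python
from typing import List


def node_probs(byte_probs: List[float]) -> List[float]:
    """P(next bit = 1) at each internal node of the bitwise tree.

    Nodes are heap-indexed: the root is 1, the children of n are 2n and 2n+1,
    and leaves 256..511 correspond to byte values 0..255 (MSB first).
    Index 0 is unused and holds a placeholder.
    """
    mass = [0.0] * 512
    mass[256:] = byte_probs  # leaves hold the 256 byte probabilities
    for n in range(255, 0, -1):
        mass[n] = mass[2 * n] + mass[2 * n + 1]
    return [0.5 if mass[n] == 0 else mass[2 * n + 1] / mass[n] for n in range(256)]


def byte_probs_from_nodes(p_one: List[float]) -> List[float]:
    """Recompose a 256-way distribution from the 255 binary node probabilities."""
    out = []
    for byte in range(256):
        node, p = 1, 1.0
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            p *= p_one[node] if bit else 1.0 - p_one[node]
            node = 2 * node + bit
        out.append(p)
    return out


class BinCalibrator:
    """Hypothetical binary calibrator: tracks the mean residual (bit - p) per bin."""

    def __init__(self, bins: int = 16, prior_weight: float = 2.0):
        self.bins = bins
        self.prior_weight = prior_weight  # assumed damping for sparse bins
        self.resid = [0.0] * bins
        self.count = [0.0] * bins

    def _bin(self, p: float) -> int:
        return min(int(p * self.bins), self.bins - 1)

    def correct(self, p: float) -> float:
        """Shift p by the damped average residual observed in its bin."""
        b = self._bin(p)
        delta = self.resid[b] / (self.count[b] + self.prior_weight)
        return min(max(p + delta, 1e-6), 1.0 - 1e-6)

    def update(self, p: float, bit: int) -> None:
        """Record the outcome of one binary decision predicted with probability p."""
        b = self._bin(p)
        self.resid[b] += bit - p
        self.count[b] += 1.0
```

In this sketch, a blended byte distribution would be passed through `node_probs`, each node probability adjusted with `correct`, and the result recomposed with `byte_probs_from_nodes`. Because every node still splits its mass between two children, the recomposed distribution sums to 1. After each byte is coded, `update` would be called for the eight nodes on its path, so each observation feeds eight binary calibration tasks rather than one 256-way task.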

Related papers