Fugu-MT Paper Translation (Abstract): DiBA: Diagonal and Binary Matrix Approximation for Neural Network Weight Compression

Paper abstract: DiBA: Diagonal and Binary Matrix Approximation for Neural Network Weight Compression

arxiv url: http://arxiv.org/abs/2605.05994v1
Date: Thu, 07 May 2026 10:46:22 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:11.704486
Title: DiBA: Diagonal and Binary Matrix Approximation for Neural Network Weight Compression
Title (reference translation): DiBA: Diagonal and Binary Matrix Approximation for Neural Network Weight Compression
Authors: Nobutaka Ono
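Abstract summary: Many components of modern networks, including linear layers, $1\times1$ convolutions, attention projections, and embedding layers, have dense matrix weights. DiBA approximates $A\in\mathbb{R}^{m\times n}$ by $\widehat A=D_1B_1D_2B_2D_3$. DiBARD (DiBA with Retuning only Diagonal factors) replaces dense-matrix layers by DiBA factors, freezes the binary matrices, and retunes only the diagonal entries on downstream data.
Reference score (independently computed attention score): 8.81314696375596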
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we propose DiBA (Diagonal and Binary Matrix Approximation), a compact matrix factorization for neural network weight compression. Many components of modern networks, including linear layers, $1\times1$ convolutions, attention projections, and embedding layers, have dense matrix weights. DiBA approximates $A\in\mathbb{R}^{m\times n}$ by $\widehat A=D_1B_1D_2B_2D_3$, where $D_1,D_2,D_3$ are diagonal matrices and $B_1,B_2$ are $0/1$ binary matrices. The intermediate dimension $k$ controls the trade-off between theoretical storage and approximation accuracy. For matrix-vector products, DiBA decomposes dense multiplication into three element-wise scaling operations and two binary mixing operations, reducing the floating-point multiplication count from $mn$ to $m+k+n$. For optimization, we introduce DiBA-Greedy, an alternating solver that combines closed-form least-squares updates for the diagonal factors with exact one-bit improvement tests for the binary factors. We also introduce DiBARD (DiBA with Retuning only Diagonal factors), which replaces dense-matrix layers by DiBA factors, freezes the binary matrices, and retunes only the diagonal entries on downstream data. This preserves compact binary mixing without discrete search during adaptation. On 40 dense weight matrices extracted from public pretrained models, DiBA-Greedy yields consistent SNR improvements as the theoretical storage ratio increases. After DiBA replacement in two component-replacement studies, DiBARD improves DistilBERT/WikiText masked-token accuracy from 0.4447 to 0.5210 and Speech Commands test accuracy for an Audio Spectrogram Transformer from 0.7684 to 0.9781 without reoptimizing the binary factors.
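The multiplication count stated in the abstract can be illustrated with a minimal sketch. It assumes the shapes implied by $\widehat A=D_1B_1D_2B_2D_3$ with $A\in\mathbb{R}^{m\times n}$, namely $D_1$ of size $m$, $B_1$ of size $m\times k$, $D_2$ of size $k$, $B_2$ of size $k\times n$, and $D_3$ of size $n$. The function names (`binary_mix`, `diba_matvec`, `dense_from_factors`) and the toy factor values are hypothetical and are not taken from the paper; the code only shows why the binary factors need additions rather than multiplications.

```python
def binary_mix(B, v):
    """Apply a 0/1 matrix B (list of rows) to v using additions only."""
    return [sum(vj for bij, vj in zip(row, v) if bij) for row in B]


def diba_matvec(d1, B1, d2, B2, d3, x):
    """Compute (D1 B1 D2 B2 D3) x with m + k + n floating-point multiplications."""
    u = [d * xi for d, xi in zip(d3, x)]      # n multiplications (scale by D3)
    u = binary_mix(B2, u)                     # binary mixing, result has length k
    u = [d * ui for d, ui in zip(d2, u)]      # k multiplications (scale by D2)
    u = binary_mix(B1, u)                     # binary mixing, result has length m
    return [d * ui for d, ui in zip(d1, u)]   # m multiplications (scale by D1)


def dense_from_factors(d1, B1, d2, B2, d3):
    """Materialize the m x n matrix D1 B1 D2 B2 D3 (for checking only)."""
    m, k, n = len(d1), len(d2), len(d3)
    return [
        [d1[i] * sum(B1[i][j] * d2[j] * B2[j][l] for j in range(k)) * d3[l]
         for l in range(n)]
        for i in range(m)
    ]


# Toy factors with m = 3, k = 2, n = 4 (illustrative values, not from the paper).
d1 = [2.0, -1.0, 0.5]
B1 = [[1, 0], [1, 1], [0, 1]]
d2 = [3.0, 0.5]
B2 = [[1, 0, 1, 0], [0, 1, 1, 1]]
d3 = [1.0, 2.0, -1.0, 4.0]
x = [1.0, 1.0, 2.0, 0.5]

y_fast = diba_matvec(d1, B1, d2, B2, d3, x)
A_hat = dense_from_factors(d1, B1, d2, B2, d3)
y_dense = [sum(a * xi for a, xi in zip(row, x)) for row in A_hat]

factored_mults = len(d1) + len(d2) + len(d3)   # m + k + n
dense_mults = len(d1) * len(d3)                # m * n
```

Here `factored_mults` is 9 and `dense_mults` is 12; the gap widens quickly for realistic layer sizes, since $m+k+n$ grows linearly while $mn$ grows quadratically.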
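Abstract (reference translation): This paper proposes DiBA (Diagonal and Binary Matrix Approximation), a compact matrix factorization for compressing neural network weights. Many components of modern networks, including linear layers, $1\times1$ convolutions, attention projections, and embedding layers, have dense matrix weights. DiBA approximates $A\in\mathbb{R}^{m\times n}$ by $\widehat A=D_1B_1D_2B_2D_3$, where $D_1,D_2,D_3$ are diagonal matrices and $B_1,B_2$ are $0/1$ binary matrices. The intermediate dimension $k$ controls the trade-off between theoretical storage and approximation accuracy. For matrix-vector products, DiBA decomposes dense multiplication into three element-wise scaling operations and two binary mixing operations, reducing the number of floating-point multiplications from $mn$ to $m+k+n$. For optimization, the paper introduces DiBA-Greedy, an alternating solver that combines closed-form least-squares updates for the diagonal factors with exact one-bit improvement tests for the binary factors. It also introduces DiBARD (DiBA with Retuning only Diagonal factors), which replaces dense-matrix layers with DiBA factors, freezes the binary matrices, and retunes only the diagonal entries on downstream data. This preserves compact binary mixing without any discrete search during adaptation. On 40 dense weight matrices extracted from public pretrained models, DiBA-Greedy yields consistent SNR improvements as the theoretical storage ratio increases. After DiBA replacement in two component-replacement studies, DiBARD improves DistilBERT/WikiText masked-token accuracy from 0.4447 to 0.5210 and Speech Commands test accuracy for an Audio Spectrogram Transformer from 0.7684 to 0.9781, without reoptimizing the binary factors.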
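The abstract mentions closed-form least-squares updates for the diagonal factors but does not spell them out. As a rough illustration only, the sketch below covers the simplest case, the outer factor $D_1$: with $M=B_1D_2B_2D_3$ held fixed, minimizing $\|A-D_1M\|_F^2$ separates by row, so each diagonal entry is the ratio of two inner products. The function name `fit_row_scales` and the zero fallback for all-zero rows are assumptions made here; the actual DiBA-Greedy updates may differ.

```python
def fit_row_scales(A, M):
    """Return d minimizing sum_i ||A[i] - d[i] * M[i]||^2, one row at a time.

    A and M are lists of rows with the same shape. For each row the optimum
    is d[i] = <A[i], M[i]> / <M[i], M[i]>. Rows of M that are entirely zero
    get d[i] = 0.0 (an arbitrary choice, since any value gives the same error).
    """
    d = []
    for a_row, m_row in zip(A, M):
        denom = sum(v * v for v in m_row)
        num = sum(a * v for a, v in zip(a_row, m_row))
        d.append(num / denom if denom > 0 else 0.0)
    return d


# Toy example (illustrative values): the first row of A is exactly 2x the
# first row of M, and the second row of M is all zeros.
A = [[2.0, 4.0], [1.0, 0.0]]
M = [[1.0, 2.0], [0.0, 0.0]]
d1 = fit_row_scales(A, M)
```

The same row-wise argument applies to $D_3$ by transposing; the middle factor $D_2$ couples rows and columns and would need a small linear solve instead, which is not shown here.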


Related papers