Gibbs randomness-compression proposition: An efficient deep learning
- URL: http://arxiv.org/abs/2505.23869v2
- Date: Mon, 30 Jun 2025 12:42:09 GMT
- Title: Gibbs randomness-compression proposition: An efficient deep learning
- Authors: M. Süzen
- Abstract summary: We introduce the Gibbs randomness-compression proposition, signifying the randomness-compression relationship via Gibbs entropy. The DTC framework provides a promising approach for massively energy- and resource-efficient deep learning training.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A proposition connecting randomness and compression is put forward via the Gibbs entropy over a set of measurement vectors associated with a compression process. The proposition states that a lossy compression process is equivalent to {\it directed randomness} that preserves information content. The proposition originated from behaviour observed in the newly proposed {\it Dual Tomographic Compression} (DTC) compress-train framework, which is akin to tomographic reconstruction of layer weight matrices by building compressed-sensed projections, so-called {\it weight rays}. This tomographic approach is applied to the previous and next layers in a dual fashion, triggering neuronal-level pruning. The novel compress-train scheme is applied iteratively and acts as a smart neural architecture search. Experiments demonstrated the utility of this dual tomography, producing state-of-the-art performance with efficient compression during training, accelerating and supporting the lottery ticket hypothesis. However, random compress-train iterations achieving similar performance revealed the connection between randomness and compression from a statistical physics perspective; we therefore formulated the so-called {\it Gibbs randomness-compression proposition}, signifying the randomness-compression relationship via Gibbs entropy. Practically, the DTC framework provides a promising approach for massively energy- and resource-efficient deep learning training.
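As a rough illustration of the proposition's ingredients, the sketch below (Python/NumPy; not the authors' DTC code) forms compressed-sensing style projections of a layer weight matrix, standing in for the paper's weight rays, and evaluates the Gibbs entropy of the resulting measurement vectors. The Gaussian sensing matrix, histogram binning, and all names are illustrative assumptions.

```python
# Hedged sketch: Gibbs entropy of measurement vectors obtained from random
# projections of a layer weight matrix. Not the DTC implementation; the sensing
# scheme and binning below are assumptions made for illustration only.
import numpy as np

def weight_rays(W: np.ndarray, n_rays: int, rng: np.random.Generator) -> np.ndarray:
    """Project the weight matrix onto n_rays random sensing vectors (the "rays")."""
    Phi = rng.normal(size=(n_rays, W.shape[0])) / np.sqrt(W.shape[0])  # sensing matrix
    return Phi @ W                                                     # measurement vectors

def gibbs_entropy(measurements: np.ndarray, n_bins: int = 64) -> float:
    """Gibbs entropy S = -sum_i p_i ln p_i of the empirical measurement distribution."""
    counts, _ = np.histogram(measurements.ravel(), bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128))              # a layer weight matrix
for n_rays in (16, 64, 256):                 # stronger compression = fewer rays
    S = gibbs_entropy(weight_rays(W, n_rays, rng))
    print(f"rays={n_rays:4d}  Gibbs entropy={S:.3f}")
```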
Related papers
- Unified Scaling Laws for Compressed Representations [69.72517034565467]
We investigate whether a unified scaling framework can accurately predict model performance when training occurs over various compressed representations. Our main finding is demonstrating, both theoretically and empirically, that there exists a simple "capacity" metric. We extend our formulation to directly compare the accuracy potential of different compressed formats, and to derive better algorithms for training over sparse-quantized formats.
arXiv Detail & Related papers (2025-06-02T16:52:51Z)
- Problem-dependent convergence bounds for randomized linear gradient compression [4.656302602746228]
We study how gradient compression affects convergence. We find that the impact of compression on convergence can be quantified in terms of a matrix associated with the objective function.
arXiv Detail & Related papers (2024-11-19T22:26:42Z)
- Hyper-Compression: Model Compression via Hyperfunction [20.47369296713829]
We propose the so-called hyper-compression, which turns model compression into the problem of parameter representation via a hyperfunction. This suggests a novel mechanism for model compression, substantially different from the existing pruning, quantization, distillation, and decomposition approaches. We show that hyper-compression enjoys the following PNAS merits: 1) Preferable compression ratio; 2) No post-hoc retraining; 3) Affordable inference time; and 4) Short compression time.
arXiv Detail & Related papers (2024-09-01T02:57:41Z)
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
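For orientation, here is a minimal sketch of generic one-bit gradient compression with a single scale factor per vector; it is a common building block of such schemes, not the FO-SGD algorithm itself (which additionally flattens the gradient before quantization), and the names are illustrative.

```python
# Generic one-bit gradient compression sketch (not FO-SGD itself): transmit only
# the sign of each coordinate plus one scalar scale per gradient vector.
import numpy as np

def one_bit_compress(g: np.ndarray) -> tuple[np.ndarray, float]:
    """Return sign bits and a scale so that scale * sign(g) matches |g| on average."""
    scale = float(np.abs(g).mean())            # one float accompanies the bit vector
    return np.sign(g).astype(np.int8), scale

def one_bit_decompress(signs: np.ndarray, scale: float) -> np.ndarray:
    return scale * signs.astype(np.float32)

rng = np.random.default_rng(1)
g = rng.normal(size=1000).astype(np.float32)
g_hat = one_bit_decompress(*one_bit_compress(g))
cos = float(g @ g_hat / (np.linalg.norm(g) * np.linalg.norm(g_hat)))
print(f"cosine similarity after 1-bit round trip: {cos:.3f}")
```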
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in a model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
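The TopK operator referenced in this entry can be written in a few lines; the sketch below is a generic magnitude-based TopK sparsifier (applied to a gradient here, but the same operator applies to activations) with illustrative names, not the paper's implementation.

```python
# Hedged TopK sparsification sketch: keep only the k largest-magnitude entries.
import numpy as np

def topk_compress(x: np.ndarray, ratio: float) -> np.ndarray:
    """Zero out all but (approximately) the top `ratio` fraction of entries by magnitude."""
    k = max(1, int(ratio * x.size))
    flat = x.ravel().copy()
    threshold = np.partition(np.abs(flat), -k)[-k]   # k-th largest magnitude
    flat[np.abs(flat) < threshold] = 0.0
    return flat.reshape(x.shape)

rng = np.random.default_rng(2)
grad = rng.normal(size=(64, 64))
sparse_grad = topk_compress(grad, ratio=0.01)        # keep roughly 1% of entries
print("nonzeros kept:", int(np.count_nonzero(sparse_grad)))
```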
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD [31.61477313262589]
We consider a one-hidden-layer neural network trained with stochastic gradient descent (SGD).
We show that if we inject additive heavy-tailed noise into the iterates at each iteration, then for any compression rate there exists a level of overparametrization such that the output of the algorithm will be compressible with high probability.
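To make the compressibility claim above concrete, the hedged sketch below compares iterates of a simple contractive recursion driven by Gaussian versus heavy-tailed (Student-t) noise: the heavy-tailed iterates concentrate most of their magnitude mass on a few coordinates and are therefore far more prunable. The recursion, step size, and noise law are assumptions for illustration, not the paper's one-hidden-layer construction.

```python
# Illustration (not the paper's setting): heavy-tailed additive noise makes the
# iterates of a contractive SGD-like recursion much more compressible.
import numpy as np

def run_iterates(noise_sampler, dim=10_000, steps=500, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    for _ in range(steps):
        x = (1.0 - eta) * x + eta * noise_sampler(rng, dim)   # contraction + noise injection
    return x

def topk_mass(x, frac=0.05):
    """Fraction of total |x| mass captured by the largest `frac` of coordinates."""
    k = int(frac * x.size)
    a = np.sort(np.abs(x))[::-1]
    return a[:k].sum() / a.sum()

gauss = run_iterates(lambda rng, d: rng.normal(size=d))
heavy = run_iterates(lambda rng, d: rng.standard_t(df=1.5, size=d))
print(f"top-5% magnitude mass, Gaussian noise:     {topk_mass(gauss):.2f}")
print(f"top-5% magnitude mass, heavy-tailed noise: {topk_mass(heavy):.2f}")
```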
arXiv Detail & Related papers (2023-06-13T20:37:02Z)
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find that inter-correlations and intra-correlations exist when observing latent variables from a vectorized perspective.
Our model has better rate-distortion performance and an impressive $3.18\times$ compression speed-up.
arXiv Detail & Related papers (2022-03-21T11:44:17Z)
- Post-Training Quantization for Cross-Platform Learned Image Compression [15.67527732099067]
Learned image compression has been shown to outperform conventional image coding techniques.
One of the most critical issues that needs to be considered is non-deterministic calculation across platforms.
We propose to solve this problem by introducing well-developed post-training quantization.
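For background, the simplest form of post-training quantization (uniform, per-tensor, int8) looks roughly as follows; this sketch does not reproduce how the paper applies quantization to a learned codec's entropy model, and the bit width and scaling rule are illustrative assumptions.

```python
# Hedged sketch of uniform per-tensor post-training quantization to int8.
import numpy as np

def quantize(w: np.ndarray, n_bits: int = 8) -> tuple[np.ndarray, float]:
    qmax = 2 ** (n_bits - 1) - 1
    scale = float(np.abs(w).max()) / qmax                     # per-tensor scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(5)
w = rng.normal(scale=0.1, size=(256, 256)).astype(np.float32)
q, s = quantize(w)
print("max abs quantization error:", float(np.abs(w - dequantize(q, s)).max()))
```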
arXiv Detail & Related papers (2022-02-15T15:41:12Z)
- On Effects of Compression with Hyperdimensional Computing in Distributed Randomized Neural Networks [6.25118865553438]
We propose a model for distributed classification based on randomized neural networks and hyperdimensional computing.
In this work, we propose a more flexible approach to compression and compare it to conventional compression algorithms, dimensionality reduction, and quantization techniques.
arXiv Detail & Related papers (2021-06-17T22:02:40Z)
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys threshold estimation quality similar to that of deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
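The threshold-estimation idea can be illustrated as follows: fit a sparsity-inducing distribution to the gradient magnitudes and read the threshold for a target density off the fitted survival function, rather than sorting the full gradient. SIDCo itself fits several candidate distributions in stages; the single exponential fit below is an assumption made for this sketch.

```python
# Hedged sketch: estimate a sparsification threshold from a distribution fit
# to |g| instead of an exact TopK sort (single exponential fit for illustration).
import numpy as np

def estimated_threshold(g: np.ndarray, density: float) -> float:
    """Threshold t with P(|g| > t) ~= density under an Exp(lam) fit to |g|."""
    lam = 1.0 / np.abs(g).mean()               # MLE rate of the exponential fit
    return -np.log(density) / lam              # solve exp(-lam * t) = density

rng = np.random.default_rng(4)
g = rng.laplace(scale=0.01, size=1_000_000)    # toy gradient: |g| is exponential
t = estimated_threshold(g, density=0.001)      # aim to keep ~0.1% of the entries
achieved = float(np.mean(np.abs(g) > t))
print(f"target density 0.100%, achieved {100 * achieved:.3f}%")
```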
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
- Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.