Related papers: LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time

LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time

URL: http://arxiv.org/abs/2110.04252v1
Date: Fri, 8 Oct 2021 17:03:34 GMT
Title: LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time
Authors: Elvis Nunez, Maxwell Horton, Anish Prabhu, Anurag Ranjan, Ali Farhadi, Mohammad Rastegari
Abstract summary: We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models. We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity. Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
Score: 57.52251547365967
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: When deploying deep learning models to a device, it is traditionally assumed that available computational resources (compute, memory, and power) remain static. However, real-world computing systems do not always provide stable resource guarantees. Computational resources need to be conserved when load from other processes is high or battery power is low. Inspired by recent works on neural network subspaces, we propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models that range from highly efficient to highly accurate. Our models require no retraining, thus our subspace of models can be deployed entirely on-device to allow adaptive network compression at inference time. We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity. We achieve accuracies on-par with standard models when testing our uncompressed models, and maintain high accuracy for sparsity rates above 90% when testing our compressed models. We also demonstrate that our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.

Related papers

Mixed Precision Training of Neural ODEs [1.3382837742547355]
This paper presents a mixed precision training framework for neural ODEs.<n>It combines explicit ODE solvers with a custom backpropagation scheme.<n>It achieves approximately 50% memory reduction and up to 2x speedup while maintaining accuracy comparable to single-precision training.
arXiv Detail & Related papers (2025-10-27T16:32:56Z)
Deep Hierarchical Learning with Nested Subspace Networks [53.71337604556311]
We propose Nested Subspace Networks (NSNs) for large neural networks.<n>NSNs enable a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets.<n>We show that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier.
arXiv Detail & Related papers (2025-09-22T15:13:14Z)
Towards Scalable Lottery Ticket Networks using Genetic Algorithms [2.9144530940712707]
This work explores the usage of genetic algorithms for identifying strong lottery ticketworks.<n>We find that for instances of binary and multi-class classification tasks, our approach achieves better accuracies and sparsity levels than the current state-of-the-art.
arXiv Detail & Related papers (2025-08-12T12:03:21Z)
Private Training & Data Generation by Clustering Embeddings [74.00687214400021]
Differential privacy (DP) provides a robust framework for protecting individual data.<n>We introduce a novel principled method for DP synthetic image embedding generation.<n> Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy.
arXiv Detail & Related papers (2025-06-20T00:17:14Z)
Toward Efficient Convolutional Neural Networks With Structured Ternary Patterns [1.1965844936801797]
Convolutional neural networks (ConvNets) exert severe demands on local device resources. This brief presents work toward utilizing static convolutional filters to design efficient ConvNet architectures.
arXiv Detail & Related papers (2024-07-20T10:18:42Z)
Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation. We use the score-based diffusion model to generate labeled data. Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in the supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z)
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks. We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST) IST is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
Robust low-rank training via approximate orthonormal constraints [2.519906683279153]
We introduce a robust low-rank training algorithm that maintains the network's weights on the low-rank matrix manifold. The resulting model reduces both training and inference costs while ensuring well-conditioning and thus better adversarial robustness, without compromising model accuracy.
arXiv Detail & Related papers (2023-06-02T12:22:35Z)
Deep learning model compression using network sensitivity and gradients [3.52359746858894]
We present model compression algorithms for both non-retraining and retraining conditions. In the first case, we propose the Bin & Quant algorithm for compression of the deep learning models using the sensitivity of the network parameters. In the second case, we propose our novel gradient-weighted k-means clustering algorithm (GWK)
arXiv Detail & Related papers (2022-10-11T03:02:40Z)
AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks [78.62086125399831]
We present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of deep neural networks (DNNs) AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets. An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process.
arXiv Detail & Related papers (2021-06-23T13:23:00Z)
Training Deep Neural Networks with Constrained Learning Parameters [4.917317902787792]
A significant portion of deep learning tasks would run on edge computing systems. We propose the Combinatorial Neural Network Training Algorithm (CoNNTrA) CoNNTrA trains deep learning models with ternary learning parameters on the MNIST, Iris and ImageNet data sets. Our results indicate that CoNNTrA models use 32x less memory and have errors at par with the Backpropagation models.
arXiv Detail & Related papers (2020-09-01T16:20:11Z)
Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources. Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize. We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks. We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.