Block-wise Dynamic Sparseness
- URL: http://arxiv.org/abs/2001.04686v1
- Date: Tue, 14 Jan 2020 10:03:21 GMT
- Title: Block-wise Dynamic Sparseness
- Authors: Amir Hadifar, Johannes Deleu, Chris Develder, and Thomas Demeester
- Abstract summary: We present a new method for dynamic sparseness, whereby part of the computations is omitted dynamically, based on the input.
Our method achieves language modeling perplexities similar to those of the dense baseline, at half the computational cost at inference time.
- Score: 20.801638768447948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks have achieved state-of-the-art performance across a wide
variety of machine learning tasks, often with large and computation-heavy
models. Inducing sparseness as a way to reduce the memory and computation
footprint of these models has seen significant research attention in recent
years. In this paper, we present a new method for dynamic sparseness,
whereby part of the computations is omitted dynamically, based on the input.
For efficiency, we combine the idea of dynamic sparseness with block-wise
matrix-vector multiplications. In contrast to static sparseness, which
permanently zeroes out selected positions in weight matrices, our method
preserves the full network capabilities by potentially accessing any trained
weights. Yet, matrix-vector multiplications are accelerated by omitting a
pre-defined fraction of weight blocks from the matrix, based on the input.
Experimental results on the task of language modeling, using recurrent and
quasi-recurrent models, show that the proposed method can outperform a
magnitude-based static sparseness baseline. In addition, our method achieves
language modeling perplexities similar to those of the dense baseline, at half
the computational cost at inference time.
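To make the idea concrete, here is a minimal sketch (not the authors' implementation) of a block-wise dynamically sparse matrix-vector product: the weight matrix is partitioned into square blocks, a gating function scores each block for the current input, and only the highest-scoring fraction of blocks is multiplied. The gate used below is a hypothetical random linear map; the block size, keep fraction, and all function names are illustrative assumptions.

```python
import numpy as np

def block_sparse_matvec(W, x, gate, block_size=64, keep_frac=0.5):
    """Illustrative block-wise dynamically sparse matrix-vector product.

    W          : (out_dim, in_dim) dense weight matrix, viewed as a grid of blocks.
    x          : (in_dim,) input vector.
    gate       : callable mapping x to one score per weight block (a stand-in
                 for an input-dependent gating mechanism, not the paper's).
    block_size : side length of the square weight blocks.
    keep_frac  : fraction of blocks actually multiplied for this input; the
                 remaining blocks are skipped, not removed from W.
    """
    out_dim, in_dim = W.shape
    n_row, n_col = out_dim // block_size, in_dim // block_size

    scores = gate(x).reshape(n_row, n_col)        # one score per block
    k = max(1, int(round(keep_frac * n_row * n_col)))
    keep = np.argsort(scores, axis=None)[-k:]     # top-k blocks for this input

    y = np.zeros(out_dim)
    for flat in keep:
        i, j = divmod(int(flat), n_col)
        rows = slice(i * block_size, (i + 1) * block_size)
        cols = slice(j * block_size, (j + 1) * block_size)
        y[rows] += W[rows, cols] @ x[cols]        # skipped blocks cost nothing
    return y

# Usage with a random linear gate (purely illustrative).
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))
G = rng.standard_normal((16, 256))                # scores for a 4x4 grid of 64x64 blocks
x = rng.standard_normal(256)
y = block_sparse_matvec(W, x, gate=lambda v: G @ v, block_size=64, keep_frac=0.5)
```

Because W itself is never modified, a block skipped for one input can still contribute for another, which is the distinction the abstract draws against static sparseness.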
Related papers
- An Efficient Training Algorithm for Models with Block-wise Sparsity [6.882042556551613]
We propose an efficient training algorithm to decrease both computation and memory costs during training and inference.
Our algorithms can decrease the computation and memory costs significantly without a performance drop compared to baselines.
arXiv Detail & Related papers (2025-03-27T19:14:27Z)
- Data-free Weight Compress and Denoise for Large Language Models [101.53420111286952]
We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices.
We prune 80% of the model parameters while retaining 93.43% of the original performance, without any calibration data.
arXiv Detail & Related papers (2024-02-26T05:51:47Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the underlying covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Randomized Polar Codes for Anytime Distributed Machine Learning [66.46612460837147]
We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations.
We propose a sequential decoding algorithm designed to handle real-valued data while maintaining low computational complexity for recovery.
We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization.
arXiv Detail & Related papers (2023-09-01T18:02:04Z)
- Distributive Pre-Training of Generative Modeling Using Matrix-Product States [0.0]
We consider an alternative training scheme utilizing basic tensor network operations, e.g., summation and compression.
The training algorithm is based on compressing the superposition state constructed from all the training data in product state representation.
We benchmark the algorithm on the MNIST dataset and show reasonable results for generating new images and classification tasks.
arXiv Detail & Related papers (2023-06-26T15:46:08Z)
- Look-ups are not (yet) all you need for deep learning inference [0.0]
Fast approximations to matrix multiplication have the potential to dramatically reduce the cost of neural network inference.
Recent work on approximate matrix multiplication proposed to replace costly multiplications with table lookups by fitting a fast hash function from training data.
In this work, we propose improvements to this previous work, targeted to the deep learning inference setting.
arXiv Detail & Related papers (2022-07-12T19:46:23Z)
- Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
- Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved performance comparable to pure data-driven networks while using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity [12.643043455369297]
We propose an algorithm-software co-designed pruning method that achieves latency speedups on existing dense architectures.
We implement and evaluate the sparsity pattern on GPU tensor cores, achieving a 1.95x speedup over the dense model; a generic sketch of magnitude-based block pruning is given after this list.
arXiv Detail & Related papers (2020-08-29T16:27:41Z)
- Relative gradient optimization of the Jacobian term in unsupervised deep learning [9.385902422987677]
Learning expressive probabilistic models correctly describing the data is a ubiquitous problem in machine learning.
Deep density models have been widely used for this task, but their maximum likelihood based training requires estimating the log-determinant of the Jacobian.
We propose a new approach for exact training of such neural networks.
arXiv Detail & Related papers (2020-06-26T16:41:08Z)
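Several entries above (block-wise sparsity, tile-wise sparsity) and the magnitude-based baseline in the abstract rely on the complementary, static notion of sparseness: blocks are pruned once, based on weight magnitude, and stay zero for every input. The sketch below is a generic illustration of that idea under an assumed block size and sparsity level; it is not the specific method of any listed paper.

```python
import numpy as np

def block_magnitude_prune(W, block_size=64, sparsity=0.5):
    """Generic static pruning: zero out the weight blocks with the smallest L2 norm.

    Unlike dynamic sparseness, the resulting mask is computed once from the
    trained weights and applied to every input.
    """
    out_dim, in_dim = W.shape
    n_row, n_col = out_dim // block_size, in_dim // block_size

    # Per-block Frobenius norms, obtained by viewing W as a grid of blocks.
    blocks = W.reshape(n_row, block_size, n_col, block_size)
    norms = np.linalg.norm(blocks, axis=(1, 3))          # shape (n_row, n_col)

    # Keep only the (1 - sparsity) fraction of blocks with the largest norm.
    n_keep = max(1, int(round((1.0 - sparsity) * n_row * n_col)))
    keep = np.argsort(norms, axis=None)[-n_keep:]
    block_mask = np.zeros(n_row * n_col, dtype=bool)
    block_mask[keep] = True

    # Expand the block-level mask to element resolution and apply it.
    element_mask = np.repeat(np.repeat(block_mask.reshape(n_row, n_col),
                                       block_size, axis=0),
                             block_size, axis=1)
    return W * element_mask

rng = np.random.default_rng(0)
W_pruned = block_magnitude_prune(rng.standard_normal((256, 256)))
```

Dynamic sparseness, as sketched after the abstract, instead recomputes the block selection per input while leaving the weight matrix intact.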
This list is automatically generated from the titles and abstracts of the papers on this site.