Block-wise Dynamic Sparseness
- URL: http://arxiv.org/abs/2001.04686v1
- Date: Tue, 14 Jan 2020 10:03:21 GMT
- Title: Block-wise Dynamic Sparseness
- Authors: Amir Hadifar, Johannes Deleu, Chris Develder, and Thomas Demeester
- Abstract summary: We present a new method for dynamic sparseness, whereby part of the computations is omitted dynamically, based on the input.
Our method achieves language modeling perplexities similar to those of the dense baseline, at half the computational cost at inference time.
- Score: 20.801638768447948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks have achieved state-of-the-art performance across a wide
variety of machine learning tasks, often with large and computation-heavy
models. Inducing sparseness as a way to reduce the memory and computation
footprint of these models has seen significant research attention in recent
years. In this paper, we present a new method for dynamic sparseness,
whereby part of the computations is omitted dynamically, based on the input.
For efficiency, we combine the idea of dynamic sparseness with block-wise
matrix-vector multiplications. In contrast to static sparseness, which
permanently zeroes out selected positions in weight matrices, our method
preserves the full network capabilities by potentially accessing any trained
weights. Yet, matrix-vector multiplications are accelerated by omitting a
pre-defined fraction of weight blocks from the matrix, based on the input.
Experimental results on the task of language modeling, using recurrent and
quasi-recurrent models, show that the proposed method can outperform a
magnitude-based static sparseness baseline. In addition, our method achieves
language modeling perplexities similar to those of the dense baseline, at half
the computational cost at inference time.
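To make the idea concrete, here is a minimal sketch (not the authors' implementation) of a block-wise dynamically sparse matrix-vector product: the weight matrix is partitioned into square blocks, a gating function scores each block for the current input, and only the highest-scoring fraction of blocks is multiplied. The gate used below is a hypothetical random linear map; the block size, keep fraction, and all function names are illustrative assumptions.

```python
import numpy as np

def block_sparse_matvec(W, x, gate, block_size=64, keep_frac=0.5):
    """Illustrative block-wise dynamically sparse matrix-vector product.

    W          : (out_dim, in_dim) dense weight matrix, viewed as a grid of blocks.
    x          : (in_dim,) input vector.
    gate       : callable mapping x to one score per weight block (a stand-in
                 for an input-dependent gating mechanism, not the paper's).
    block_size : side length of the square weight blocks.
    keep_frac  : fraction of blocks actually multiplied for this input; the
                 remaining blocks are skipped, not removed from W.
    """
    out_dim, in_dim = W.shape
    n_row, n_col = out_dim // block_size, in_dim // block_size

    scores = gate(x).reshape(n_row, n_col)        # one score per block
    k = max(1, int(round(keep_frac * n_row * n_col)))
    keep = np.argsort(scores, axis=None)[-k:]     # top-k blocks for this input

    y = np.zeros(out_dim)
    for flat in keep:
        i, j = divmod(int(flat), n_col)
        rows = slice(i * block_size, (i + 1) * block_size)
        cols = slice(j * block_size, (j + 1) * block_size)
        y[rows] += W[rows, cols] @ x[cols]        # skipped blocks cost nothing
    return y

# Usage with a random linear gate (purely illustrative).
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))
G = rng.standard_normal((16, 256))                # scores for a 4x4 grid of 64x64 blocks
x = rng.standard_normal(256)
y = block_sparse_matvec(W, x, gate=lambda v: G @ v, block_size=64, keep_frac=0.5)
```

Because W itself is never modified, a block skipped for one input can still contribute for another, which is the distinction the abstract draws against static sparseness.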
Related papers
- An Efficient Training Algorithm for Models with Block-wise Sparsity [6.882042556551613]
We propose an efficient training algorithm to decrease both computation and memory costs during training and inference.
Our algorithms can decrease the computation and memory costs significantly without a performance drop compared to baselines.
arXiv Detail & Related papers (2025-03-27T19:14:27Z)
- Data-free Weight Compress and Denoise for Large Language Models [101.53420111286952]
We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices.
We prune 80% of the model parameters while retaining 93.43% of the original performance, without any calibration data.
arXiv Detail & Related papers (2024-02-26T05:51:47Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the underlying covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Randomized Polar Codes for Anytime Distributed Machine Learning [66.46612460837147]
We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations.
We propose a sequential decoding algorithm designed to handle real-valued data while maintaining low computational complexity for recovery.
We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization.
arXiv Detail & Related papers (2023-09-01T18:02:04Z)
- Distributive Pre-Training of Generative Modeling Using Matrix-Product States [0.0]
We consider an alternative training scheme utilizing basic tensor network operations, e.g., summation and compression.
The training algorithm is based on compressing the superposition state constructed from all the training data in product state representation.
We benchmark the algorithm on the MNIST dataset and show reasonable results for generating new images and classification tasks.
arXiv Detail & Related papers (2023-06-26T15:46:08Z)
- Look-ups are not (yet) all you need for deep learning inference [0.0]
Fast approximations to matrix multiplication have the potential to dramatically reduce the cost of neural network inference.
Recent work on approximate matrix multiplication proposed to replace costly multiplications with table lookups by fitting a fast hash function from training data.
In this work, we propose improvements to this previous work, targeted to the deep learning inference setting.
arXiv Detail & Related papers (2022-07-12T19:46:23Z)
- Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
- Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved performance comparable to pure data-driven networks while using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity [12.643043455369297]
We propose an algorithm-software co-designed pruning method that achieves latency speedups on existing dense architectures.
We implement and evaluate the sparsity pattern on GPU tensor cores, achieving a 1.95x speedup over the dense model; a generic sketch of magnitude-based block pruning is given after this list.
arXiv Detail & Related papers (2020-08-29T16:27:41Z)
- Relative gradient optimization of the Jacobian term in unsupervised deep learning [9.385902422987677]
Learning expressive probabilistic models correctly describing the data is a ubiquitous problem in machine learning.
Deep density models have been widely used for this task, but their maximum likelihood based training requires estimating the log-determinant of the Jacobian.
We propose a new approach for exact training of such neural networks.
arXiv Detail & Related papers (2020-06-26T16:41:08Z)
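Several entries above (block-wise sparsity, tile-wise sparsity) and the magnitude-based baseline in the abstract rely on the complementary, static notion of sparseness: blocks are pruned once, based on weight magnitude, and stay zero for every input. The sketch below is a generic illustration of that idea under an assumed block size and sparsity level; it is not the specific method of any listed paper.

```python
import numpy as np

def block_magnitude_prune(W, block_size=64, sparsity=0.5):
    """Generic static pruning: zero out the weight blocks with the smallest L2 norm.

    Unlike dynamic sparseness, the resulting mask is computed once from the
    trained weights and applied to every input.
    """
    out_dim, in_dim = W.shape
    n_row, n_col = out_dim // block_size, in_dim // block_size

    # Per-block Frobenius norms, obtained by viewing W as a grid of blocks.
    blocks = W.reshape(n_row, block_size, n_col, block_size)
    norms = np.linalg.norm(blocks, axis=(1, 3))          # shape (n_row, n_col)

    # Keep only the (1 - sparsity) fraction of blocks with the largest norm.
    n_keep = max(1, int(round((1.0 - sparsity) * n_row * n_col)))
    keep = np.argsort(norms, axis=None)[-n_keep:]
    block_mask = np.zeros(n_row * n_col, dtype=bool)
    block_mask[keep] = True

    # Expand the block-level mask to element resolution and apply it.
    element_mask = np.repeat(np.repeat(block_mask.reshape(n_row, n_col),
                                       block_size, axis=0),
                             block_size, axis=1)
    return W * element_mask

rng = np.random.default_rng(0)
W_pruned = block_magnitude_prune(rng.standard_normal((256, 256)))
```

Dynamic sparseness, as sketched after the abstract, instead recomputes the block selection per input while leaving the weight matrix intact.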
This list is automatically generated from the titles and abstracts of the papers on this site.