What Operations can be Performed Directly on Compressed Arrays, and with What Error?
- URL: http://arxiv.org/abs/2406.11209v1
- Date: Mon, 17 Jun 2024 05:01:09 GMT
- Title: What Operations can be Performed Directly on Compressed Arrays, and with What Error?
- Authors: Tripti Agarwal, Harvey Dam, Dorra Ben Khalifa, Matthieu Martel, P. Sadayappan, Ganesh Gopalakrishnan
- Abstract summary: We develop a lossy compressor that allows a dozen fairly fundamental operations directly on compressed data.
We evaluate it on three non-trivial applications, choosing different number systems for internal representation.
- Score: 1.3307486544794784
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In response to the rapidly escalating costs of computing with large matrices and tensors caused by data movement, several lossy compression methods have been developed to significantly reduce data volumes. Unfortunately, all these methods require the data to be decompressed before further computations are done. In this work, we develop a lossy compressor that allows a dozen fairly fundamental operations directly on compressed data while offering good compression ratios and modest errors. We implement a new compressor PyBlaz based on the familiar GPU-powered PyTorch framework, and evaluate it on three non-trivial applications, choosing different number systems for internal representation. Our results demonstrate that the compressed-domain operations achieve good scalability with problem sizes while incurring errors well within acceptable limits. To our best knowledge, this is the first such lossy compressor that supports compressed-domain operations while achieving acceptable performance as well as error.
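The abstract does not spell out PyBlaz's internal representation, but the core idea behind compressed-domain operations can be illustrated with a block-wise truncated orthogonal transform: because the transform is linear, operations such as addition and scalar multiplication commute with compression and can act directly on the retained coefficients. The sketch below is a minimal illustration under that assumption; the block size, coefficient count, and names (`compress`, `decompress`, `add_compressed`) are hypothetical and are not the PyBlaz API.

```python
import torch

BLOCK = 8  # elements per block (illustrative choice)
KEEP = 4   # coefficients retained per block; dropping the rest is the lossy step


def dct_basis(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis for one block; rows are basis vectors."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)
    basis = torch.cos(torch.pi * (i + 0.5) * k / n) * (2.0 / n) ** 0.5
    basis[0] *= 0.5 ** 0.5
    return basis


B = dct_basis(BLOCK)


def compress(x: torch.Tensor) -> torch.Tensor:
    """Split x into blocks, transform each block, keep the leading KEEP coefficients."""
    coeffs = x.reshape(-1, BLOCK) @ B.T
    return coeffs[:, :KEEP].contiguous()


def decompress(c: torch.Tensor) -> torch.Tensor:
    """Zero-fill the dropped coefficients and invert the (orthonormal) transform."""
    full = torch.zeros(c.shape[0], BLOCK)
    full[:, :KEEP] = c
    return (full @ B).reshape(-1)


def add_compressed(ca: torch.Tensor, cb: torch.Tensor) -> torch.Tensor:
    """Addition commutes with a linear transform, so it acts directly on coefficients."""
    return ca + cb


x, y = torch.randn(64), torch.randn(64)
s = decompress(add_compressed(compress(x), compress(y)))
print(torch.norm(s - (x + y)) / torch.norm(x + y))  # relative error due to truncation only
```

Under such a linear scheme, the error of a compressed-domain sum comes only from the coefficients dropped at compression time, which is why linear operations are natural candidates for direct execution on compressed data.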
Related papers
- Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning [63.43972993473501]
Token compression expedites the training and inference of Vision Transformers (ViTs).
However, when applied to downstream tasks, compression degrees are mismatched between training and inference stages.
We propose a model arithmetic framework to decouple the compression degrees between the two stages.
arXiv Detail & Related papers (2024-08-13T10:36:43Z)
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK compression perform well only when compression is also applied during inference (a minimal TopK sparsification sketch appears after this related-papers list).
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Lossy and Lossless (L$^2$) Post-training Model Size Compression [12.926354646945397]
We propose a post-training model size compression method that combines lossy and lossless compression in a unified way.
Our method can achieve a stable $10\times$ compression ratio without sacrificing accuracy and a $20\times$ compression ratio with minor accuracy loss in a short time.
arXiv Detail & Related papers (2023-08-08T14:10:16Z)
- DiffRate: Differentiable Compression Rate for Efficient Vision Transformers [98.33906104846386]
Token compression aims to speed up large-scale vision transformers (e.g. ViTs) by pruning (dropping) or merging tokens.
DiffRate is a novel token compression method with several appealing properties that prior arts lack.
arXiv Detail & Related papers (2023-05-29T10:15:19Z)
- Scalable Hybrid Learning Techniques for Scientific Data Compression [6.803722400888276]
Scientists require compression techniques that accurately preserve derived quantities of interest (QoIs).
This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression.
arXiv Detail & Related papers (2022-12-21T03:00:18Z)
- A Fast Transformer-based General-Purpose Lossless Compressor [19.5544227045828]
We introduce the transformer into deep-learning compressors to build history dependencies in parallel.
The existing transformer is too computationally heavy and is incompatible with compression tasks.
Byte-grouping and Shared-ffn schemes are proposed to fully utilize the capacity of the single-layer transformer.
arXiv Detail & Related papers (2022-03-30T07:46:19Z)
- Exploring Autoencoder-based Error-bounded Compression for Scientific Data [14.724393511470225]
We develop an error-bounded autoencoder-based compression framework built on the SZ model.
We optimize compression quality for the main stages of this AE-based error-bounded compression framework.
arXiv Detail & Related papers (2021-05-25T07:53:32Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve a 52.9% FLOPs reduction by removing 48.4% of the parameters of ResNet-50, with only a 0.56% Top-1 accuracy drop on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- Analyzing and Mitigating JPEG Compression Defects in Deep Learning [69.04777875711646]
We present a unified study of the effects of JPEG compression on a range of common tasks and datasets.
We show that there is a significant penalty on common performance metrics for high compression.
arXiv Detail & Related papers (2020-11-17T20:32:57Z)
- Optimal Gradient Compression for Distributed and Federated Learning [9.711326718689492]
Communication between computing nodes in distributed learning is typically an unavoidable burden.
Recent advances in communication-efficient training algorithms have reduced this bottleneck by using compression techniques.
In this paper, we investigate the fundamental trade-off between the number of bits needed to encode compressed vectors and the compression error.
arXiv Detail & Related papers (2020-10-07T07:58:59Z)
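As referenced in the Activations and Gradients Compression entry above, TopK sparsification keeps only the largest-magnitude entries of a tensor before communication. The following is a minimal sketch assuming standard magnitude-based top-k selection; the names `topk_compress` and `topk_decompress` are illustrative and are not taken from the cited paper.

```python
import torch


def topk_compress(t: torch.Tensor, ratio: float = 0.05):
    """Keep only the largest-magnitude `ratio` fraction of the entries of t."""
    flat = t.reshape(-1)
    k = max(1, int(ratio * flat.numel()))
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices, t.shape  # retained values keep their sign


def topk_decompress(values: torch.Tensor, indices: torch.Tensor, shape) -> torch.Tensor:
    """Scatter the retained entries back into a dense zero tensor of the original shape."""
    flat = torch.zeros(int(torch.tensor(shape).prod()), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)


g = torch.randn(4, 256)                    # e.g. a gradient or activation tensor
vals, idx, shape = topk_compress(g)
g_hat = topk_decompress(vals, idx, shape)  # what a receiving worker would reconstruct
print((g_hat != 0).float().mean())         # fraction of entries that survived
```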