Related papers: ZipNN: Lossless Compression for AI Models

ZipNN: Lossless Compression for AI Models

URL: http://arxiv.org/abs/2411.05239v1
Date: Thu, 07 Nov 2024 23:28:23 GMT
Title: ZipNN: Lossless Compression for AI Models
Authors: Moshik Hershcovitch, Andrew Wood, Leshem Choshen, Guy Girmonsky, Roy Leibovitz, Ilias Ennmouri, Michal Malka, Peter Chin, Swaminathan Sundararaman, Danny Harnik,
Abstract summary: We present ZipNN a lossless compression tailored to neural networks. On popular models (e.g. Llama 3) ZipNN shows space savings that are over 17% better than vanilla compression. We estimate that these methods could save over an ExaByte per month of network traffic downloaded from a large model hub like Hugging Face.
Score: 10.111136691015554
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the growth of model sizes and the scale of their deployment, their sheer size burdens the infrastructure requiring more network and more storage to accommodate these. While there is a vast model compression literature deleting parts of the model weights for faster inference, we investigate a more traditional type of compression - one that represents the model in a compact form and is coupled with a decompression algorithm that returns it to its original form and size - namely lossless compression. We present ZipNN a lossless compression tailored to neural networks. Somewhat surprisingly, we show that specific lossless compression can gain significant network and storage reduction on popular models, often saving 33% and at times reducing over 50% of the model size. We investigate the source of model compressibility and introduce specialized compression variants tailored for models that further increase the effectiveness of compression. On popular models (e.g. Llama 3) ZipNN shows space savings that are over 17% better than vanilla compression while also improving compression and decompression speeds by 62%. We estimate that these methods could save over an ExaByte per month of network traffic downloaded from a large model hub like Hugging Face.

Related papers

Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference [19.59857352852377]
Large language models (LLMs) have continued to rapidly increase in size. This has exacerbated the difficulty in running state of the art LLMs on small, edge devices. We propose Huff-LLM, a method that lets users store LLM weights in compressed format.
arXiv Detail & Related papers (2025-02-02T21:23:42Z)
Fast Feedforward 3D Gaussian Splatting Compression [55.149325473447384]
3D Gaussian Splatting (FCGS) is an optimization-free model that can compress 3DGS representations rapidly in a single feed-forward pass. FCGS achieves a compression ratio of over 20X while maintaining fidelity, surpassing most per-scene SOTA optimization-based methods.
arXiv Detail & Related papers (2024-10-10T15:13:08Z)
Unified Low-rank Compression Framework for Click-through Rate Prediction [15.813889566241539]
We propose a unified low-rank decomposition framework for compressing CTR prediction models. Our framework can achieve better performance than the original model. Our framework can be applied to embedding tables and layers in various CTR prediction models.
arXiv Detail & Related papers (2024-05-28T13:06:32Z)
Lossless and Near-Lossless Compression for Foundation Models [11.307357041746865]
We investigate the source of model compressibility, introduce compression variants tailored for models and categorize models to compressibility groups. We estimate that these methods could save over an ExaByte per month of network traffic downloaded from a large model hub like HuggingFace.
arXiv Detail & Related papers (2024-04-05T16:52:55Z)
Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence. We find that gradients require milder compression rates than activations. Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression [38.09558772881095]
Under fixed compression ratios, dual-path compression combining both the time and frequency methods will give further performance improvement. Proposed models show competitive performance compared with fast FullSubNet and DeepNetFilter.
arXiv Detail & Related papers (2023-08-21T21:36:56Z)
Lossy and Lossless (L$^2$) Post-training Model Size Compression [12.926354646945397]
We propose a post-training model size compression method that combines lossy and lossless compression in a unified way. Our method can achieve a stable $10times$ compression ratio without sacrificing accuracy and a $20times$ compression ratio with minor accuracy loss in a short time.
arXiv Detail & Related papers (2023-08-08T14:10:16Z)
Deep Lossy Plus Residual Coding for Lossless and Near-lossless Image Compression [85.93207826513192]
We propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression. We solve the joint lossy and residual compression problem in the approach of VAEs. In the near-lossless mode, we quantize the original residuals to satisfy a given $ell_infty$ error bound.
arXiv Detail & Related papers (2022-09-11T12:11:56Z)
PILC: Practical Image Lossless Compression with an End-to-end GPU Oriented Neural Framework [88.18310777246735]
We propose an end-to-end image compression framework that achieves 200 MB/s for both compression and decompression with a single NVIDIA Tesla V100 GPU. Experiments show that our framework compresses better than PNG by a margin of 30% in multiple datasets.
arXiv Detail & Related papers (2022-06-10T03:00:10Z)
Estimating the Resize Parameter in End-to-end Learned Image Compression [50.20567320015102]
We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models. Our results show that our new resizing parameter estimation framework can provide Bjontegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines.
arXiv Detail & Related papers (2022-04-26T01:35:02Z)
Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which joints channel pruning and tensor decomposition to compress CNN models. We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.