Hyper-Compression: Model Compression via Hyperfunction
- URL: http://arxiv.org/abs/2409.00592v2
- Date: Sat, 14 Dec 2024 07:52:04 GMT
- Title: Hyper-Compression: Model Compression via Hyperfunction
- Authors: Fenglei Fan, Juntong Fan, Dayang Wang, Jingbo Zhang, Zelin Dong, Shijun Zhang, Ge Wang, Tieyong Zeng
- Abstract summary: We propose the so-called hyper-compression, inspired by the parsimonious relationship between genotype and phenotype. It compresses LLaMA2-7B in an hour and achieves close-to-int4-quantization performance, without retraining. Our work can facilitate the harmony between the scaling law and the stagnation of hardware upgrades.
- Score: 20.47369296713829
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid growth of large models' size has far outpaced that of GPU memory. To bridge this gap, inspired by the parsimonious relationship between genotype and phenotype, we turn the model compression problem into a parameter representation problem and propose the so-called hyper-compression. The hyper-compression uses a hyperfunction to represent the parameters of the target network, drawing on ergodic theory to address the following approximation problem: whether a low-dimensional dynamical system can eventually fill a high-dimensional space. Empirically, the proposed hyper-compression enjoys the following merits: 1) Preferable compression ratio; 2) No post-hoc retraining; 3) Affordable inference time; and 4) Short compression time. It compresses LLaMA2-7B in an hour and achieves close-to-int4-quantization performance, without retraining and with a performance drop of less than 1%. Our work can facilitate the harmony between the scaling law and the stagnation of hardware upgrades in terms of saving both computation and data. We have open-sourced our code at https://github.com/Juntongkuki/Hyper-Compression.git for readers' free download and evaluation.
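The ergodic-theory idea in the abstract can be illustrated with a toy sketch. By Weyl's equidistribution theorem, the orbit t → (t·a1 mod 1, …, t·ad mod 1) is dense in the unit d-torus when 1, a1, …, ad are rationally independent, so a single scalar t can approximately encode a d-dimensional block of parameters. The hyperfunction, grid search, dimensions, and constants below are illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

def hyperfunction(t, alphas):
    """Decode a scalar t into a d-dimensional point on the torus orbit."""
    return np.mod(t * alphas, 1.0)

def compress_block(weights, alphas, t_grid):
    """Encode a weight block as the scalar t whose orbit point is closest."""
    orbit = np.mod(np.outer(t_grid, alphas), 1.0)   # (N, d) candidate points
    errs = np.linalg.norm(orbit - weights, axis=1)  # distance to the target block
    best = int(np.argmin(errs))
    return float(t_grid[best]), float(errs[best])

rng = np.random.default_rng(0)
d = 4
alphas = np.sqrt(np.array([2.0, 3.0, 5.0, 7.0]))  # rationally independent frequencies
weights = rng.random(d)                           # a tiny "layer", rescaled to [0, 1)
t_grid = np.linspace(0.0, 1000.0, 200_001)        # candidate scalars (brute-force search)
t_star, err = compress_block(weights, alphas, t_grid)
recon = hyperfunction(t_star, alphas)             # decompression: one scalar -> d weights
```

Storing one scalar per block instead of d weights is where the compression comes from; the approximation error shrinks as the searched orbit segment grows, at the cost of a longer encoding search.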
Related papers
- A General Error-Theoretical Analysis Framework for Constructing Compression Strategies [3.1316260533944007]
We propose a Compression Error Theory (CET) framework to determine the optimal compression level for each layer.
Specifically, on the ResNet-34 model, CET achieves nearly 11× parameter compression while achieving performance comparable to, or even surpassing, the original model.
arXiv Detail & Related papers (2025-02-19T06:12:43Z)
- Robust and Transferable Backdoor Attacks Against Deep Image Compression With Selective Frequency Prior [118.92747171905727]
This paper introduces a novel frequency-based trigger injection model for launching backdoor attacks with multiple triggers on learned image compression models.
We design attack objectives tailored to diverse scenarios, including: 1) degrading compression quality in terms of bit-rate and reconstruction accuracy; 2) targeting task-driven measures like face recognition and semantic segmentation.
Experiments show that our trigger injection models, combined with minor modifications to encoder parameters, successfully inject multiple backdoors and their triggers into a single compression model.
arXiv Detail & Related papers (2024-12-02T15:58:40Z)
- EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search [33.86918407429272]
We propose a new and general approach for dynamic compression that is provably optimal in a given input range.
We show that these theoretical guarantees lead to highly competitive practical performance for dynamic compression of Llama, Mistral and Phi models.
arXiv Detail & Related papers (2024-10-18T17:46:37Z)
- Fast Feedforward 3D Gaussian Splatting Compression [55.149325473447384]
FCGS is an optimization-free model that compresses 3D Gaussian Splatting (3DGS) representations rapidly in a single feed-forward pass.
FCGS achieves a compression ratio of over 20× while maintaining fidelity, surpassing most per-scene SOTA optimization-based methods.
arXiv Detail & Related papers (2024-10-10T15:13:08Z)
- MCNC: Manifold-Constrained Reparameterization for Neural Compression [21.70510507535041]
We present a novel model compression method, which we term Manifold-Constrained Neural Compression (MCNC).
By constraining the parameter space to our proposed manifold, we can identify high-quality solutions.
Our method significantly outperforms state-of-the-art baselines in terms of compression, accuracy, and/or model reconstruction time.
arXiv Detail & Related papers (2024-06-27T16:17:26Z)
- LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression [43.048684907893104]
This paper focuses on task-agnostic prompt compression for better generalizability and efficiency.
We formulate prompt compression as a token classification problem to guarantee the faithfulness of the compressed prompt to the original one.
Our approach leads to lower latency by explicitly learning the compression objective with smaller models such as XLM-RoBERTa-large and mBERT.
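The token-classification framing above (label each token keep/drop, then keep the highest-scoring tokens in their original order to stay faithful to the source prompt) can be sketched with a stand-in scorer. The length-based heuristic below is a hypothetical placeholder, not the paper's trained XLM-RoBERTa or mBERT classifier:

```python
def compress_prompt(tokens, keep_ratio=0.5):
    """Keep the top-scoring fraction of tokens, preserving original order."""
    # stand-in heuristic: treat longer tokens as more informative
    scores = [len(tok) for tok in tokens]
    k = max(1, int(len(tokens) * keep_ratio))
    # pick the k highest-scoring token positions, then restore source order
    keep_idx = sorted(sorted(range(len(tokens)), key=lambda i: -scores[i])[:k])
    return [tokens[i] for i in keep_idx]

prompt = "please summarize the following quarterly financial report carefully".split()
compressed = compress_prompt(prompt, keep_ratio=0.5)
# -> ['summarize', 'following', 'quarterly', 'financial']
```

Preserving the original token order after selection is what keeps the compressed prompt readable; a trained classifier would replace the length heuristic with a learned per-token keep probability.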
arXiv Detail & Related papers (2024-03-19T17:59:56Z)
- "Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach for wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression [38.09558772881095]
Under fixed compression ratios, dual-path compression combining both time- and frequency-domain methods yields further performance improvements.
Proposed models show competitive performance compared with fast FullSubNet and DeepNetFilter.
arXiv Detail & Related papers (2023-08-21T21:36:56Z)
- DiffRate: Differentiable Compression Rate for Efficient Vision Transformers [98.33906104846386]
Token compression aims to speed up large-scale vision transformers (e.g. ViTs) by pruning (dropping) or merging tokens.
DiffRate is a novel token compression method with several appealing properties that prior art lacks.
arXiv Detail & Related papers (2023-05-29T10:15:19Z)
- Generative Adversarial Networks for Spatio-Spectral Compression of Hyperspectral Images [5.1333521217181755]
This paper investigates deep learning models for the compression of hyperspectral images (HSIs).
We introduce two new models for compressing HSIs: HiFiC using Squeeze and Excitation (SE) blocks (denoted HiFiC_SE), and HiFiC with 3DSSCs (denoted HiFiC_3D).
arXiv Detail & Related papers (2023-05-15T10:23:14Z)
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find inter-correlations and intra-correlations exist when observing latent variables in a vectorized perspective.
Our model has better rate-distortion performance and an impressive 3.18× compression speed-up.
arXiv Detail & Related papers (2022-03-21T11:44:17Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
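The joint pruning-plus-decomposition recipe above can be sketched on a toy weight matrix: first drop low-magnitude output channels, then factor the remaining weights with a truncated SVD. The channel budget, rank, and L1 pruning criterion below are illustrative assumptions, not the paper's learned settings:

```python
import numpy as np

def prune_and_decompose(W, keep_channels, rank):
    """Channel pruning followed by low-rank factorization of a 2D weight matrix."""
    # 1) channel pruning: keep output channels (rows) with the largest L1 norm
    norms = np.abs(W).sum(axis=1)
    keep = np.sort(np.argsort(-norms)[:keep_channels])
    W_pruned = W[keep]
    # 2) low-rank decomposition: truncated SVD factorizes W_pruned ~ A @ B
    U, S, Vt = np.linalg.svd(W_pruned, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (keep_channels, rank)
    B = Vt[:rank]                # (rank, in_features)
    return A, B, keep

rng = np.random.default_rng(1)
W = rng.standard_normal((8, 16))  # toy weight matrix (out_channels x in_features)
A, B, keep = prune_and_decompose(W, keep_channels=6, rank=3)
stored = A.size + B.size          # parameters kept after compression
original = W.size
```

At inference the pruned layer is applied as two thin matrix multiplies (x @ B.T @ A.T), so the parameter and FLOP savings compound across the two stages.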
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- Reliable Model Compression via Label-Preservation-Aware Loss Functions [14.368823297066276]
We present a framework that uses a teacher-student learning paradigm to better preserve labels.
We obtain a significant reduction of up to 4.1× in the number of mismatches between the compressed and reference models.
arXiv Detail & Related papers (2020-12-03T00:00:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.