Forget the Data and Fine-Tuning! Just Fold the Network to Compress
- URL: http://arxiv.org/abs/2502.10216v1
- Date: Fri, 14 Feb 2025 15:10:43 GMT
- Title: Forget the Data and Fine-Tuning! Just Fold the Network to Compress
- Authors: Dong Wang, Haris Šikić, Lothar Thiele, Olga Saukh
- Abstract summary: We introduce model folding, a novel data-free model compression technique that merges structurally similar neurons across layers.
We show that model folding achieves comparable performance to data-driven compression techniques and outperforms recently proposed data-free methods.
This approach is particularly effective for compressing large-scale models, making it suitable for deployment in resource-constrained environments.
- Score: 13.611551223875194
- License:
- Abstract: We introduce model folding, a novel data-free model compression technique that merges structurally similar neurons across layers, significantly reducing the model size without the need for fine-tuning or access to training data. Unlike existing methods, model folding preserves data statistics during compression by leveraging k-means clustering, and using novel data-free techniques to prevent variance collapse or explosion. Our theoretical framework and experiments across standard benchmarks, including ResNet18 and LLaMA-7B, demonstrate that model folding achieves comparable performance to data-driven compression techniques and outperforms recently proposed data-free methods, especially at high sparsity levels. This approach is particularly effective for compressing large-scale models, making it suitable for deployment in resource-constrained environments.
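As a rough illustration of the folding idea, the sketch below clusters the output channels of one layer with k-means and merges each cluster, summing the matching fan-in columns of the following layer so the composed function is approximately preserved. This is a minimal toy for two linear layers, not the authors' implementation, and it omits the paper's data-free variance-repair step; all names and the two-layer setup are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def fold_linear_pair(W1, b1, W2, n_clusters):
    """Toy sketch of channel folding for two consecutive linear layers.

    Rows of W1 (output channels of layer 1) are clustered with k-means;
    channels in the same cluster are merged by averaging, and the
    corresponding columns of W2 are summed so the pair's composition is
    approximately preserved. Illustration only, not the paper's procedure.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(np.concatenate([W1, b1[:, None]], axis=1))

    W1_f = np.zeros((n_clusters, W1.shape[1]))
    b1_f = np.zeros(n_clusters)
    W2_f = np.zeros((W2.shape[0], n_clusters))
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        W1_f[c] = W1[idx].mean(axis=0)       # merged (averaged) channel
        b1_f[c] = b1[idx].mean()
        W2_f[:, c] = W2[:, idx].sum(axis=1)  # sum fan-in so activations add up
    return W1_f, b1_f, W2_f

# Tiny usage example: compress 64 hidden channels down to 16.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(64, 32)), rng.normal(size=64)
W2 = rng.normal(size=(10, 64))
W1_f, b1_f, W2_f = fold_linear_pair(W1, b1, W2, n_clusters=16)
print(W1_f.shape, W2_f.shape)  # (16, 32) (10, 16)
```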
Related papers
- Choose Your Model Size: Any Compression by a Single Gradient Descent [9.074689052563878]
We present Any Compression via Iterative Pruning (ACIP).
ACIP is an algorithmic approach to determine a compression-performance trade-off from a single gradient descent run.
We show that ACIP seamlessly complements common quantization-based compression techniques.
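To make the "any compression ratio from one run" idea concrete, here is a hedged sketch: assuming a single run has already produced an importance score per parameter, any target ratio reduces to thresholding the sorted scores. The scoring procedure itself, which is the heart of ACIP, is not shown, and the function and names below are assumptions.

```python
import numpy as np

def masks_for_any_ratio(scores, ratios):
    """Illustrative sketch (not the ACIP algorithm itself): given one
    importance score per parameter from a single run, any compression
    ratio can be realized post hoc by thresholding the sorted scores,
    with no per-ratio re-run."""
    order = np.argsort(scores)            # least important first
    masks = {}
    for r in ratios:
        k = int(len(scores) * r)          # fraction of parameters to drop
        mask = np.ones_like(scores, dtype=bool)
        mask[order[:k]] = False
        masks[r] = mask
    return masks

# Toy usage with stand-in importance scores.
scores = np.random.default_rng(0).random(1000)
for r, m in masks_for_any_ratio(scores, [0.5, 0.7, 0.9]).items():
    print(f"drop {r:.0%}: keep {m.sum()} parameters")
```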
arXiv Detail & Related papers (2025-02-03T18:40:58Z)
- You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning [20.62274005080048]
PruneNet is a novel model compression method that reformulates model pruning as a policy learning process.
It can compress the LLaMA-2-7B model in just 15 minutes, achieving over 80% retention of its zero-shot performance.
On complex multitask language understanding tasks, PruneNet demonstrates its robustness by preserving up to 80% performance of the original model.
arXiv Detail & Related papers (2025-01-25T18:26:39Z)
- GoDe: Gaussians on Demand for Progressive Level of Detail and Scalable Compression [13.616981296093932]
We propose a novel, model-agnostic technique that organizes Gaussians into several hierarchical layers.
This method, combined with recent approaches to 3DGS compression, allows a single model to instantly scale across several compression ratios.
We validate our approach on typical datasets and benchmarks, showcasing low distortion and substantial gains in terms of scalability and adaptability.
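For intuition, the snippet below sketches the layering idea: rank Gaussians by an importance proxy and bucket them into hierarchical layers, so a renderer can load only a prefix of layers for a coarser, cheaper level of detail. The proxy and the number of layers are assumptions, not the paper's construction.

```python
import numpy as np

def build_gaussian_layers(importance, n_layers=4):
    """Hypothetical sketch: sort splats by importance and split them into
    hierarchical layers; layer prefixes give progressively finer detail."""
    order = np.argsort(-importance)          # most important first
    return np.array_split(order, n_layers)   # layer 0 = coarsest subset

# Toy usage: importance could be e.g. opacity * projected area.
importance = np.random.default_rng(0).random(10000)
layers = build_gaussian_layers(importance)
for k in range(1, len(layers) + 1):
    active = np.concatenate(layers[:k])
    print(f"LoD {k}: {active.size} Gaussians")
```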
arXiv Detail & Related papers (2025-01-23T11:05:45Z)
- Accelerated Methods with Compressed Communications for Distributed Optimization Problems under Data Similarity [55.03958223190181]
We propose the first theoretically grounded accelerated algorithms utilizing unbiased and biased compression under data similarity.
Our results set new records and are confirmed by experiments on different average losses and datasets.
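The unbiased and biased compressors referred to here are typically operators such as rand-k and top-k. The generic sketch below shows both; it is not the paper's accelerated algorithm, and the function names are assumptions.

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased sparsifier: keep k random coordinates and rescale by d/k,
    so that E[rand_k(x)] = x."""
    d = x.size
    mask = np.zeros(d, dtype=bool)
    mask[rng.choice(d, size=k, replace=False)] = True
    return np.where(mask, x * d / k, 0.0)

def top_k(x, k):
    """Biased sparsifier: keep the k largest-magnitude coordinates."""
    out = np.zeros_like(x)
    idx = np.argsort(-np.abs(x))[:k]
    out[idx] = x[idx]
    return out

# Toy usage: compress a worker's gradient to 10% of its coordinates.
rng = np.random.default_rng(0)
g = rng.normal(size=1000)
print(np.linalg.norm(g - rand_k(g, 100, rng)), np.linalg.norm(g - top_k(g, 100)))
```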
arXiv Detail & Related papers (2024-12-21T00:40:58Z)
- Convex Distillation: Efficient Compression of Deep Networks via Convex Optimization [46.18363767705346]
Deploying large and complex deep networks on resource-constrained devices poses significant challenges due to their computational demands.
We introduce a novel distillation technique that efficiently compresses the model via convex optimization.
Our approach achieves performance comparable to the original model without requiring any post-processing.
arXiv Detail & Related papers (2024-10-09T06:04:52Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in a model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
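For intuition, here is a toy sketch of what training independent subnetworks means for a two-layer MLP: hidden units are partitioned into disjoint groups, and each worker trains only its slice of the weights. The partitioning, local training, and aggregation schedule below are assumptions, not the analyzed algorithm.

```python
import numpy as np

def split_into_subnetworks(W1, W2, n_workers, rng):
    """Toy sketch of the IST idea: partition the hidden units of a two-layer
    MLP into disjoint groups and hand each worker only its slice of (W1, W2)
    to train locally, before writing the slices back."""
    hidden = W1.shape[0]
    perm = rng.permutation(hidden)
    groups = np.array_split(perm, n_workers)
    return [(W1[g], W2[:, g], g) for g in groups]   # one subnetwork per worker

# Toy usage: 128 hidden units split across 4 workers.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(128, 784)), rng.normal(size=(10, 128))
subnets = split_into_subnetworks(W1, W2, n_workers=4, rng=rng)
print([s[0].shape for s in subnets])   # each worker trains a 32-unit slice
```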
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Riemannian Low-Rank Model Compression for Federated Learning with Over-the-Air Aggregation [2.741266294612776]
Low-rank model compression is a widely used technique for reducing the computational load when training machine learning models.
Existing compression techniques are not directly applicable to efficient over-the-air (OTA) aggregation in federated learning systems.
We propose a novel manifold optimization formulation for low-rank model compression in FL that does not relax the low-rank constraint.
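To make "does not relax the low-rank constraint" concrete, the snippet below shows the standard metric projection onto rank-r matrices via truncated SVD; the paper's Riemannian, OTA-aware algorithm is more involved than this one-shot projection, which is shown only for illustration.

```python
import numpy as np

def project_to_rank_r(W, r):
    """Truncated SVD: the best rank-r approximation of W, i.e. a metric
    projection onto the manifold of rank-r matrices. Not the paper's
    algorithm, just the underlying constraint made explicit."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] * s[:r] @ Vt[:r]

# Toy usage: compress a 256x128 weight matrix to rank 16.
W = np.random.default_rng(0).normal(size=(256, 128))
W_r = project_to_rank_r(W, r=16)
print(np.linalg.matrix_rank(W_r), np.linalg.norm(W - W_r) / np.linalg.norm(W))
```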
arXiv Detail & Related papers (2023-06-04T18:32:50Z)
- DualCF: Efficient Model Extraction Attack from Counterfactual Explanations [57.46134660974256]
Cloud service providers have launched Machine-Learning-as-a-Service platforms to allow users to access large-scale cloud-based models via APIs.
The extra information exposed by counterfactual explanations inevitably makes the cloud models more vulnerable to extraction attacks.
We propose a novel simple yet efficient querying strategy to greatly enhance the querying efficiency to steal a classification model.
arXiv Detail & Related papers (2022-05-13T08:24:43Z)
- ClusterQ: Semantic Feature Distribution Alignment for Data-Free Quantization [111.12063632743013]
We propose a new and effective data-free quantization method termed ClusterQ.
To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics.
We also incorporate the intra-class variance to solve class-wise mode collapse.
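As a generic, heavily hedged illustration of statistics alignment, the sketch below pushes per-class means and variances of synthetic features toward target statistics taken from a pretrained model, the kind of objective a data-free quantizer optimizes when synthesizing calibration data. The exact loss form and names are assumptions, not ClusterQ's code.

```python
import numpy as np

def stats_alignment_loss(feats, labels, target_mu, target_var):
    """Generic class-wise feature-statistics alignment: penalize the gap
    between per-class mean/variance of a synthetic batch and target
    statistics from the pretrained model. Illustration only."""
    loss = 0.0
    for c in np.unique(labels):
        f = feats[labels == c]
        loss += np.mean((f.mean(axis=0) - target_mu[c]) ** 2)
        loss += np.mean((f.var(axis=0) - target_var[c]) ** 2)
    return loss

# Toy usage: 2 classes, 8-dimensional features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 8))
labels = rng.integers(0, 2, size=32)
target_mu = {0: np.zeros(8), 1: np.ones(8)}
target_var = {0: np.ones(8), 1: np.ones(8)}
print(stats_alignment_loss(feats, labels, target_mu, target_var))
```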
arXiv Detail & Related papers (2022-04-30T06:58:56Z)
- What do Compressed Large Language Models Forget? Robustness Challenges in Model Compression [68.82486784654817]
We study two popular model compression techniques including knowledge distillation and pruning.
We show that compressed models are significantly less robust than their uncompressed PLM counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
arXiv Detail & Related papers (2021-10-16T00:20:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.