Structural Dropout for Model Width Compression
- URL: http://arxiv.org/abs/2205.06906v1
- Date: Fri, 13 May 2022 21:50:57 GMT
- Title: Structural Dropout for Model Width Compression
- Authors: Julian Knodt
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing ML models are known to be highly over-parametrized, and use
significantly more resources than required for a given task. Prior work has
explored compressing models offline, such as by distilling knowledge from
larger models into much smaller ones. This is effective for compression, but
does not give an empirical method for measuring how much the model can be
compressed, and requires additional training for each compressed model. We
propose a method that requires only a single training session for the original
model and a set of compressed models. The proposed approach is a "structural"
dropout that prunes all elements in the hidden state above a randomly chosen
index, forcing the model to learn an importance ordering over its features.
After learning this ordering, at inference time unimportant features can be
pruned while retaining most accuracy, reducing parameter size significantly. In
this work, we focus on Structural Dropout for fully-connected layers, but the
concept can be applied to any kind of layer with unordered features, such as
convolutional or attention layers. Structural Dropout requires no additional
pruning/retraining, but requires additional validation for each possible hidden
size. At inference time, a non-expert can select a memory versus accuracy
trade-off that best suits their needs, across a wide range of highly compressed
versus more accurate models.
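The core masking step described in the abstract can be illustrated with a short sketch. This is a NumPy illustration of the general idea only, not the authors' implementation: the function name, the `min_width` parameter, and the absence of any rescaling are assumptions.

```python
import numpy as np

def structural_dropout(x: np.ndarray, min_width: int = 1, rng=None) -> np.ndarray:
    """Zero every feature at or above a randomly chosen cutoff index,
    so earlier features are forced to carry more of the signal."""
    rng = rng or np.random.default_rng()
    width = x.shape[-1]
    # Sample a cutoff uniformly; features with index >= cutoff are pruned.
    cutoff = int(rng.integers(min_width, width + 1))
    mask = np.zeros(width)
    mask[:cutoff] = 1.0
    return x * mask
```

At inference time, no mask is needed: the layer is simply truncated to the chosen width, keeping only the features the model learned to rank as most important.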
Related papers
- Pruning General Large Language Models into Customized Expert Models [78.96891010334852]
Large language models (LLMs) have revolutionized natural language processing, yet their substantial sizes often demand considerable computational resources. We propose a method to prune a large general model into a smaller lightweight expert model. By identifying and pruning irrelevant neurons of each dimension, Cus-Prun creates expert models without any post-training.
arXiv Detail & Related papers (2025-06-03T07:47:30Z) - LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive.
Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones.
We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z) - Representing Model Weights with Language using Tree Experts [39.90685550999956]
This paper learns to represent models within a joint space that embeds both model weights and language.
We introduce Probing Experts (ProbeX), a theoretically motivated, lightweight probing method.
Our results show that ProbeX can effectively map the weights of large models into a shared weight-language embedding space.
arXiv Detail & Related papers (2024-10-17T17:17:09Z) - Application Specific Compression of Deep Learning Models [0.8875650122536799]
Large Deep Learning models are compressed and deployed for specific applications.
Our goal is to customize the model compression process to create a compressed model that will perform better for the target application.
We have experimented with the BERT family of models for three applications: Extractive QA, Natural Language Inference, and Paraphrase Identification.
arXiv Detail & Related papers (2024-09-09T06:55:38Z) - When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models [61.363259848264725]
Inheritune is a simple and effective training recipe for building smaller, more efficient language models. We show that Inheritune-trained models, despite having significantly fewer layers, can match or even outperform their larger counterparts.
arXiv Detail & Related papers (2024-04-12T17:53:34Z) - Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning [86.15009879251386]
We propose a novel architecture and method for explainable classification with Concept Bottleneck Models (CBMs).
CBMs require an additional set of concepts to leverage.
We show a significant increase in accuracy using sparse hidden layers in CLIP-based bottleneck models.
arXiv Detail & Related papers (2024-04-04T09:43:43Z) - LaCo: Large Language Model Pruning via Layer Collapse [56.92068213969036]
Transformer-based large language models (LLMs) are witnessing a notable trend of size expansion.
Existing methods such as model quantization, knowledge distillation, and model pruning are constrained by various issues.
We propose a concise layer-wise structured pruner called Layer Collapse (LaCo), in which rear model layers collapse into a prior layer.
arXiv Detail & Related papers (2024-02-17T04:16:30Z) - Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
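The TopK operator referenced above can be sketched generically: keep only the k largest-magnitude entries of a tensor and zero the rest. This is a common formulation of TopK sparsification, not the paper's code; the function name and tie-handling are assumptions.

```python
import numpy as np

def topk_compress(t: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of t, zeroing all others."""
    flat = t.ravel()
    if k >= flat.size:
        return t.copy()
    # Indices of the k entries with the largest absolute values.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(t.shape)
```

Applying the same operator at inference time, as the experiments suggest, keeps the train-time and test-time activation distributions consistent.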
arXiv Detail & Related papers (2024-01-15T15:54:54Z) - Deep learning model compression using network sensitivity and gradients [3.52359746858894]
We present model compression algorithms for both non-retraining and retraining conditions.
In the first case, we propose the Bin & Quant algorithm for compression of the deep learning models using the sensitivity of the network parameters.
In the second case, we propose our novel gradient-weighted k-means clustering algorithm (GWK).
arXiv Detail & Related papers (2022-10-11T03:02:40Z) - Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models [7.6356407698088]
Pruning unnecessary parameters has emerged as a simple and effective method for compressing large models.
We show that optimizing for flat minima consistently leads to greater compressibility of parameters compared to standard Adam optimization.
arXiv Detail & Related papers (2022-05-25T11:54:37Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first scenarios for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z) - NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search [100.71365025972258]
We propose NAS-BERT, an efficient method for BERT compression.
NAS-BERT trains a big supernet on a search space and outputs multiple compressed models with adaptive sizes and latency.
Experiments on GLUE and SQuAD benchmark datasets demonstrate that NAS-BERT can find lightweight models with better accuracy than previous approaches.
arXiv Detail & Related papers (2021-05-30T07:20:27Z) - Self-Supervised GAN Compression [32.21713098893454]
We show that a standard model compression technique, weight pruning, cannot be applied to GANs using existing methods.
We then develop a self-supervised compression technique which uses the trained discriminator to supervise the training of a compressed generator.
We show that this framework maintains compelling performance up to high degrees of sparsity, can be easily applied to new tasks and models, and enables meaningful comparisons between different pruning granularities.
arXiv Detail & Related papers (2020-07-03T04:18:54Z) - Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.