ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language
Models
- URL: http://arxiv.org/abs/2310.02998v2
- Date: Fri, 26 Jan 2024 18:45:29 GMT
- Title: ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language
Models
- Authors: Yi-Lin Sung, Jaehong Yoon, Mohit Bansal
- Abstract summary: Large Vision-Language Models (LVLMs) can understand the world comprehensively by integrating rich information from different modalities.
LVLMs are often problematic due to their massive computational/energy costs and carbon consumption.
We propose Efficient Coarse-to-Fine LayerWise Pruning (ECoFLaP), a two-stage coarse-to-fine weight pruning approach for LVLMs.
- Score: 70.45441031021291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Vision-Language Models (LVLMs) can understand the world comprehensively
by integrating rich information from different modalities, achieving remarkable
advancements on various multimodal downstream tasks. However, deploying LVLMs
is often problematic due to their massive computational/energy costs and carbon
consumption. Such issues make it infeasible to adopt conventional iterative
global pruning, which is costly due to computing the Hessian matrix of the
entire large model for sparsification. Alternatively, several studies have
recently proposed layer-wise pruning approaches to avoid the expensive
computation of global pruning and efficiently compress model weights according
to their importance within a layer. However, they often suffer from suboptimal
model compression due to their lack of a global perspective. To address this
limitation in recent efficient pruning methods for large models, we propose
Efficient Coarse-to-Fine LayerWise Pruning (ECoFLaP), a two-stage
coarse-to-fine weight pruning approach for LVLMs. We first determine the
sparsity ratios of different layers or blocks by leveraging the global
importance score, which is efficiently computed based on the zeroth-order
approximation of the global model gradients. Then, the model performs local
layer-wise unstructured weight pruning based on globally-informed sparsity
ratios. We validate our proposed method across various multimodal and unimodal
models and datasets, demonstrating significant performance improvements over
prevalent pruning techniques in the high-sparsity regime.
Related papers
- Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective [55.90119819642064]
We address the challenge of determining the layer-wise sparsity rates of large language models (LLMs) through a theoretical perspective.
This refers to the cumulative effect of reconstruction errors throughout the sparsification process.
We derive a simple yet effective approach to layer-wise sparsity allocation that mitigates this issue.
arXiv Detail & Related papers (2025-02-20T17:51:10Z) - LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive.
Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones.
We propose textbfLESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z) - Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [56.00251589760559]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
Experiments indicate that LM-GC surpasses existing state-of-the-art lossless compression methods.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs [46.443316184807145]
We introduce Dynamic Layer Operations (DLO), a novel approach for vertically scaling transformer-based Large Language Models (LLMs)
Unlike traditional Mixture-of-Experts (MoE) methods that focus on extending the model width, our approach targets model depth, addressing the redundancy observed across layer representations for various input samples.
Experimental results demonstrate that DLO not only outperforms the original unscaled models but also achieves comparable results to densely expanded models with significantly improved efficiency.
arXiv Detail & Related papers (2024-07-03T18:34:08Z) - Optimization of geological carbon storage operations with multimodal latent dynamic model and deep reinforcement learning [1.8549313085249324]
This study introduces the multimodal latent dynamic (MLD) model, a deep learning framework for fast flow prediction and well control optimization in GCS.
Unlike existing models, the MLD supports diverse input modalities, allowing comprehensive data interactions.
The approach outperforms traditional methods, achieving the highest NPV while reducing computational resources by over 60%.
arXiv Detail & Related papers (2024-06-07T01:30:21Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme
Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only
Quantization for LLMs [9.072821427818557]
Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment.
We propose an efficient weight-only quantization method that reduces memory consumption and accelerates inference for LLMs.
We evaluate our approach on large-scale open source models such as OPT-175B and internal MoE models, showcasing minimal accuracy loss while achieving up to 3.65 times higher throughput.
arXiv Detail & Related papers (2023-08-16T23:57:41Z) - Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening [13.593522290577512]
Pan-sharpening aims to increase the spatial resolution of the low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image.
Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency.
arXiv Detail & Related papers (2023-04-28T03:34:36Z) - Layer-adaptive sparsity for the Magnitude-based Pruning [88.37510230946478]
We propose a novel importance score for global pruning, coined layer-adaptive magnitude-based pruning (LAMP) score.
LAMP consistently outperforms popular existing schemes for layerwise sparsity selection.
arXiv Detail & Related papers (2020-10-15T09:14:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.