IG-Pruning: Input-Guided Block Pruning for Large Language Models
- URL: http://arxiv.org/abs/2511.02213v1
- Date: Tue, 04 Nov 2025 03:05:54 GMT
- Title: IG-Pruning: Input-Guided Block Pruning for Large Language Models
- Authors: Kangyu Qiao, Shaolei Zhang, Yang Feng
- Abstract summary: We propose IG-Pruning, a novel input-aware block-wise pruning method that dynamically selects layer masks at inference time. Experimental results demonstrate that our method consistently outperforms state-of-the-art static depth pruning methods.
- Score: 34.984986323797976
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the growing computational demands of large language models (LLMs), efficient inference has become increasingly critical for practical deployment. Depth pruning has emerged as a promising approach for reducing the computational costs of large language models by removing transformer layers. However, existing methods typically rely on fixed block masks, which can lead to suboptimal performance across different tasks and inputs. In this paper, we propose IG-Pruning, a novel input-aware block-wise pruning method that dynamically selects layer masks at inference time. Our approach consists of two stages: (1) Discovering diverse mask candidates through semantic clustering and L0 optimization, and (2) Implementing efficient dynamic pruning without the need for extensive training. Experimental results demonstrate that our method consistently outperforms state-of-the-art static depth pruning methods, making it particularly suitable for resource-constrained deployment scenarios.
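The two-stage idea in the abstract (offline discovery of candidate layer masks, then input-routed mask selection at inference) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names are assumptions, and the offline stage that produces cluster centroids and mask candidates (semantic clustering plus L0 optimization) is omitted.

```python
import math

def select_mask(input_embedding, centroids, candidate_masks):
    # Route the input to its nearest cluster centroid and return that
    # cluster's precomputed layer mask (1 = keep block, 0 = skip block).
    dists = [math.dist(input_embedding, c) for c in centroids]
    return candidate_masks[dists.index(min(dists))]

def run_pruned_forward(x, layers, mask):
    # Execute only the kept transformer blocks; skipped blocks pass the
    # hidden state through unchanged, so no retraining is needed here.
    for layer, keep in zip(layers, mask):
        if keep:
            x = layer(x)
    return x
```

Because the masks are chosen per input rather than fixed once for the whole model, different inputs can exercise different subsets of layers at the same overall compute budget.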
Related papers
- DynaAct: Large Language Model Reasoning with Dynamic Action Spaces [58.298135359318024]
We propose a novel framework named DynaAct for automatically constructing a compact action space. Our approach significantly improves overall performance while maintaining efficient inference without introducing substantial latency.
arXiv Detail & Related papers (2025-11-11T09:47:13Z) - Sparse Training Scheme for Multimodal LLM [26.81140959413325]
Multimodal Large Language Models (MLLMs) have demonstrated outstanding performance across a variety of domains. We propose a novel training-efficient framework based on sparse representations, termed the Sparse Training Scheme (STS). This scheme consists of two key components: the Visual Token, which reduces the information load by compressing visual tokens, and the Layer Dynamic Skipper, which mitigates computational overhead by skipping unnecessary layers in the language model during both forward and backward passes.
arXiv Detail & Related papers (2025-09-16T11:33:20Z) - Objective Soups: Multilingual Multi-Task Modeling for Speech Processing [69.52720282028385]
Training a single model for multilingual, multi-task speech processing (MSP) is severely hampered by conflicting objectives between tasks. This paper investigates three multi-objective MSP formulations, which we refer to as objective soup recipes. Our work demonstrates that hierarchical MOO is a more effective and scalable approach for building state-of-the-art MSP models.
arXiv Detail & Related papers (2025-08-12T07:01:09Z) - LOP: Learning Optimal Pruning for Efficient On-Demand MLLMs Scaling [52.1366057696919]
LOP is an efficient neural pruning framework that learns optimal pruning strategies for a target pruning constraint. The LOP approach trains autoregressive neural networks (NNs) to directly predict layer-wise pruning strategies adapted to the target pruning constraint. Experimental results show that LOP outperforms state-of-the-art pruning methods on various metrics while achieving up to three orders of magnitude speedup.
arXiv Detail & Related papers (2025-06-15T12:14:16Z) - Dual-Priv Pruning : Efficient Differential Private Fine-Tuning in Multimodal Large Language Models [21.598534853947676]
We propose a framework that employs two complementary pruning mechanisms for Differential Privacy (DP) fine-tuning in MLLMs. Our approach consistently uses less memory than standard DP-SGD. To the best of our knowledge, we are the first to explore DP fine-tuning in MLLMs.
arXiv Detail & Related papers (2025-06-08T10:33:01Z) - SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling [16.742839354514512]
We introduce SkipGPT, a dynamic layer pruning framework for optimizing large language models. We show that SkipGPT removes over 40% of model parameters while matching or exceeding the performance of the original dense model.
arXiv Detail & Related papers (2025-06-04T17:26:31Z) - Efficient Multi-modal Long Context Learning for Training-free Adaptation [96.21248144937627]
This paper introduces Efficient Multi-Modal Long Context Learning (EMLoC), which embeds demonstration examples directly into the model input. It condenses long-context multimodal inputs into compact, task-specific memory representations.
arXiv Detail & Related papers (2025-05-26T10:49:44Z) - Instruction-Following Pruning for Large Language Models [58.329978053711024]
We move beyond the traditional static pruning approach of determining a fixed pruning mask for a model. In our method, the pruning mask is input-dependent and adapts dynamically based on the information described in a user instruction. Our approach, termed "instruction-following pruning", introduces a sparse mask predictor that takes the user instruction as input and dynamically selects the most relevant model parameters for the given task.
arXiv Detail & Related papers (2025-01-03T20:19:14Z) - Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning method that learns pruning masks in a probabilistic space directly by optimizing the loss of the pruned model. We achieve this by learning an underlying Bernoulli distribution from which binary pruning masks are sampled. Experiments conducted on LLaMA, LLaMA-2, LLaMA-3, Vicuna, and Mistral models demonstrate the promising efficiency and effectiveness of our method.
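The core trick in this summary (learning Bernoulli keep-probabilities for binary masks without back-propagating through the model) can be sketched with a plain REINFORCE update. This is an illustrative assumption of how such an optimizer might look, not the paper's actual code; the toy `loss_fn` stands in for evaluating the pruned model.

```python
import numpy as np

def reinforce_mask_step(probs, loss_fn, rng, lr=0.2, n_samples=64):
    """One policy-gradient step on Bernoulli keep-probabilities.

    Sketch (assumed names): binary pruning masks are sampled from
    Bernoulli(probs), loss_fn scores each pruned configuration, and a
    REINFORCE estimator with a mean baseline updates the probabilities,
    so no gradient flows through the model itself.
    """
    samples = [(rng.random(probs.shape) < probs).astype(float)
               for _ in range(n_samples)]
    losses = np.array([loss_fn(m) for m in samples])
    baseline = losses.mean()  # variance-reduction baseline
    grad = np.zeros_like(probs)
    for m, l in zip(samples, losses):
        # d/dp log Bernoulli(m; p) = (m - p) / (p * (1 - p))
        grad += (l - baseline) * (m - probs) / (probs * (1.0 - probs))
    grad /= n_samples
    # Gradient descent on the expected loss, clipped away from {0, 1}
    # so the log-probability gradient stays finite.
    return np.clip(probs - lr * grad, 0.01, 0.99)
```

Repeated steps push each probability toward 1 for masks the loss favors keeping and toward 0 for masks it favors dropping; a hard mask can then be read off by thresholding `probs`.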
arXiv Detail & Related papers (2024-06-15T09:31:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.