MI-PRUN: Optimize Large Language Model Pruning via Mutual Information
- URL: http://arxiv.org/abs/2601.07212v1
- Date: Mon, 12 Jan 2026 05:06:01 GMT
- Title: MI-PRUN: Optimize Large Language Model Pruning via Mutual Information
- Authors: Hao Zhang, Zhibin Zhang, Guangxin Wu, He Chen, Jiafeng Guo, Xueqi Cheng,
- Abstract summary: We propose MI-PRUN, a mutual-information-based pruning method for Large Language Models. We leverage mutual information to identify redundant blocks by evaluating transitions in hidden states. We also develop the Fast-Block-Select algorithm, which iteratively updates block combinations to achieve a globally optimal solution.
- Score: 73.6518842907835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have become indispensable across various domains, but this comes at the cost of substantial computational and memory resources. Model pruning addresses this by removing redundant components from models. In particular, block pruning can achieve significant compression and inference acceleration. However, existing block pruning methods are often unstable and struggle to attain globally optimal solutions. In this paper, we propose MI-PRUN, a mutual-information-based pruning method for LLMs. Specifically, we leverage mutual information to identify redundant blocks by evaluating transitions in hidden states. Additionally, we incorporate the Data Processing Inequality (DPI) to reveal the relationship between the importance of entire contiguous blocks and that of individual blocks. Moreover, we develop the Fast-Block-Select algorithm, which iteratively updates block combinations to achieve a globally optimal solution while significantly improving efficiency. Extensive experiments across various models and datasets demonstrate the stability and effectiveness of our method.
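The core idea of scoring blocks by the mutual information between their input and output hidden states can be illustrated with a minimal sketch. The snippet below is a hypothetical illustration, not the paper's actual implementation: it uses a simple histogram-based MI estimator on a random 1-D projection of the hidden states, and treats a block whose output carries high MI with its input (i.e. changes the representation little) as redundant. Function names, the projection trick, and the binning estimator are all assumptions for the sake of a runnable example.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram-based estimate of I(X; Y) in nats for 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def block_redundancy_scores(hidden_states, bins=16):
    """Score each block by the MI between its input and output hidden states.

    hidden_states: list of (num_tokens, d) arrays, one per block boundary,
    so consecutive pairs are a block's input and output. Higher MI means
    the block preserves more of its input, marking it as more redundant.
    """
    rng = np.random.default_rng(0)
    d = hidden_states[0].shape[1]
    proj = rng.standard_normal(d)  # shared random 1-D projection
    scores = []
    for h_in, h_out in zip(hidden_states[:-1], hidden_states[1:]):
        scores.append(mutual_information(h_in @ proj, h_out @ proj, bins))
    return scores
```

On synthetic data, a block that applies an independent random transform scores near zero, while a near-identity block scores high and would be pruned first under this criterion.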
Related papers
- Block removal for large language models through constrained binary optimization [0.28564598766688487]
This paper formulates block removal as a constrained binary optimization problem that can be mapped to a physical system. We show that our approach outperforms state-of-the-art block-removal methods across several benchmarks. We illustrate this generality on the recent NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 model, which exhibits a highly inhomogeneous and challenging block structure.
arXiv Detail & Related papers (2026-01-29T19:46:39Z) - Blockwise Flow Matching: Improving Flow Matching Models For Efficient High-Quality Generation [33.177998521195114]
Flow Matching models have pushed the boundaries of high-fidelity data generation across a wide range of domains. We propose Blockwise Flow Matching (BFM), a novel framework that partitions the generative trajectory into multiple temporal segments. BFM achieves 2.1x to 4.9x accelerations in inference complexity at comparable generation performance.
arXiv Detail & Related papers (2025-10-24T05:41:23Z) - FMIP: Joint Continuous-Integer Flow For Mixed-Integer Linear Programming [52.52020895303244]
Mixed-Integer Linear Programming (MILP) is a foundational tool for complex decision-making problems. We propose Joint Continuous-Integer Flow for Mixed-Integer Linear Programming (FMIP), the first generative framework that models the joint distribution of both integer and continuous variables for MILP solutions. FMIP is fully compatible with arbitrary backbone networks and various downstream solvers, making it well-suited for a broad range of real-world MILP applications.
arXiv Detail & Related papers (2025-07-31T10:03:30Z) - PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection [68.8373788348678]
Visual instruction tuning adapts pre-trained Multimodal Large Language Models to follow human instructions. PRISM is the first training-free framework for efficient visual instruction selection. It reduces the end-to-end time for data selection and model tuning to just 30% of conventional pipelines.
arXiv Detail & Related papers (2025-02-17T18:43:41Z) - MOFHEI: Model Optimizing Framework for Fast and Efficient Homomorphically Encrypted Neural Network Inference [0.8388591755871735]
Homomorphic Encryption (HE) enables us to perform machine learning tasks over encrypted data. We propose MOFHEI, a framework that optimizes the model to make HE-based neural network inference fast and efficient. Our framework achieves up to a 98% pruning ratio on LeNet, eliminating up to 93% of the HE operations required for private inference (PI).
arXiv Detail & Related papers (2024-12-10T22:44:54Z) - MILP-StuDio: MILP Instance Generation via Block Structure Decomposition [55.79888361191114]
Mixed-integer linear programming (MILP) is one of the most popular mathematical formulations with numerous applications.
We propose a novel MILP generation framework, called Block Structure Decomposition (MILP-StuDio), to generate high-quality instances by preserving the block structures.
arXiv Detail & Related papers (2024-10-30T08:33:27Z) - MOLA: Enhancing Industrial Process Monitoring Using Multi-Block Orthogonal Long Short-Term Memory Autoencoder [3.7028696448588487]
We introduce MOLA, a Multi-block Orthogonal Long short-term memory Autoencoder paradigm, to conduct accurate, reliable fault detection of industrial processes. We propose a multi-block monitoring structure, which categorizes the process variables into multiple blocks by leveraging expert process knowledge. We demonstrate the efficiency and effectiveness of our MOLA framework by applying it to the Tennessee Eastman Process.
arXiv Detail & Related papers (2024-10-10T00:49:43Z) - High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates [50.406127962933915]
We develop solutions to problems which enable us to learn a communication-efficient distributed logistic regression model.
In our experiments we demonstrate a large improvement in accuracy over distributed algorithms with only a few distributed update steps needed.
arXiv Detail & Related papers (2024-07-08T19:34:39Z) - Learning Pseudo-Backdoors for Mixed Integer Programs [48.36587539004464]
We propose a machine learning approach for solving Mixed Integer Programs (MIP) by learning to prioritize a set of decision variables, which we call pseudo-backdoors, for branching that results in faster solution times.
Our approach takes inspiration from the concept of strong backdoors, which corresponds to a small set of variables such that only branching on these variables yields an optimal integral solution and a proof of optimality.
A key advantage of pseudo-backdoors over strong backdoors is that they are much more amenable to data-driven identification or prediction.
arXiv Detail & Related papers (2021-06-09T13:59:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.