Block removal for large language models through constrained binary optimization
- URL: http://arxiv.org/abs/2602.00161v1
- Date: Thu, 29 Jan 2026 19:46:39 GMT
- Title: Block removal for large language models through constrained binary optimization
- Authors: David Jansen, Roman Rausch, David Montero, Roman Orus
- Abstract summary: This paper formulates block removal as a constrained binary optimization problem that can be mapped to a physical system. We show that our approach outperforms state-of-the-art block-removal methods across several benchmarks. We illustrate this generality on the recent NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 model, which exhibits a highly inhomogeneous and challenging block structure.
- Score: 0.28564598766688487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compressing resource-intensive large language models by removing whole transformer blocks is a seemingly simple idea, but identifying which blocks to remove constitutes an exponentially difficult combinatorial problem. In this paper, we formulate block removal as a constrained binary optimization problem that can be mapped to a physical system (Ising model), whose energies are a strong proxy for downstream model performance. This formulation enables an efficient ranking of a large number of candidate block-removal configurations and yields many high-quality, non-trivial solutions beyond consecutive regions. We demonstrate that our approach outperforms state-of-the-art block-removal methods across several benchmarks, with performance gains persisting after short retraining, and reaching improvements of up to 6 points on the MMLU benchmark. Our method requires only forward and backward passes for a few active parameters, together with an (at least approximate) Ising solver, and can be readily applied to any architecture. We illustrate this generality on the recent NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 model, which exhibits a highly inhomogeneous and challenging block structure.
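The abstract's recipe, per-block importance terms plus a hard budget on how many blocks to drop, maps naturally onto a QUBO/Ising energy. Below is a minimal, self-contained sketch of that kind of formulation; the fields `h`, couplings `J`, the budget `k_remove`, and the simulated-annealing loop are illustrative placeholders of ours, not the paper's actual coefficients or solver. In the paper, the importance terms would come from forward/backward passes over a few active parameters.

```python
# Hedged sketch: block removal as a constrained binary optimization (QUBO).
# x[i] = 1 means "remove block i". h and J stand in for sensitivity
# estimates; here they are random placeholders.
import numpy as np

rng = np.random.default_rng(0)

n_blocks = 32    # number of transformer blocks (illustrative)
k_remove = 8     # removal budget
penalty = 10.0   # strength of the equality constraint sum(x) == k_remove

h = rng.normal(size=n_blocks)                        # per-block terms
J = np.triu(rng.normal(scale=0.1, size=(n_blocks, n_blocks)), 1)  # pairwise

def energy(x: np.ndarray) -> float:
    """QUBO energy plus a quadratic penalty enforcing the removal budget."""
    return float(h @ x + x @ J @ x + penalty * (x.sum() - k_remove) ** 2)

def simulated_annealing(steps: int = 20_000, t0: float = 2.0) -> np.ndarray:
    """Generic single-spin-flip annealer; any approximate Ising solver works."""
    x = np.zeros(n_blocks, dtype=int)
    x[rng.choice(n_blocks, k_remove, replace=False)] = 1  # feasible start
    e = energy(x)
    best_x, best_e = x.copy(), e
    for step in range(steps):
        t = t0 * (1.0 - step / steps) + 1e-3              # linear cooling
        i = rng.integers(n_blocks)
        x[i] ^= 1                                         # flip one bit
        e_new = energy(x)
        if e_new < e or rng.random() < np.exp((e - e_new) / t):
            e = e_new
            if e < best_e:
                best_x, best_e = x.copy(), e
        else:
            x[i] ^= 1                                     # undo the flip
    return best_x

removal_mask = simulated_annealing()
print("blocks to remove:", np.flatnonzero(removal_mask))
```

With the penalty formulation, any off-the-shelf or approximate Ising solver can replace the annealer, consistent with the abstract's note that only an (at least approximate) Ising solver is required.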
Related papers
- MI-PRUN: Optimize Large Language Model Pruning via Mutual Information [73.6518842907835]
We propose a mutual-information-based pruning method, MI-PRUN, for Large Language Models. We leverage mutual information to identify redundant blocks by evaluating transitions in hidden states. We also develop the Fast-Block-Select algorithm, which iteratively updates block combinations to achieve a globally optimal solution.
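The summary only names the idea, so here is a toy illustration of scoring block redundancy from hidden-state transitions. It uses a per-feature Gaussian mutual-information proxy (MI = -0.5 log(1 - rho^2)), which is an assumption of ours, not MI-PRUN's estimator.

```python
# Toy redundancy score for transformer blocks from hidden-state transitions.
# A high score means a block barely transforms its input (more redundant).
import numpy as np

def gaussian_mi_proxy(h_in: np.ndarray, h_out: np.ndarray) -> float:
    """Average per-dimension Gaussian MI between a block's input and output.

    h_in, h_out: (n_tokens, d_model) activations captured around one block.
    """
    h_in = h_in - h_in.mean(axis=0)
    h_out = h_out - h_out.mean(axis=0)
    num = (h_in * h_out).sum(axis=0)
    den = np.sqrt((h_in ** 2).sum(axis=0) * (h_out ** 2).sum(axis=0)) + 1e-12
    rho = np.clip(num / den, -0.999, 0.999)
    return float(np.mean(-0.5 * np.log(1.0 - rho ** 2)))

# Rank blocks by redundancy given captured activations (random toy data).
rng = np.random.default_rng(1)
activations = [rng.normal(size=(256, 64)) for _ in range(5)]  # per-block
scores = [gaussian_mi_proxy(activations[i], activations[i + 1])
          for i in range(len(activations) - 1)]
print("most redundant block:", int(np.argmax(scores)))
```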
arXiv Detail & Related papers (2026-01-12T05:06:01Z)
- BAMBO: Construct Ability and Efficiency LLM Pareto Set via Bayesian Adaptive Multi-objective Block-wise Optimization [4.196004665145396]
BAMBO (Bayesian Adaptive Multi-objective Block-wise Optimization) is a novel framework that automatically constructs the ability-efficiency Pareto set of Large Language Models (LLMs). Its block-partitioning strategy is formulated as a 1D clustering problem and leverages a dynamic programming approach to optimally balance intra-block volume and inter-block information distribution.
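As a concrete illustration of the 1D-clustering step, the sketch below partitions a sequence of per-layer scores into k contiguous groups by dynamic programming, minimizing within-group squared deviation. The scores, cost function, and k are placeholder choices of ours; BAMBO's actual objective balances intra-block volume against inter-block information distribution.

```python
# Optimal 1D partition into k contiguous groups via dynamic programming.
import numpy as np

def partition_1d(scores: np.ndarray, k: int):
    n = len(scores)
    prefix = np.concatenate([[0.0], np.cumsum(scores)])
    prefix2 = np.concatenate([[0.0], np.cumsum(np.square(scores))])

    def sse(i, j):  # sum of squared deviations of scores[i:j]
        s, s2, m = prefix[j] - prefix[i], prefix2[j] - prefix2[i], j - i
        return s2 - s * s / m

    INF = float("inf")
    dp = np.full((k + 1, n + 1), INF)   # dp[g, j]: best cost, g groups, j items
    cut = np.zeros((k + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                cost = dp[g - 1, i] + sse(i, j)
                if cost < dp[g, j]:
                    dp[g, j], cut[g, j] = cost, i
    # Recover group boundaries by walking the cut table backwards.
    bounds, j = [], n
    for g in range(k, 0, -1):
        bounds.append((cut[g, j], j))
        j = cut[g, j]
    return bounds[::-1]

layer_scores = np.array([0.9, 0.8, 0.85, 0.2, 0.25, 0.7, 0.75, 0.1])  # toy
print(partition_1d(layer_scores, k=3))   # [(start, end), ...] per group
```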
arXiv Detail & Related papers (2025-12-10T15:32:56Z)
- MOBIUS: Big-to-Mobile Universal Instance Segmentation via Multi-modal Bottleneck Fusion and Calibrated Decoder Pruning [91.90342432541138]
Scaling up model size and training data has advanced foundation models for instance-level perception. However, high computational cost limits adoption on resource-constrained platforms. We introduce a new benchmark for efficient segmentation on both high-performance computing platforms and mobile devices.
arXiv Detail & Related papers (2025-10-16T18:00:00Z)
- FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing [59.12511498024836]
We present a method for pruning large language models (LLMs) that selectively removes model blocks based on an importance score. We propose a principled metric to replace each pruned block using a weight-sharing mechanism. Empirical evaluations demonstrate substantial performance gains over existing methods.
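A minimal sketch of the weight-sharing idea, under our own assumptions: a pruned block is replaced by a cheap rank-r residual module whose factors are derived from a retained block's weights via SVD, so the replacement adds almost no new parameters. FlexiGPT's actual metric and mechanism may differ.

```python
# Toy low-rank stand-in for a pruned block, with factors shared from a
# retained block's weight matrix (illustrative shapes and data).
import numpy as np

rng = np.random.default_rng(3)
W_kept = rng.normal(size=(64, 64))   # weights of a retained block (toy)
r = 8                                # rank budget for the replacement

U, S, Vt = np.linalg.svd(W_kept, full_matrices=False)
A = U[:, :r] * S[:r]                 # (64, r) shared factor
B = Vt[:r]                           # (r, 64) shared factor

def replacement_block(h: np.ndarray) -> np.ndarray:
    """Low-rank residual stand-in for the pruned block: h + (h A) B."""
    return h + (h @ A) @ B

h = rng.normal(size=(4, 64))
print(replacement_block(h).shape)    # (4, 64)
```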
arXiv Detail & Related papers (2025-01-24T18:46:37Z)
- MultiPruner: Balanced Structure Removal in Foundation Models [1.8434042562191815]
Recently, state-of-the-art approaches for pruning large pre-trained models (LPMs) have demonstrated that the training-free removal of non-critical residual blocks in Transformers is viable for reducing model size. We extend BlockPruner and propose MultiPruner, a pruning approach that surpasses recent training-free pruning methods by adopting a multidimensional, iterative, fine-grained pruning strategy.
arXiv Detail & Related papers (2025-01-17T04:24:31Z)
- Scalable iterative pruning of large language and vision models using block coordinate descent [0.31410859223862103]
Pruning neural networks, which involves removing a fraction of their weights, can often maintain high accuracy while significantly reducing model complexity, at least up to a certain limit. We present a neural network pruning technique that builds upon the Combinatorial Brain Surgeon, but solves an optimization problem over a subset of the network weights in an iterative, block-wise manner.
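To make the iterative, block-wise scheme concrete, here is a hedged sketch of block coordinate descent for pruning a single linear layer: each pass revisits one block of rows, holds all other blocks fixed, greedily re-selects which entries to keep against the current residual, and refits them by least squares. The layer, data, and selection rule are toy assumptions, not the paper's Combinatorial-Brain-Surgeon objective.

```python
# Block coordinate descent over which weights of a linear layer to keep,
# minimizing ||X W' - Y||_F with a per-block sparsity budget (toy setup).
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(256, 32))    # calibration inputs
W = rng.normal(size=(32, 8))      # dense weights to prune
Y = X @ W                         # outputs we try to preserve
keep_per_block = 8                # nonzeros kept per (block, column)
block = 16                        # rows re-optimized together

Wp = W.copy()
for sweep in range(3):                                   # a few full passes
    for start in range(0, W.shape[0], block):
        R = np.arange(start, min(start + block, W.shape[0]))
        notR = np.setdiff1d(np.arange(W.shape[0]), R)
        resid = Y - X[:, notR] @ Wp[notR]                # others held fixed
        for c in range(W.shape[1]):
            # Pick the rows of this block most correlated with the residual.
            score = np.abs(X[:, R].T @ resid[:, c])
            keep = R[np.argsort(score)[-keep_per_block:]]
            Wp[R, c] = 0.0
            # Least-squares refit of the kept weights in this block/column.
            w, *_ = np.linalg.lstsq(X[:, keep], resid[:, c], rcond=None)
            Wp[keep, c] = w
rel = np.linalg.norm(X @ Wp - Y) / np.linalg.norm(Y)
print(f"relative reconstruction error after pruning: {rel:.4f}")
```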
arXiv Detail & Related papers (2024-11-26T17:54:02Z)
- Symmetric Tensor Networks for Generative Modeling and Constrained Combinatorial Optimization [72.41480594026815]
Constrained optimization problems abound in industry, from portfolio optimization to logistics.
One of the major roadblocks in solving these problems is the presence of non-trivial hard constraints which limit the valid search space.
In this work, we encode arbitrary integer-valued equality constraints of the form Ax=b directly into U(1) symmetric tensor networks (TNs) and leverage their applicability as quantum-inspired generative models.
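In symbols (our notation, not the paper's): the TN plays the role of a Born-machine wavefunction whose support is restricted to the feasible set, so sampling never leaves the constraint surface and no soft penalty terms are needed.

```latex
% Born-machine form of the constrained generative model (notation assumed):
% the U(1)-symmetric TN makes \psi vanish off the feasible set by construction.
\[
  p(x) \;=\; \frac{|\psi(x)|^{2}}{\sum_{x'\,:\,Ax'=b} |\psi(x')|^{2}},
  \qquad
  \psi(x) = 0 \quad \text{whenever } Ax \neq b .
\]
```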
arXiv Detail & Related papers (2022-11-16T18:59:54Z)
- Accurate and Lightweight Image Super-Resolution with Model-Guided Deep Unfolding Network [63.69237156340457]
We present and advocate an explainable approach to single image super-resolution (SISR) named model-guided deep unfolding network (MoG-DUN). MoG-DUN is accurate (producing fewer aliasing artifacts), computationally efficient (with reduced model parameters), and versatile (capable of handling multiple degradations). The superiority of the proposed MoG-DUN method over existing state-of-the-art image super-resolution methods, including RCAN, SRDNF, and SRFBN, is substantiated by extensive experiments on several popular datasets and various degradation scenarios.
arXiv Detail & Related papers (2020-09-14T08:23:37Z)
- MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [96.1052289276254]
This work introduces a new MAP solver based on the popular Dual Block-Coordinate Ascent principle. Surprisingly, a small change to the low-performing MPLP solver yields the new solver MPLP++, which outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)