Related papers: AgenticPruner: MAC-Constrained Neural Network Compression via LLM-Driven Strategy Search

AgenticPruner: MAC-Constrained Neural Network Compression via LLM-Driven Strategy Search

URL: http://arxiv.org/abs/2601.12272v1
Date: Sun, 18 Jan 2026 06:07:29 GMT
Title: AgenticPruner: MAC-Constrained Neural Network Compression via LLM-Driven Strategy Search
Authors: Shahrzad Esmat, Mahdi Banisharif, Ali Jannesari,
Abstract summary: We propose a framework to achieve Multiply-Accumulate (MAC) operation budgets through iterative strategy learning.<n>Our approach coordinates three specialized agents: a Profiling Agent that analyzes model architecture and MAC distributions, a Master Agent that orchestrates the workflow with divergence monitoring, and an Analysis Agent powered by Claude 3.5 Sonnet.<n>We validate our framework on ImageNet-1K across ResNet, ConvNeXt, and DeiT architectures.
Score: 7.825137277264239
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Neural network pruning remains essential for deploying deep learning models on resource-constrained devices, yet existing approaches primarily target parameter reduction without directly controlling computational cost. This yields unpredictable inference latency in deployment scenarios where strict Multiply-Accumulate (MAC) operation budgets must be met. We propose AgenticPruner, a framework utilizing large language models to achieve MAC-constrained optimization through iterative strategy learning. Our approach coordinates three specialized agents: a Profiling Agent that analyzes model architecture and MAC distributions, a Master Agent that orchestrates the workflow with divergence monitoring, and an Analysis Agent powered by Claude 3.5 Sonnet that learns optimal strategies from historical attempts. Through in-context learning, the Analysis Agent improves convergence success rate from 48% to 71% compared to grid search. Building upon isomorphic pruning's graph-based structural grouping, our method adds context-aware adaptation by analyzing patterns across pruning iterations, enabling automatic convergence to target MAC budgets within user-defined tolerance bands. We validate our framework on ImageNet-1K across ResNet, ConvNeXt, and DeiT architectures. On CNNs, our approach achieves MAC targeting while maintaining or improving accuracy: ResNet-50 reaches 1.77G MACs with 77.04% accuracy (+0.91% vs baseline); ResNet-101 achieves 4.22G MACs with 78.94% accuracy (+1.56% vs baseline). For ConvNeXt-Small, pruning to 8.17G MACs yields 1.41x GPU and 1.07x CPU speedup with 45% parameter reduction. On Vision Transformers, we demonstrate MAC-budget compliance within user-defined tolerance bands (typically +1% to +5% overshoot, -5% to -15% undershoot), establishing feasibility for deployment scenarios requiring strict computational guarantees.

Related papers

HQP: Sensitivity-Aware Hybrid Quantization and Pruning for Ultra-Low-Latency Edge AI Inference [0.0]
Hybrid Quantization and Pruning (HQP) framework designed to achieve synergistic model acceleration.<n>HQP framework achieves a peak performance gain of 3.12 times inference speedup and a 55 percent model size reduction.
arXiv Detail & Related papers (2026-02-02T18:17:45Z)
LiQSS: Post-Transformer Linear Quantum-Inspired State-Space Tensor Networks for Real-Time 6G [85.58816960936069]
Proactive and agentic control in Sixth-Generation (6G) Open Radio Access Networks (O-RAN) requires control-grade prediction under stringent Near-Time (Near-RT) latency and computational constraints.<n>This paper investigates a post-Transformer paradigm for efficient radio telemetry forecasting.<n>We propose a quantum-inspired state-space tensor network that replaces self-attention with stable structured state-space dynamics kernels.
arXiv Detail & Related papers (2026-01-18T12:08:38Z)
Towards a Science of Scaling Agent Systems [79.64446272302287]
We formalize a definition for agent evaluation and characterize scaling laws as the interplay between agent quantity, coordination structure, modelic, and task properties.<n>We derive a predictive model using coordination metrics, that cross-validated R2=0, enabling prediction on unseen task domains.<n>We identify three effects: (1) a tool-coordination trade-off: under fixed computational budgets, tool-heavy tasks suffer disproportionately from multi-agent overhead, and (2) a capability saturation: coordination yields diminishing or negative returns once single-agent baselines exceed 45%.
arXiv Detail & Related papers (2025-12-09T06:52:21Z)
ProfilingAgent: Profiling-Guided Agentic Reasoning for Adaptive Model Optimization [7.64805011214817]
Profiling tools expose per-layer latency, memory, and compute cost, yet are rarely integrated into automated pipelines.<n>We propose ProfilingAgent, a profiling-guided, agentic approach that uses large language models (LLMs) to automate compression via structured pruning and post-training dynamic quantization.
arXiv Detail & Related papers (2025-09-06T04:02:04Z)
AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage.<n>We show that scaling agentic reasoning system at test-time substantially enhances robustness without compromising model utility.<n> Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z)
Filter Pruning For CNN With Enhanced Linear Representation Redundancy [3.853146967741941]
We present a data-driven loss function term calculated from the correlation matrix of different feature maps in the same layer, named CCM-loss. CCM-loss provides us with another universal transcendental mathematical tool besides L*-norm regularization. In our new strategy, we mainly focus on the consistency and integrality of the information flow in the network.
arXiv Detail & Related papers (2023-10-10T06:27:30Z)
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups. Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K. Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Greedy Network Enlarging [53.319011626986004]
We propose a greedy network enlarging method based on the reallocation of computations. With step-by-step modifying the computations on different stages, the enlarged network will be equipped with optimal allocation and utilization of MACs. With application of our method on GhostNet, we achieve state-of-the-art 80.9% and 84.3% ImageNet top-1 accuracies.
arXiv Detail & Related papers (2021-07-31T08:36:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.