Systematic Characterization of Minimal Deep Learning Architectures: A Unified Analysis of Convergence, Pruning, and Quantization
- URL: http://arxiv.org/abs/2601.17987v1
- Date: Sun, 25 Jan 2026 20:31:10 GMT
- Title: Systematic Characterization of Minimal Deep Learning Architectures: A Unified Analysis of Convergence, Pruning, and Quantization
- Authors: Ziwei Zheng, Huizhi Liang, Vaclav Snasel, Vito Latora, Panos Pardalos, Giuseppe Nicosia, Varun Ojha
- Abstract summary: Deep learning networks excel at classification, yet identifying minimal architectures that reliably solve a task remains challenging. We present a computational methodology for exploring and analyzing the relationships among convergence, pruning, and quantization. Our initial results show that, despite architectural diversity, performance is largely invariant and learning dynamics consistently exhibit three regimes: unstable, learning, and overfitting.
- Score: 6.49583548940407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning networks excel at classification, yet identifying minimal architectures that reliably solve a task remains challenging. We present a computational methodology for systematically exploring and analyzing the relationships among convergence, pruning, and quantization. The workflow first performs a structured design sweep across a large set of architectures, then evaluates convergence behavior, pruning sensitivity, and quantization robustness on representative models. Focusing on well-known image classification tasks of increasing complexity, and spanning Deep Neural Networks, Convolutional Neural Networks, and Vision Transformers, our initial results show that, despite architectural diversity, performance is largely invariant and learning dynamics consistently exhibit three regimes: unstable, learning, and overfitting. We further characterize the minimal number of learnable parameters required for stable learning, uncover distinct convergence and pruning phases, and quantify the effect of reduced numeric precision on trainable parameters. Aligning with intuition, the results confirm that deeper architectures are more resilient to pruning than shallower ones, with parameter redundancy as high as 60%. Quantization impacts models with fewer learnable parameters more severely and has a larger effect on harder image datasets. These findings provide actionable guidance for selecting compact, stable models under pruning and low-precision constraints in image classification.
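To make the evaluation stages of this workflow concrete, the sketch below shows how a pruning-sensitivity sweep and a weight-quantization sweep could be run on a single candidate model. This is a minimal PyTorch illustration, not the authors' code: the TinyMLP model, the random placeholder test set, the global magnitude-pruning criterion, and the bit-width choices are all assumptions standing in for the paper's architecture sweep and image datasets.

```python
# Illustrative sketch (assumptions, not the paper's evaluation harness):
# sweep pruning ratios and weight bit widths for one small classifier.
import copy
import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    """Minimal stand-in for one point in an architecture design sweep."""
    def __init__(self, in_dim=784, hidden=64, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_classes))
    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

@torch.no_grad()
def prune_by_magnitude(model, ratio):
    """Zero out the smallest `ratio` fraction of weights (global, unstructured)."""
    pruned = copy.deepcopy(model)
    weights = torch.cat([p.abs().flatten() for p in pruned.parameters() if p.dim() > 1])
    threshold = torch.quantile(weights, ratio)
    for p in pruned.parameters():
        if p.dim() > 1:
            p.mul_((p.abs() > threshold).float())
    return pruned

@torch.no_grad()
def quantize_weights(model, bits):
    """Symmetric uniform fake-quantization of all weights to `bits` bits."""
    quant = copy.deepcopy(model)
    qmax = 2 ** (bits - 1) - 1
    for p in quant.parameters():
        scale = p.abs().max() / qmax
        if scale > 0:
            p.copy_(torch.round(p / scale).clamp(-qmax, qmax) * scale)
    return quant

# Placeholder "test set"; a real study would use an image benchmark such as MNIST/CIFAR.
x_test = torch.randn(256, 784)
y_test = torch.randint(0, 10, (256,))
model = TinyMLP()  # assumed to be trained already in the real workflow

for ratio in (0.2, 0.4, 0.6, 0.8):   # pruning-sensitivity sweep
    acc = accuracy(prune_by_magnitude(model, ratio), x_test, y_test)
    print(f"prune {ratio:.0%}: acc={acc:.3f}")

for bits in (8, 4, 2):               # quantization-robustness sweep
    acc = accuracy(quantize_weights(model, bits), x_test, y_test)
    print(f"{bits}-bit weights: acc={acc:.3f}")
```

Plotting accuracy against pruning ratio and weight bit width over such a sweep is one straightforward way of exposing the pruning phases and precision effects the abstract describes.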
Related papers
- Component-Aware Pruning Framework for Neural Network Controllers via Gradient-Based Importance Estimation [0.34410212782758043]
This paper introduces a component-aware pruning framework that utilizes gradient information to compute three distinct importance metrics during training. Experimental results with an autoencoder and a TDMPC agent demonstrate that the proposed framework reveals critical structural dependencies and dynamic shifts in importance.
arXiv Detail & Related papers (2026-01-27T16:53:19Z) - Implicit Neural Representation-Based Continuous Single Image Super Resolution: An Empirical Study [50.15623093332659]
Implicit neural representation (INR) has become the standard approach for arbitrary-scale image super-resolution (ASSR). We compare existing techniques across diverse settings and present aggregated performance results on multiple image quality metrics. We examine a new loss function that penalizes intensity variations while preserving edges, textures, and finer details during training.
arXiv Detail & Related papers (2026-01-25T07:09:20Z) - Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment. We define this phenomenon as model hemorrhage: performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z) - Generalized Factor Neural Network Model for High-dimensional Regression [50.554377879576066]
We tackle the challenges of modeling high-dimensional data sets with latent low-dimensional structures hidden within complex, non-linear, and noisy relationships. Our approach enables a seamless integration of concepts from non-parametric regression, factor models, and neural networks for high-dimensional regression.
arXiv Detail & Related papers (2025-02-16T23:13:55Z) - LayerMix: Enhanced Data Augmentation through Fractal Integration for Robust Deep Learning [1.786053901581251]
Deep learning models often struggle to maintain consistent performance when confronted with Out-of-Distribution (OOD) samples. We introduce LayerMix, an innovative data augmentation approach that systematically enhances model robustness. Our method generates semantically consistent synthetic samples that significantly improve neural network generalization capabilities.
arXiv Detail & Related papers (2025-01-08T22:22:44Z) - ConsistentFeature: A Plug-and-Play Component for Neural Network Regularization [0.32885740436059047]
Over-parameterized neural network models often lead to significant performance discrepancies between training and test sets. We introduce a simple perspective on overfitting: models learn different representations in different i.i.d. datasets. We propose an adaptive method, ConsistentFeature, that regularizes the model by constraining feature differences across random subsets of the same training set.
arXiv Detail & Related papers (2024-12-02T13:21:31Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - DepGraph: Towards Any Structural Pruning [68.40343338847664]
We study general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs, and Transformers.
We propose a general and fully automatic method, Dependency Graph (DepGraph), to explicitly model the dependency between layers and comprehensively group parameters for pruning.
In this work, we extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision transformer for images, GAT for graph, DGCNN for 3D point cloud, alongside LSTM for language, and demonstrate that, even with a ...
arXiv Detail & Related papers (2023-01-30T14:02:33Z) - Neural Networks with Quantization Constraints [111.42313650830248]
We present a constrained learning approach to quantization training.
We show that the resulting problem is strongly dual and does away with gradient estimations.
We demonstrate that the proposed approach exhibits competitive performance in image classification tasks.
arXiv Detail & Related papers (2022-10-27T17:12:48Z) - A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z) - "Understanding Robustness Lottery": A Geometric Visual Comparative
Analysis of Neural Network Pruning Approaches [29.048660060344574]
This work aims to shed light on how different pruning methods alter the network's internal feature representation and the corresponding impact on model performance.
We introduce a visual geometric analysis of feature representations to compare and highlight the impact of pruning on model performance and feature representation.
The proposed tool provides an environment for in-depth comparison of pruning methods and a comprehensive understanding of how models respond to common data corruption.
arXiv Detail & Related papers (2022-06-16T04:44:13Z) - ScatSimCLR: self-supervised contrastive learning with pretext task
regularization for small-scale datasets [5.2424255020469595]
We consider a problem of self-supervised learning for small-scale datasets based on contrastive loss between multiple views of the data.
We argue that the number of parameters of the whole system and the number of views can be considerably reduced while preserving the same classification accuracy.
arXiv Detail & Related papers (2021-08-31T15:58:45Z)