Finding Optimal Kernel Size and Dimension in Convolutional Neural Networks: An Architecture Optimization Approach
- URL: http://arxiv.org/abs/2506.14846v1
- Date: Mon, 16 Jun 2025 15:15:30 GMT
- Title: Finding Optimal Kernel Size and Dimension in Convolutional Neural Networks: An Architecture Optimization Approach
- Authors: Shreyas Rajeev, B Sathish Babu
- Abstract summary: Kernel size selection in Convolutional Neural Networks (CNNs) is a critical but often overlooked design decision. This paper proposes the Best Kernel Size Estimation Function (BKSEF) for optimal, layer-wise kernel size determination. BKSEF balances information gain, computational efficiency, and accuracy improvements by integrating principles from information theory, signal processing, and learning theory.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kernel size selection in Convolutional Neural Networks (CNNs) is a critical but often overlooked design decision that affects receptive field, feature extraction, computational cost, and model accuracy. This paper proposes the Best Kernel Size Estimation Function (BKSEF), a mathematically grounded and empirically validated framework for optimal, layer-wise kernel size determination. BKSEF balances information gain, computational efficiency, and accuracy improvements by integrating principles from information theory, signal processing, and learning theory. Extensive experiments on CIFAR-10, CIFAR-100, ImageNet-lite, ChestX-ray14, and GTSRB datasets demonstrate that BKSEF-guided architectures achieve up to 3.1 percent accuracy improvement and 42.8 percent reduction in FLOPs compared to traditional models using uniform 3x3 kernels. Two real-world case studies further validate the approach: one for medical image classification in a cloud-based setup, and another for traffic sign recognition on edge devices. The former achieved enhanced interpretability and accuracy, while the latter reduced latency and model size significantly, with minimal accuracy trade-off. These results show that kernel size can be an active, optimizable parameter rather than a fixed heuristic. BKSEF provides practical heuristics and theoretical support for researchers and developers seeking efficient and application-aware CNN designs. It is suitable for integration into neural architecture search pipelines and real-time systems, offering a new perspective on CNN optimization.
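As a rough illustration of the trade-off BKSEF formalizes, the sketch below scores candidate kernel sizes by rewarding an information-gain term and penalizing FLOP cost, then picks the best size for a single layer. The log-based gain model, the weights alpha and beta, and the layer shape are illustrative assumptions, not the paper's actual formulation.

```python
import math

def conv_flops(k, c_in, c_out, h_out, w_out):
    """Multiply-accumulate count of a k x k convolution (standard formula)."""
    return h_out * w_out * c_out * c_in * k * k

def kernel_score(k, c_in, c_out, h_out, w_out, alpha=1.0, beta=1e-9):
    """Toy BKSEF-style score: reward receptive-area gain, penalize compute.
    The log gain term is an illustrative stand-in for the paper's
    information-theoretic measure, not its actual definition."""
    gain = alpha * math.log(1 + k * k)
    cost = beta * conv_flops(k, c_in, c_out, h_out, w_out)
    return gain - cost

# Pick the best odd kernel size for a hypothetical 56x56, 64->128 channel layer.
candidates = [1, 3, 5, 7, 9]
best_k = max(candidates, key=lambda k: kernel_score(k, 64, 128, 56, 56))
print(best_k)  # 7 under these assumed weights
```

With these toy weights the score peaks at 5x5 to 7x7 rather than the default 3x3, which is the kind of layer-wise conclusion the paper argues for.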
Related papers
- Optimizers Qualitatively Alter Solutions And We Should Leverage This [62.662640460717476]
Optimizers that use only local information, such as SGD, cannot guarantee convergence of Deep Neural Networks (DNNs) to a unique global minimum of the loss.
We argue that the community should aim at understanding the biases of already existing methods, and at building new ones with the explicit intent of inducing certain properties of the solution.
arXiv Detail & Related papers (2025-07-16T13:33:31Z)
- Building Efficient Lightweight CNN Models [0.0]
Convolutional Neural Networks (CNNs) are pivotal in image classification tasks due to their robust feature extraction capabilities.
This paper introduces a methodology to construct lightweight CNNs while maintaining competitive accuracy.
The proposed model achieved a state-of-the-art accuracy of 99% on the handwritten digit MNIST and 89% on Fashion MNIST, with only 14,862 parameters and a model size of 0.17 MB.
arXiv Detail & Related papers (2025-01-26T14:39:01Z)
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing shifts data analysis from the cloud to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- SpiKernel: A Kernel Size Exploration Methodology for Improving Accuracy of the Embedded Spiking Neural Network Systems [6.006032394972252]
Spiking Neural Networks (SNNs) can offer ultra-low power/energy consumption for machine learning-based application tasks.
Currently, most SNN architectures need a significantly larger model size to achieve higher accuracy.
We propose SpiKernel, a novel methodology that improves the accuracy of SNNs through kernel size exploration.
arXiv Detail & Related papers (2024-04-02T06:42:14Z)
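SpiKernel's summary suggests a search over kernel sizes under embedded memory constraints. The sketch below is a generic version of such an exploration loop, using parameter count as the memory proxy and assuming (as kernel-exploration work tends to find) that larger kernels help accuracy until memory runs out; the budget and layer shape are hypothetical, not SpiKernel's actual procedure.

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution layer."""
    return c_in * k * k * c_out + c_out

def explore_kernel_sizes(c_in, c_out, budget_params, candidates=(1, 3, 5, 7, 9)):
    """Keep every kernel size whose cost fits the memory budget, then pick
    the largest, assuming bigger kernels help accuracy until memory runs out."""
    feasible = [k for k in candidates if conv_params(k, c_in, c_out) <= budget_params]
    return max(feasible) if feasible else None

# Hypothetical embedded budget of 120k parameters for a 64->64 layer.
print(explore_kernel_sizes(64, 64, budget_params=120_000))  # -> 5
```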
- FlowNAS: Neural Architecture Search for Optical Flow Estimation [65.44079917247369]
We propose a neural architecture search method named FlowNAS to automatically find a better encoder architecture for the flow estimation task.
Experimental results show that the discovered architecture with the weights inherited from the super-network achieves 4.67% F1-all error on KITTI.
arXiv Detail & Related papers (2022-07-04T09:05:25Z)
- Multi-objective Evolutionary Approach for Efficient Kernel Size and Shape for CNN [12.697368516837718]
State-of-the-art CNN topologies, such as VGGNet and ResNet, have become increasingly accurate.
These networks are computationally expensive, involving billions of arithmetic operations and parameters.
This paper considers optimising the computational resource consumption by reducing the size and number of kernels in convolutional layers.
arXiv Detail & Related papers (2021-06-28T14:47:29Z)
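The objectives such an evolutionary search trades off can be made concrete with the standard cost formulas for a convolutional layer. The sketch below enumerates kernel-size/width configurations, computes their multiply-accumulate cost, and keeps the Pareto-optimal set against a toy quality proxy; the proxy and layer shapes are assumptions for illustration, not the paper's fitness function.

```python
import math

def conv_costs(k, c_in, c_out, h_out, w_out):
    """(parameters, multiply-accumulates) of a k x k convolution layer."""
    params = c_in * k * k * c_out + c_out
    macs = h_out * w_out * c_in * k * k * c_out
    return params, macs

# Enumerate kernel-size / output-width configurations for a 28x28, 64-channel input.
configs = [(k, c_out) for k in (1, 3, 5, 7) for c_out in (32, 64, 128)]
objectives = []
for k, c_out in configs:
    _, macs = conv_costs(k, 64, c_out, 28, 28)
    quality = math.log(1 + k * k) * math.log(c_out)  # assumed proxy, not the paper's fitness
    objectives.append((macs, quality))

# Keep configurations not dominated on (minimize MACs, maximize quality).
front = [c for c, (cost, qual) in zip(configs, objectives)
         if not any(oc <= cost and oq >= qual and (oc, oq) != (cost, qual)
                    for oc, oq in objectives)]
print(front)
```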
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models, of statistical models combined with the roofline model, and of a refined roofline model.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
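A heavily simplified picture of stacked execution-time estimation: fit a per-operation latency model from benchmark measurements, then map a network onto those models and sum the predictions. The linear per-layer model and the made-up benchmark numbers below are illustrative assumptions, not ANNETTE's extracted models.

```python
def fit_linear(samples):
    """Least-squares fit of latency ~ a * macs + b from (macs, latency_ms) pairs."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

# Made-up micro-kernel benchmark measurements per op type: (MACs, latency in ms).
bench = {
    "conv": [(1e6, 0.30), (4e6, 0.95), (16e6, 3.40)],
    "fc":   [(1e5, 0.05), (1e6, 0.32)],
}
models = {op: fit_linear(points) for op, points in bench.items()}

def estimate_network(layers):
    """Stack the per-op models: map each (op_type, macs) layer and sum predictions."""
    return sum(models[op][0] * macs + models[op][1] for op, macs in layers)

print(round(estimate_network([("conv", 8e6), ("conv", 2e6), ("fc", 5e5)]), 3))
```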
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions that achieve comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
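For the two-layer ReLU case there is a textbook random-feature construction for the NTK: concatenate ReLU features with copies of the input gated by the activation pattern. The numpy sketch below follows that standard construction; its feature dimension is m + m*d, whereas the paper's contribution is a construction with much smaller dimension, which this sketch does not reproduce.

```python
import numpy as np

def two_layer_relu_ntk_features(X, m, rng):
    """Random features whose inner products approximate the NTK of a
    two-layer ReLU network:
        K(x, x') = E[relu(w.x) relu(w.x')] + (x.x') E[1{w.x>0} 1{w.x'>0}]
    with w ~ N(0, I)."""
    n, d = X.shape
    W = rng.standard_normal((d, m))
    pre = X @ W                                # pre-activations, shape (n, m)
    phi1 = np.maximum(pre, 0.0) / np.sqrt(m)   # approximates the first term
    gates = (pre > 0).astype(X.dtype)          # activation pattern 1{w.x > 0}
    phi2 = (gates[:, :, None] * X[:, None, :]).reshape(n, m * d) / np.sqrt(m)
    return np.concatenate([phi1, phi2], axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
F = two_layer_relu_ntk_features(X, m=2048, rng=rng)
print((F @ F.T).shape)  # 5x5 Gram matrix approximating the NTK
```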
- Convolution Neural Network Hyperparameter Optimization Using Simplified Swarm Optimization [2.322689362836168]
Convolutional Neural Networks (CNNs) are widely used in computer vision.
It is not easy to find a network architecture that performs well.
arXiv Detail & Related papers (2021-03-06T00:23:27Z)
- An Efficient Quantitative Approach for Optimizing Convolutional Neural Networks [16.072287925319806]
We propose 3D-Receptive Field (3DRF) to estimate the quality of a CNN architecture and to guide the search over candidate designs.
Our models can achieve up to 5.47% accuracy improvement and up to 65.38% reduction in parameters, compared with state-of-the-art CNN structures like MobileNet and ResNet.
arXiv Detail & Related papers (2020-09-11T05:14:34Z)
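Receptive-field-based quality estimates build on the standard recurrence for how each layer grows the receptive field. The helper below implements that recurrence for a chain of conv/pool layers; the example stack is hypothetical, and this is not the paper's full 3DRF metric, which extends the idea to three dimensions.

```python
def receptive_field(layers):
    """Standard recurrence for a chain of conv/pool layers given (kernel, stride):
    r_out = r_in + (k - 1) * jump, where jump is the product of earlier strides."""
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r

# Hypothetical stack: three 3x3 convs with stride-2 pools in between.
print(receptive_field([(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]))  # -> 18
```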
- DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables, modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimizers.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
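The sampling step described in the summary can be sketched in a few lines of PyTorch: architecture mixing weights are drawn from a Dirichlet via rsample(), whose pathwise (implicit reparameterization) gradient lets the concentration parameters be trained by backprop. The toy loss that favors one operation is a stand-in for the supernet loss; this illustrates only the sampling mechanics, not the DrNAS search itself.

```python
import torch
from torch.distributions import Dirichlet

n_ops = 4  # candidate operations on one edge of the search cell
raw = torch.zeros(n_ops, requires_grad=True)  # learnable, mapped to positive below
opt = torch.optim.Adam([raw], lr=0.1)

for step in range(100):
    conc = torch.nn.functional.softplus(raw) + 1e-4  # Dirichlet concentration > 0
    weights = Dirichlet(conc).rsample()              # pathwise-differentiable sample
    # Stand-in for the supernet loss: pretend operation 2 is the best choice.
    loss = -torch.log(weights[2] + 1e-8)
    opt.zero_grad()
    loss.backward()   # gradients reach `raw` through the rsample pathwise derivative
    opt.step()

print(torch.nn.functional.softplus(raw).detach())  # concentration grows for op 2
```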
- FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 makes up a family of state-of-the-art compact neural networks that outperform both automatically and manually-designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z)
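The search loop such a predictor enables can be caricatured as: score sampled (architecture, recipe) pairs with a cheap predictor, keep the best, and mutate them. Everything in the sketch below (the encoding, the toy predictor, the mutation scheme) is an illustrative assumption, not FBNetV3's implementation.

```python
import random

random.seed(0)

def sample_pair():
    """Toy (architecture, recipe) encoding: four channel widths plus a learning rate."""
    return ([random.choice((16, 32, 64)) for _ in range(4)],
            {"lr": random.choice((0.01, 0.05, 0.1))})

def predictor(arch, recipe):
    """Stand-in accuracy predictor: favors wider nets and a mid-range learning rate."""
    return sum(arch) / 256 - abs(recipe["lr"] - 0.05)

def mutate(pair):
    arch, recipe = list(pair[0]), dict(pair[1])
    arch[random.randrange(len(arch))] = random.choice((16, 32, 64))
    return arch, recipe

population = [sample_pair() for _ in range(16)]
for _ in range(10):  # a few cheap, predictor-driven generations
    population.sort(key=lambda p: predictor(*p), reverse=True)
    survivors = population[:4]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(12)]

print(max(population, key=lambda p: predictor(*p)))
```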
- Lightweight Residual Densely Connected Convolutional Neural Network [18.310331378001397]
The lightweight residual densely connected blocks are proposed to guarantee the deep supervision, efficient gradient flow, and feature reuse abilities of the convolutional neural network.
The proposed method decreases the cost of training and inference processes without using any special hardware-software equipment.
arXiv Detail & Related papers (2020-01-02T17:15:32Z)
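A minimal PyTorch sketch of the generic residual densely connected pattern the summary describes: each convolution consumes all earlier feature maps (dense connectivity and feature reuse), and a shortcut adds the block input back (residual gradient flow). This is the common idiom, assumed here for illustration, not the paper's exact block design.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Dense connectivity (each conv consumes all previous features)
    plus a residual shortcut around the whole block."""
    def __init__(self, channels, growth=16, n_layers=3):
        super().__init__()
        self.convs = nn.ModuleList()
        c = channels
        for _ in range(n_layers):
            self.convs.append(nn.Sequential(
                nn.Conv2d(c, growth, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True),
            ))
            c += growth
        self.fuse = nn.Conv2d(c, channels, kernel_size=1, bias=False)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(conv(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))  # residual shortcut

block = ResidualDenseBlock(channels=32)
print(block(torch.randn(1, 32, 28, 28)).shape)  # torch.Size([1, 32, 28, 28])
```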