A Comparative Study of Custom CNNs, Pre-trained Models, and Transfer Learning Across Multiple Visual Datasets
- URL: http://arxiv.org/abs/2601.02246v1
- Date: Mon, 05 Jan 2026 16:26:32 GMT
- Title: A Comparative Study of Custom CNNs, Pre-trained Models, and Transfer Learning Across Multiple Visual Datasets
- Authors: Annoor Sharara Akhand,
- Abstract summary: Convolutional Neural Networks (CNNs) are a standard approach for visual recognition due to their capacity to learn hierarchical representations from raw pixels.<n>In practice, practitioners often choose among (i) training a compact custom CNN from scratch, (ii) using a large pre-trained CNN as a fixed feature extractor, and (iii) performing transfer learning via partial or full fine-tuning of a pre-trained backbone.<n>This report presents a controlled comparison of these three paradigms across five real-world image classification datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Convolutional Neural Networks (CNNs) are a standard approach for visual recognition due to their capacity to learn hierarchical representations from raw pixels. In practice, practitioners often choose among (i) training a compact custom CNN from scratch, (ii) using a large pre-trained CNN as a fixed feature extractor, and (iii) performing transfer learning via partial or full fine-tuning of a pre-trained backbone. This report presents a controlled comparison of these three paradigms across five real-world image classification datasets spanning road-surface defect recognition, agricultural variety identification, fruit/leaf disease recognition, pedestrian walkway encroachment recognition, and unauthorized vehicle recognition. Models are evaluated using accuracy and macro F1-score, complemented by efficiency metrics including training time per epoch and parameter counts. The results show that transfer learning consistently yields the strongest predictive performance, while the custom CNN provides an attractive efficiency--accuracy trade-off, especially when compute and memory budgets are constrained.
Related papers
- Performance Analysis of Image Classification on Bangladeshi Datasets [0.0]
Convolutional Neural Networks (CNNs) have demonstrated remarkable success in image classification tasks.<n>We present a comparative analysis of a custom-designed CNN and several widely used deep learning architectures for an image classification task.
arXiv Detail & Related papers (2026-01-07T21:15:16Z) - Comparative Analysis of Custom CNN Architectures versus Pre-trained Models and Transfer Learning: A Study on Five Bangladesh Datasets [0.0]
Transfer learning with fine-tuning consistently outperforms both custom CNNs built from scratch and feature extraction methods.<n>While custom CNNs offer advantages in model size (3.4M parameters vs. 11-134M for pre-trained models), pre-trained models with transfer learning provide superior performance.
arXiv Detail & Related papers (2026-01-07T19:36:41Z) - Bayesian Topological Convolutional Neural Nets [0.5985483103102681]
Convolutional neural networks (CNNs) have been established as the main workhorse in image data processing.<n>We propose a new Bayesian topological CNN that promotes a novel interplay between topology-aware learning and Bayesian sampling.<n>We evaluate the model on benchmark image classification datasets and demonstrate its superiority over conventional CNNs, Bayesian neural networks (BNNs), and topological CNNs.
arXiv Detail & Related papers (2025-10-13T17:57:43Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Human activity recognition using deep learning approaches and single
frame cnn and convolutional lstm [0.0]
We explore two deep learning-based approaches, namely single frame Convolutional Neural Networks (CNNs) and convolutional Long Short-Term Memory to recognise human actions from videos.
The two models were trained and evaluated on a benchmark action recognition dataset, UCF50, and another dataset that was created for the experimentation.
Though both models exhibit good accuracies, the single frame CNN model outperforms the Convolutional LSTM model by having an accuracy of 99.8% with the UCF50 dataset.
arXiv Detail & Related papers (2023-04-18T01:33:29Z) - Understanding and Improving Transfer Learning of Deep Models via Neural Collapse [37.483109067209504]
This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems.
We find strong correlation between feature collapse and downstream performance.
Our proposed fine-tuning methods deliver good performances while reducing fine-tuning parameters by at least 90%.
arXiv Detail & Related papers (2022-12-23T08:48:34Z) - Incremental Online Learning Algorithms Comparison for Gesture and Visual
Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z) - Facilitated machine learning for image-based fruit quality assessment in
developing countries [68.8204255655161]
Automated image classification is a common task for supervised machine learning in food science.
We propose an alternative method based on pre-trained vision transformers (ViTs)
It can be easily implemented with limited resources on a standard device.
arXiv Detail & Related papers (2022-07-10T19:52:20Z) - Revisiting Classifier: Transferring Vision-Language Models for Video
Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize the well-pretrained language model to generate good semantic target for efficient transferring learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z) - CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z) - Calibrating Class Activation Maps for Long-Tailed Visual Recognition [60.77124328049557]
We present two effective modifications of CNNs to improve network learning from long-tailed distribution.
First, we present a Class Activation Map (CAMC) module to improve the learning and prediction of network classifiers.
Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems.
arXiv Detail & Related papers (2021-08-29T05:45:03Z) - Combining Deep Transfer Learning with Signal-image Encoding for
Multi-Modal Mental Wellbeing Classification [2.513785998932353]
This paper proposes a framework to tackle the limitation in performing emotional state recognition on multiple multimodal datasets.
We show that model performance when inferring real-world wellbeing rated on a 5-point Likert scale can be enhanced using our framework.
arXiv Detail & Related papers (2020-11-20T13:37:23Z) - Learning to Learn Parameterized Classification Networks for Scalable
Input Images [76.44375136492827]
Convolutional Neural Networks (CNNs) do not have a predictable recognition behavior with respect to the input resolution change.
We employ meta learners to generate convolutional weights of main networks for various input scales.
We further utilize knowledge distillation on the fly over model predictions based on different input resolutions.
arXiv Detail & Related papers (2020-07-13T04:27:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.