A Comprehensive Survey on Architectural Advances in Deep CNNs: Challenges, Applications, and Emerging Research Directions
- URL: http://arxiv.org/abs/2503.16546v1
- Date: Wed, 19 Mar 2025 08:41:06 GMT
- Title: A Comprehensive Survey on Architectural Advances in Deep CNNs: Challenges, Applications, and Emerging Research Directions
- Authors: Saddam Hussain Khan, Rashid Iqbal
- Abstract summary: CNNs have significantly advanced deep learning, driving breakthroughs in computer vision, natural language processing, medical diagnosis, object detection, and speech recognition. This survey presents a unified taxonomy that classifies CNN architectures based on spatial exploitation, multi-path structures, depth, width, dimensionality expansion, channel boosting, and attention mechanisms. It systematically reviews CNN applications in face recognition, pose estimation, action recognition, text classification, statistical language modeling, disease diagnosis, radiological analysis, cryptocurrency sentiment prediction, 1D data processing, video analysis, and speech recognition.
- Score: 1.0523436939538895
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep Convolutional Neural Networks (CNNs) have significantly advanced deep learning, driving breakthroughs in computer vision, natural language processing, medical diagnosis, object detection, and speech recognition. Architectural innovations including 1D, 2D, and 3D convolutional models, dilated and grouped convolutions, depthwise separable convolutions, and attention mechanisms address domain-specific challenges and enhance feature representation and computational efficiency. Structural refinements such as spatial-channel exploitation, multi-path design, and feature-map enhancement contribute to robust hierarchical feature extraction and improved generalization, particularly through transfer learning. Efficient preprocessing strategies, including Fourier transforms, structured transforms, low-precision computation, and weight compression, optimize inference speed and facilitate deployment in resource-constrained environments. This survey presents a unified taxonomy that classifies CNN architectures based on spatial exploitation, multi-path structures, depth, width, dimensionality expansion, channel boosting, and attention mechanisms. It systematically reviews CNN applications in face recognition, pose estimation, action recognition, text classification, statistical language modeling, disease diagnosis, radiological analysis, cryptocurrency sentiment prediction, 1D data processing, video analysis, and speech recognition. In addition to consolidating architectural advancements, the review highlights emerging learning paradigms such as few-shot, zero-shot, weakly supervised, and federated learning frameworks. Future research directions include hybrid CNN-transformer models, vision-language integration, and generative learning. This review provides a comprehensive perspective on the evolution of CNNs from 2015 to 2025, outlining key innovations, challenges, and opportunities.
Related papers
- Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook [85.43403500874889]
Retrieval-augmented generation (RAG) has emerged as a pivotal technique in artificial intelligence (AI).
Recent advancements in RAG for embodied AI, with a particular focus on applications in planning, task execution, multimodal perception, interaction, and specialized domains.
arXiv Detail & Related papers (2025-03-23T10:33:28Z)
- Geometric Properties and Graph-Based Optimization of Neural Networks: Addressing Non-Linearity, Dimensionality, and Scalability [0.0]
This research explores neural networks through geometric metrics and graph structures. It addresses the limited understanding of geometric structures governing neural networks. We identify three key challenges: (1) overcoming linear separability limitations, (2) managing the dimensionality-complexity trade-off, and (3) improving scalability through graph representations.
arXiv Detail & Related papers (2025-02-24T03:36:34Z)
- From Noise to Nuance: Advances in Deep Generative Image Models [8.802499769896192]
Deep learning-based image generation has undergone a paradigm shift since 2021. Recent developments in Stable Diffusion, DALL-E, and consistency models have redefined the capabilities and performance boundaries of image synthesis. We investigate how enhanced multi-modal understanding and zero-shot generation capabilities are reshaping practical applications across industries.
arXiv Detail & Related papers (2024-12-12T02:09:04Z)
- Towards Advanced Speech Signal Processing: A Statistical Perspective on Convolution-Based Architectures and its Applications [0.0]
This article surveys convolution-based models, including convolutional neural networks (CNNs), Conformers, ResNets, and CRNNs, as speech signal processing models. We compare the strengths and weaknesses of each model, identify potential errors, and propose avenues for further research, emphasizing the central role they play in advancing applications of speech technologies.
arXiv Detail & Related papers (2024-11-20T13:01:30Z)
- A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends [5.76466022747257]
Convolutional Neural Networks (CNNs) are used for various computer vision tasks such as image classification, object detection, and image segmentation.
There are numerous types of CNNs designed to meet specific needs and requirements, including 1D, 2D, and 3D CNNs.
It's crucial to gain a thorough understanding and perform a comparative analysis of these different CNN types to understand their strengths and weaknesses.
arXiv Detail & Related papers (2024-02-23T18:28:57Z)
- Integration and Performance Analysis of Artificial Intelligence and Computer Vision Based on Deep Learning Algorithms [5.734290974917728]
This paper focuses on the analysis of the application effectiveness of the integration of deep learning and computer vision technologies.
Deep learning achieves a historic breakthrough by constructing hierarchical neural networks, enabling end-to-end feature learning and semantic understanding of images.
The successful experiences in the field of computer vision provide strong support for training deep learning algorithms.
arXiv Detail & Related papers (2023-12-20T09:37:06Z)
- Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Tensor Methods in Computer Vision and Deep Learning [120.3881619902096]
Tensors, or multidimensional arrays, are data structures that can naturally represent visual data of multiple dimensions.
With the advent of the deep learning paradigm shift in computer vision, tensors have become even more fundamental.
This article provides an in-depth and practical review of tensors and tensor methods in the context of representation learning and deep learning.
arXiv Detail & Related papers (2021-07-07T18:42:45Z)
- Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
- Deep Learning for Community Detection: Progress, Challenges and Opportunities [79.26787486888549]
This article summarizes the contributions of the various frameworks, models, and algorithms in deep neural networks.
arXiv Detail & Related papers (2020-05-17T11:22:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.