On Learnable Parameters of Optimal and Suboptimal Deep Learning Models
- URL: http://arxiv.org/abs/2408.11720v1
- Date: Wed, 21 Aug 2024 15:50:37 GMT
- Title: On Learnable Parameters of Optimal and Suboptimal Deep Learning Models
- Authors: Ziwei Zheng, Huizhi Liang, Vaclav Snasel, Vito Latora, Panos Pardalos, Giuseppe Nicosia, Varun Ojha
- Abstract summary: We study the structural and operational aspects of deep learning models.
Our research focuses on the nuances of learnable parameter (weight) statistics, distribution, node interaction, and visualization.
- Score: 2.889799048595314
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We scrutinize the structural and operational aspects of deep learning models, particularly focusing on the nuances of learnable parameter (weight) statistics, distribution, node interaction, and visualization. By establishing correlations between variance in weight patterns and overall network performance, we investigate the varying (optimal and suboptimal) performances of various deep learning models. Our empirical analysis extends across widely recognized datasets such as MNIST, Fashion-MNIST, and CIFAR-10, and various deep learning models such as deep neural networks (DNNs), convolutional neural networks (CNNs), and vision transformers (ViTs), enabling us to pinpoint characteristics of learnable parameters that correlate with successful networks. Through extensive experiments on the diverse architectures of deep learning models, we shed light on the critical factors that influence the functionality and efficiency of DNNs. Our findings reveal that successful networks, irrespective of datasets or models, are invariably similar to other successful networks in their converged weight statistics and distribution, while poor-performing networks vary in their weights. In addition, our research shows that the learnable parameters of widely varied deep learning models such as DNNs, CNNs, and ViTs exhibit similar learning characteristics.
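To make the weight-analysis idea concrete, below is a minimal sketch (not the authors' released code) of how per-layer statistics of learnable parameters could be collected from a PyTorch model; the toy model, the function name, and the choice of moments (mean, variance, skewness, kurtosis) are illustrative assumptions in the spirit of the abstract's focus on weight statistics and distributions.

```python
# Minimal sketch (not the authors' code): per-layer statistics of learnable
# parameters in a PyTorch model, in the spirit of the paper's weight analysis.
import torch.nn as nn


def layer_weight_stats(model: nn.Module) -> dict:
    """Return mean, variance, skewness, and kurtosis of each weight tensor."""
    stats = {}
    for name, param in model.named_parameters():
        if "weight" not in name:          # skip biases and other non-weight parameters
            continue
        w = param.detach().flatten().float()
        mu = w.mean()
        var = w.var(unbiased=False)
        sigma = var.sqrt().clamp_min(1e-12)
        z = (w - mu) / sigma              # standardized weights
        stats[name] = {
            "mean": mu.item(),
            "variance": var.item(),
            "skewness": (z ** 3).mean().item(),
            "kurtosis": (z ** 4).mean().item(),   # non-excess kurtosis
        }
    return stats


if __name__ == "__main__":
    # Hypothetical small DNN purely for illustration.
    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    for layer_name, s in layer_weight_stats(model).items():
        print(layer_name, s)
```

Comparing such per-layer summaries between well- and poorly-performing runs is one simple way to probe the correlation between weight variance and overall performance described in the abstract.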
Related papers
- Spiking Neural Networks in Vertical Federated Learning: Performance Trade-offs [2.1756721838833797]
Federated machine learning enables model training across multiple clients.
Vertical Federated Learning (VFL) deals with instances where the clients have different feature sets of the same samples.
Spiking Neural Networks (SNNs) are being leveraged to enable fast and accurate processing at the edge.
arXiv Detail & Related papers (2024-07-24T23:31:02Z) - Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Super Consistency of Neural Network Landscapes and Learning Rate Transfer [72.54450821671624]
We study the landscape through the lens of the loss Hessian.
We find that certain spectral properties under $\mu$P are largely independent of the size of the network.
We show that in the Neural Tangent Kernel (NTK) and other scaling regimes, the sharpness exhibits very different dynamics at different scales.
arXiv Detail & Related papers (2024-02-27T12:28:01Z) - Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z) - LowDINO -- A Low Parameter Self Supervised Learning Model [0.0]
This research aims to explore the possibility of designing a neural network architecture that allows for small networks to adopt the properties of huge networks.
Previous studies have shown that using convolutional neural networks (ConvNets) can provide inherent inductive bias.
To reduce the number of parameters, attention mechanisms are employed via MobileViT blocks.
arXiv Detail & Related papers (2023-05-28T18:34:59Z) - A Comprehensive Overview and Comparative Analysis on Deep Learning Models: CNN, RNN, LSTM, GRU [0.40498500266986387]
Deep learning (DL) has emerged as a powerful subset of machine learning (ML) and artificial intelligence (AI).
Its impact spans across various domains, including speech recognition, healthcare, autonomous vehicles, cybersecurity, predictive analytics, and more.
We conduct a comprehensive survey of various deep learning models, including CNNs, Recurrent Neural Networks (RNNs), Generative Models, Deep Reinforcement Learning (DRL), and Deep Transfer Learning.
arXiv Detail & Related papers (2023-05-27T13:23:21Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks [1.0869257688521987]
Complex Network Theory (CNT) represents Deep Neural Networks (DNNs) as directed weighted graphs to study them as dynamical systems.
We introduce metrics for nodes/neurons and layers, namely Nodes Strength and Layers Fluctuation.
Our framework distills trends in the learning dynamics and separates low-accuracy from high-accuracy networks; a minimal node-strength sketch follows this list.
arXiv Detail & Related papers (2021-10-06T10:03:32Z) - The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z) - Inferring Convolutional Neural Networks' accuracies from their architectural characterizations [0.0]
We study the relationships between a CNN's architecture and its performance.
We show that the attributes can be predictive of the networks' performance in two specific computer vision-based physics problems.
We use machine learning models to predict whether a network can perform better than a certain threshold accuracy before training.
arXiv Detail & Related papers (2020-01-07T16:41:58Z)
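As referenced in the Complex Network Theory entry above, node strength is commonly defined in network science as a neuron's weighted degree, i.e. the sum of the (absolute) weights of its incident connections. The sketch below is an assumption-laden illustration: it takes a single PyTorch linear layer and uses the standard weighted-degree definition rather than the cited paper's exact formulation.

```python
# Minimal sketch (standard weighted-degree definition, not necessarily the
# cited paper's exact metric): per-neuron node strength in a dense layer.
import torch
import torch.nn as nn


def node_strengths(layer: nn.Linear) -> tuple[torch.Tensor, torch.Tensor]:
    """Incoming strength of each output neuron and outgoing strength of each
    input neuron, taken as sums of absolute connection weights."""
    w = layer.weight.detach().abs()   # shape: (out_features, in_features)
    incoming = w.sum(dim=1)           # one value per output neuron
    outgoing = w.sum(dim=0)           # one value per input neuron
    return incoming, outgoing


if __name__ == "__main__":
    layer = nn.Linear(784, 256)       # hypothetical layer, purely for illustration
    incoming, outgoing = node_strengths(layer)
    print("incoming strength, first 5 output neurons:", incoming[:5].tolist())
    print("outgoing strength, first 5 input neurons:", outgoing[:5].tolist())
```

Tracking how such per-neuron strengths evolve during training is one way to connect the node-interaction analysis in the main paper with the graph-based view taken by the Complex Network Theory work listed above.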