Entropy-based Guidance of Deep Neural Networks for Accelerated Convergence and Improved Performance
- URL: http://arxiv.org/abs/2308.14938v2
- Date: Wed, 3 Jul 2024 23:15:39 GMT
- Title: Entropy-based Guidance of Deep Neural Networks for Accelerated Convergence and Improved Performance
- Authors: Mackenzie J. Meni, Ryan T. White, Michael Mayo, Kevin Pilkiewicz
- Abstract summary: We derive new mathematical results to measure the changes in entropy as fully-connected and convolutional neural networks process data.
By measuring how entropy changes as networks process data, patterns critical to a well-performing network can be visualized and identified.
Experiments in image compression, image classification, and image segmentation on benchmark datasets demonstrate these losses guide neural networks to learn rich latent data representations in fewer dimensions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks have dramatically increased our capacity to learn from large, high-dimensional datasets across innumerable disciplines. However, their decisions are not easily interpretable, their computational costs are high, and building and training them are not straightforward processes. To add structure to these efforts, we derive new mathematical results to efficiently measure the changes in entropy as fully-connected and convolutional neural networks process data. By measuring the change in entropy as networks process data effectively, patterns critical to a well-performing network can be visualized and identified. Entropy-based loss terms are developed to improve dense and convolutional model accuracy and efficiency by promoting the ideal entropy patterns. Experiments in image compression, image classification, and image segmentation on benchmark datasets demonstrate these losses guide neural networks to learn rich latent data representations in fewer dimensions, converge in fewer training epochs, and achieve higher accuracy.
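The abstract describes entropy-based loss terms that steer hidden-layer entropy toward the patterns found in well-performing networks, but it does not spell out the entropy estimator or the exact form of the loss. The sketch below is a minimal, hypothetical reading in PyTorch: a Gaussian differential-entropy estimate of a latent layer's activations is added as a weighted penalty to the usual cross-entropy objective. The estimator, the direction of the penalty, and the names `gaussian_entropy`, `SmallClassifier`, `train_step`, and `lam` are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch (not the authors' code): estimate the differential
# entropy of a layer's activations under a Gaussian assumption and add it
# as a weighted auxiliary term to the task loss.
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_entropy(z: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Differential entropy of activations z with shape (batch, features),
    assuming a Gaussian: 0.5 * logdet(2 * pi * e * Cov(z))."""
    z = z - z.mean(dim=0, keepdim=True)
    n, d = z.shape
    cov = z.T @ z / (n - 1) + eps * torch.eye(d, device=z.device)
    return 0.5 * torch.logdet(2 * math.pi * math.e * cov)


class SmallClassifier(nn.Module):
    """Toy fully-connected classifier that exposes its latent layer."""

    def __init__(self, in_dim: int = 784, hidden: int = 128, n_classes: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        z = self.encoder(x)            # latent representation
        return self.head(z), z


def train_step(model, x, y, optimizer, lam: float = 1e-3):
    """One step of cross-entropy training plus an entropy penalty on the
    latent layer, nudging the network toward compact representations."""
    optimizer.zero_grad()
    logits, z = model(x)
    loss = F.cross_entropy(logits, y) + lam * gaussian_entropy(z)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A call such as `train_step(model, images.flatten(1), labels, optimizer)` would then trade off task accuracy against latent entropy; whether the penalty should push entropy down, up, or toward a layer-dependent target depends on which entropy pattern the paper identifies as ideal, so the sign and weight here are placeholders.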
Related papers
- Quantifying Emergence in Neural Networks: Insights from Pruning and Training Dynamics [0.0]
Emergence, where complex behaviors develop from the interactions of simpler components within a network, plays a crucial role in enhancing a network's capabilities.
We introduce a quantitative framework to measure emergence during the training process and examine its impact on network performance.
We hypothesize that the degree of emergence, defined by the connectivity between active and inactive nodes, can predict the development of emergent behaviors in the network.
arXiv Detail & Related papers (2024-09-03T03:03:35Z)
- Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z)
- Dynamic Community Detection via Adversarial Temporal Graph Representation Learning [17.487265170798974]
In this work, an adversarial temporal graph representation learning framework is proposed to detect dynamic communities from a small sample of brain network data.
In addition, the framework employs adversarial training to guide the learning of temporal graph representations and optimizes a measurable modularity loss to maximize the modularity of the detected communities.
arXiv Detail & Related papers (2022-06-29T08:44:22Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both training set size and model size significantly improves robustness to distributional shift.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
- Neural networks adapting to datasets: learning network size and topology [77.34726150561087]
We introduce a flexible setup allowing for a neural network to learn both its size and topology during the course of a gradient-based training.
The resulting network has the structure of a graph tailored to the particular learning task and dataset.
arXiv Detail & Related papers (2020-06-22T12:46:44Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Exploring the Connection Between Binary and Spiking Neural Networks [1.329054857829016]
We bridge the recent algorithmic progress in training Binary Neural Networks and Spiking Neural Networks.
We show that training Spiking Neural Networks in the extreme quantization regime results in near-full-precision accuracy on large-scale datasets.
arXiv Detail & Related papers (2020-02-24T03:46:51Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information provided) and is not responsible for any consequences of its use.