The BUTTER Zone: An Empirical Study of Training Dynamics in Fully
Connected Neural Networks
- URL: http://arxiv.org/abs/2207.12547v2
- Date: Mon, 16 Oct 2023 18:40:54 GMT
- Title: The BUTTER Zone: An Empirical Study of Training Dynamics in Fully
Connected Neural Networks
- Authors: Charles Edison Tripp, Jordan Perr-Sauer, Lucas Hayne, Monte Lunacek,
Jamil Gafur
- Abstract summary: We present an empirical dataset surveying the deep learning phenomenon on fully-connected feed-forward multilayer perceptron neural networks.
The dataset records the per-epoch training and generalization performance of 483 thousand distinct hyperparameter choices.
Repeating each experiment an average of 24 times resulted in 11 million total training runs and 40 billion epochs recorded.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an empirical dataset surveying the deep learning phenomenon on
fully-connected feed-forward multilayer perceptron neural networks. The
dataset, which is now freely available online, records the per-epoch training
and generalization performance of 483 thousand distinct hyperparameter choices
of architectures, tasks, depths, network sizes (number of parameters), learning
rates, batch sizes, and regularization penalties. Repeating each experiment an
average of 24 times resulted in 11 million total training runs and 40 billion
epochs recorded. Accumulating this 1.7 TB dataset utilized 11 thousand CPU
core-years, 72.3 GPU-years, and 163 node-years. In surveying the dataset, we
observe durable patterns persisting across tasks and topologies. We aim to
spark scientific study of machine learning techniques as a catalyst for the
theoretical discoveries needed to progress the field beyond energy-intensive
and heuristic practices.
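As a quick consistency check, the headline figures in the abstract can be cross-verified against one another. The sketch below uses only numbers stated in the abstract; the average epochs-per-run figure is derived here, not stated in the paper.

```python
# Figures quoted in the BUTTER abstract.
distinct_configs = 483_000      # distinct hyperparameter choices
repeats_avg = 24                # average repetitions per experiment
total_runs = 11_000_000         # reported total training runs
total_epochs = 40_000_000_000   # reported epochs recorded

# 483k configs x ~24 repeats ~= 11.6M runs, consistent with the
# reported "11 million total training runs" once rounding is allowed.
assert abs(distinct_configs * repeats_avg - total_runs) / total_runs < 0.1

# Implied average training length per run, in epochs (derived, not quoted).
epochs_per_run = total_epochs / total_runs
print(f"~{epochs_per_run:.0f} epochs per run on average")
```

This kind of back-of-the-envelope check is useful before working with the 1.7 TB dataset itself, since it confirms how the per-run records aggregate into the reported totals.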
Related papers
- On Neural Inertial Classification Networks for Pedestrian Activity Recognition [2.374912052693646]
Inertial sensors are crucial for recognizing pedestrian activity.
Recent advances in deep learning have greatly improved inertial sensing performance and robustness.
Different domains and platforms use deep-learning techniques to enhance network performance, but there is no common benchmark.
The aim of this paper is to fill this gap by defining and analyzing ten data-driven techniques for improving neural inertial classification networks.
arXiv Detail & Related papers (2025-02-23T08:15:26Z)
- Multi-modal Data Fusion and Deep Ensemble Learning for Accurate Crop Yield Prediction [0.0]
This study introduces RicEns-Net, a novel Deep Ensemble model designed to predict crop yields.
The research focuses on the use of synthetic aperture radar (SAR), optical remote sensing data from Sentinel 1, 2, and 3 satellites, and meteorological measurements such as surface temperature and rainfall.
The primary objective is to enhance the precision of crop yield prediction by developing a machine-learning framework capable of handling complex environmental data.
arXiv Detail & Related papers (2025-02-09T22:48:27Z)
- Differentiable architecture search with multi-dimensional attention for spiking neural networks [4.318876451929319]
Spiking Neural Networks (SNNs) have gained enormous popularity in the field of artificial intelligence.
The majority of SNN methods directly inherit the structure of Artificial Neural Networks (ANNs).
We propose Multi-Attention Differentiable Architecture Search (MA-DARTS) to directly automate the search for the optimal network structure of SNNs.
arXiv Detail & Related papers (2024-11-01T07:18:32Z)
- Scaling Wearable Foundation Models [54.93979158708164]
We investigate the scaling properties of sensor foundation models across compute, data, and model size.
Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM.
Our results establish the scaling laws of LSM for tasks such as imputation and extrapolation, both across time and sensor modalities.
arXiv Detail & Related papers (2024-10-17T15:08:21Z)
- TENNs-PLEIADES: Building Temporal Kernels with Orthogonal Polynomials [1.1970409518725493]
We focus on interfacing these networks with event-based data to perform online classification and detection with low latency.
We experimented with three event-based benchmarks and obtained state-of-the-art results on all three by large margins with significantly smaller memory and compute costs.
arXiv Detail & Related papers (2024-05-20T17:06:24Z)
- Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning [50.809769498312434]
We propose a novel dataset pruning method termed as Temporal Dual-Depth Scoring (TDDS)
Our method achieves 54.51% accuracy with only 10% training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%.
arXiv Detail & Related papers (2023-11-22T03:45:30Z)
- Data Augmentations in Deep Weight Spaces [89.45272760013928]
We introduce a novel augmentation scheme based on the Mixup method.
We evaluate the performance of these techniques on existing benchmarks as well as new benchmarks we generate.
arXiv Detail & Related papers (2023-11-15T10:43:13Z)
- Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training [1.5301777464637454]
We propose a novel approach that exploits sparsity in pruned neural networks to optimize memory utilization and communication in two popular algorithms for parallel deep learning.
We integrate our approach into AxoNN, a highly scalable framework for parallel deep learning, and demonstrate the reduction in communication time and memory utilization.
arXiv Detail & Related papers (2023-02-10T04:22:25Z)
- Accelerating Domain-aware Deep Learning Models with Distributed Training [0.8164433158925593]
We present a novel distributed domain-aware network that utilizes domain-specific knowledge with improved model performance.
From our analysis, the network effectively predicts high peaks in discharge measurements at watershed outlets with up to 4.1x speedup.
Our approach achieved a 12.6x overall speedup and improved mean prediction performance by 16%.
arXiv Detail & Related papers (2023-01-25T22:59:47Z)
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Predicting Neural Network Accuracy from Weights [25.73213712719546]
We show experimentally that the accuracy of a trained neural network can be predicted surprisingly well by looking only at its weights.
We release a collection of 120k convolutional neural networks trained on four different datasets to encourage further research in this area.
arXiv Detail & Related papers (2020-02-26T13:06:14Z)
- Deep Learning based Pedestrian Inertial Navigation: Methods, Dataset and On-Device Inference [49.88536971774444]
Inertial measurements units (IMUs) are small, cheap, energy efficient, and widely employed in smart devices and mobile robots.
Exploiting inertial data for accurate and reliable pedestrian navigation is a key component of emerging Internet-of-Things applications and services.
We present and release the Oxford Inertial Odometry dataset (OxIOD), a first-of-its-kind public dataset for deep learning based inertial navigation research.
arXiv Detail & Related papers (2020-01-13T04:41:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.