Large Deviations for Accelerating Neural Networks Training
- URL: http://arxiv.org/abs/2303.00954v1
- Date: Thu, 2 Mar 2023 04:14:05 GMT
- Title: Large Deviations for Accelerating Neural Networks Training
- Authors: Sreelekha Guggilam, Varun Chandola, Abani Patra
- Abstract summary: We propose LAD Improved Iterative Training (LIIT), a novel training approach for ANNs based on the large deviations principle.
The LIIT approach uses a Modified Training Sample (MTS) that is generated and iteratively updated using a LAD anomaly-score-based sampling strategy.
The MTS is designed to be well representative of the training data by including the most anomalous observations in each class.
- Score: 5.864710987890994
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Artificial neural networks (ANNs) require a tremendous amount of data to
train on. However, in classification models, most data features are often similar,
which can lead to an increase in training time without a significant improvement in
performance. Thus, we hypothesize that there could be a more efficient way to train
an ANN using a better representative sample. For this, we propose LAD Improved
Iterative Training (LIIT), a novel training approach for ANNs that uses the large
deviations principle to generate and iteratively update training samples in a fast
and efficient setting. This is exploratory work with extensive opportunities for
future work. The thesis presents this ongoing research with the following
contributions: (1) We propose a novel ANN training method, LIIT, based on large
deviations theory, where additional dimensionality reduction is not needed to study
high-dimensional data. (2) The LIIT approach uses a Modified Training Sample (MTS)
that is generated and iteratively updated using a LAD anomaly-score-based sampling
strategy. (3) The MTS is designed to be well representative of the training data by
including the most anomalous observations in each class. This ensures distinct
patterns and features are learnt with smaller samples. (4) We compare the
classification performance of LIIT-trained ANNs with that of their traditional
batch-trained counterparts.
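A minimal sketch of the iterative loop described above, under stated assumptions: the LAD anomaly score is replaced by a simple distance-from-class-centroid stand-in, an sklearn MLP stands in for the ANN, and the sampling fraction and number of rounds are arbitrary. This is not the authors' implementation, only an illustration of the iterative MTS idea.

```python
# LIIT-style loop (sketch): build a small Modified Training Sample (MTS) from the
# most anomalous observations per class, train on it, and repeat.
# ASSUMPTIONS: the stand-in score below is NOT the LAD score; the model, sampling
# fraction, and number of rounds are illustrative choices.
import numpy as np
from sklearn.neural_network import MLPClassifier

def anomaly_score(X, y):
    """Stand-in anomaly score: distance from the class centroid."""
    scores = np.zeros(len(X))
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        centroid = X[idx].mean(axis=0)
        scores[idx] = np.linalg.norm(X[idx] - centroid, axis=1)
    return scores

def build_mts(X, y, scores, frac=0.2):
    """Keep the most anomalous `frac` of observations in each class."""
    keep = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        k = max(1, int(frac * len(idx)))
        keep.extend(idx[np.argsort(scores[idx])[-k:]])  # highest score = most anomalous
    return np.array(keep)

def liit_train(X, y, rounds=5, frac=0.2):
    model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200, warm_start=True)
    for _ in range(rounds):
        scores = anomaly_score(X, y)          # in LIIT the MTS is refreshed each round
        mts = build_mts(X, y, scores, frac)
        model.fit(X[mts], y[mts])             # train only on the small representative sample
    return model
```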
Related papers
- BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion [56.9358325168226]
We propose a Bagging deep learning training algorithm based on Efficient Neural network Diffusion (BEND).
Our approach is simple but effective, first using the weights and biases of multiple trained models as inputs to train an autoencoder and a latent diffusion model.
Our proposed BEND algorithm can consistently outperform the mean and median accuracies of both the original trained model and the diffused model.
arXiv Detail & Related papers (2024-03-23T08:40:38Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: train on MSMARCO with parameter-efficient methods such as LoRA, and use in-batch negatives unless well-constructed hard negatives are available.
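The in-batch negatives part of that recipe is a standard contrastive objective for dense retrieval. A minimal PyTorch sketch, assuming paired query/positive-passage embeddings are already computed (the temperature and toy tensors are illustrative; LoRA adapters are omitted):

```python
# In-batch negatives: every other passage in the batch acts as a negative for a
# given query; the diagonal of the similarity matrix holds the true positives.
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(q_emb, p_emb, temperature=0.05):
    """q_emb, p_emb: (B, d) embeddings of aligned query/positive-passage pairs."""
    q_emb = F.normalize(q_emb, dim=-1)
    p_emb = F.normalize(p_emb, dim=-1)
    scores = q_emb @ p_emb.T / temperature   # (B, B) similarity matrix
    targets = torch.arange(q_emb.size(0))    # positive for query i is passage i
    return F.cross_entropy(scores, targets)

# Toy usage with random embeddings standing in for encoder outputs
q = torch.randn(8, 128, requires_grad=True)
p = torch.randn(8, 128, requires_grad=True)
in_batch_negatives_loss(q, p).backward()
```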
arXiv Detail & Related papers (2023-11-16T10:42:58Z)
- KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
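A hedged sketch of that sample-hiding idea: drop the apparently least-informative examples from the next epoch. Using per-sample loss as the importance signal is an assumption for illustration, not necessarily the paper's exact criterion.

```python
# Per-epoch sample hiding (sketch): score each training example by its current
# loss and hide the lowest-loss fraction for the next epoch.
# ASSUMPTION: loss-as-importance is an illustrative stand-in for the paper's criterion.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset, TensorDataset

def visible_indices(model, dataset, hide_frac=0.2, batch_size=256):
    """Return indices of the samples to keep for the next epoch."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    losses = []
    model.eval()
    with torch.no_grad():
        for xb, yb in loader:
            losses.append(F.cross_entropy(model(xb), yb, reduction="none"))
    order = torch.argsort(torch.cat(losses))   # ascending: easiest samples first
    n_hide = int(hide_frac * len(order))
    return order[n_hide:].tolist()

# Toy usage: a linear model on random data
model = torch.nn.Linear(10, 3)
train_set = TensorDataset(torch.randn(100, 10), torch.randint(0, 3, (100,)))
keep = visible_indices(model, train_set, hide_frac=0.2)
next_epoch_loader = DataLoader(Subset(train_set, keep), batch_size=32, shuffle=True)
```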
arXiv Detail & Related papers (2023-10-16T06:19:29Z)
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Effective and Efficient Training for Sequential Recommendation using Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that models enhanced with our method can achieve performance exceeding or very close to the state-of-the-art BERT4Rec.
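A minimal sketch of recency-biased target sampling for one user sequence; the exponential decay and its rate are illustrative assumptions rather than the paper's exact objective:

```python
# Recency-based sampling (sketch): pick a training target from an interaction
# sequence with probability that decays with distance from the most recent item.
# ASSUMPTION: the exponential weighting and alpha are illustrative choices.
import numpy as np

def sample_recent_target(sequence, alpha=0.8, rng=None):
    """Return (input_prefix, target_item) with a recency-biased target position."""
    rng = rng or np.random.default_rng()
    positions = np.arange(1, len(sequence))              # candidate target positions
    weights = alpha ** (len(sequence) - 1 - positions)   # newer positions weigh more
    t = rng.choice(positions, p=weights / weights.sum())
    return sequence[:t], sequence[t]

prefix, target = sample_recent_target([10, 42, 7, 99, 3])  # one random recency-biased draw
```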
arXiv Detail & Related papers (2022-07-06T13:06:31Z)
- Revisiting the Updates of a Pre-trained Model for Few-shot Learning [11.871523410051527]
We compare the two popular updating methods, fine-tuning and linear probing.
We find that fine-tuning is better than linear probing as the number of samples increases.
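The two update strategies differ only in which parameters are trained; a minimal PyTorch illustration (the backbone and head are toy placeholders for a pre-trained model):

```python
# Linear probing: freeze the pre-trained backbone and train only a new head.
# Fine-tuning: update all parameters (often with a smaller backbone learning rate).
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # stand-in for a pre-trained model
head = nn.Linear(64, 5)

# Linear probing
for p in backbone.parameters():
    p.requires_grad = False
probe_optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# Fine-tuning
for p in backbone.parameters():
    p.requires_grad = True
finetune_optimizer = torch.optim.Adam(
    [{"params": backbone.parameters(), "lr": 1e-5},
     {"params": head.parameters(), "lr": 1e-3}]
)
```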
arXiv Detail & Related papers (2022-05-13T08:47:06Z)
- A Mixed Integer Programming Approach to Training Dense Neural Networks [0.0]
We propose novel mixed integer programming (MIP) formulations for training fully-connected ANNs.
Our formulations can account for both binary activation and rectified linear unit (ReLU) activation ANNs.
We also develop a layer-wise greedy approach for model pre-training, a technique adapted to reduce the number of layers in the ANN.
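A common way to make a ReLU unit representable in a MIP is a big-M encoding with a binary activation indicator; a generic sketch (with $M$ a sufficiently large bound, and not necessarily the paper's exact formulation):

$$
y \ge w^\top x + b, \qquad y \ge 0, \qquad y \le w^\top x + b + M(1 - z), \qquad y \le M z, \qquad z \in \{0, 1\},
$$

so that $z = 1$ forces $y = w^\top x + b$ and $z = 0$ forces $y = 0$, i.e. $y = \max(0, w^\top x + b)$.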
arXiv Detail & Related papers (2022-01-03T15:53:51Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- A Novel Training Protocol for Performance Predictors of Evolutionary Neural Architecture Search Algorithms [10.658358586764171]
Evolutionary Neural Architecture Search (ENAS) can automatically design the architectures of Deep Neural Networks (DNNs) using evolutionary computation algorithms.
Performance predictors are a type of regression model that can assist the search without consuming much computational resource.
We propose a new training protocol to address these issues, consisting of designing a pairwise ranking indicator to construct the training target, using logistic regression to fit the training samples, and developing a differential method for building the training instances.
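The pairwise training-target construction described there can be sketched as a binary classification problem over architecture pairs; the feature encodings and difference-based pairing below are illustrative assumptions:

```python
# Pairwise ranking predictor (sketch): for each architecture pair (a, b), the target
# is whether a outperforms b; logistic regression is fit on feature differences.
# ASSUMPTION: random vectors stand in for real architecture encodings and accuracies.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
arch_feats = rng.normal(size=(50, 16))      # 50 candidate architectures, 16-dim encodings
true_acc = rng.uniform(0.6, 0.95, size=50)  # measured accuracies (stand-in values)

pairs_X, pairs_y = [], []
for i in range(len(arch_feats)):
    for j in range(len(arch_feats)):
        if i != j:
            pairs_X.append(arch_feats[i] - arch_feats[j])   # difference encoding
            pairs_y.append(int(true_acc[i] > true_acc[j]))  # pairwise ranking indicator

ranker = LogisticRegression(max_iter=1000).fit(np.array(pairs_X), np.array(pairs_y))
# Probability that architecture 0 outperforms architecture 1:
p_win = ranker.predict_proba((arch_feats[0] - arch_feats[1]).reshape(1, -1))[0, 1]
```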
arXiv Detail & Related papers (2020-08-30T14:39:28Z)
- Reinforced Curriculum Learning on Pre-trained Neural Machine Translation Models [20.976165305749777]
We learn a curriculum for improving a pre-trained NMT model by re-selecting influential data samples from the original training set.
We propose a data selection framework based on Deterministic Actor-Critic, in which a critic network predicts the expected change of model performance.
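A rough sketch of the critic-guided selection step: a small network scores candidate training samples by predicted performance change, and the top-scoring ones are re-selected for further fine-tuning (the critic architecture and sample features are illustrative stand-ins, not the paper's design):

```python
# Critic-guided data re-selection (sketch): score candidate samples by predicted
# performance change and keep the top-k for the next fine-tuning round.
# ASSUMPTION: random features and an untrained critic stand in for the real framework.
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

sample_feats = torch.randn(1000, 64)               # features of candidate training samples
with torch.no_grad():
    predicted_gain = critic(sample_feats).squeeze(-1)
selected = torch.topk(predicted_gain, k=128).indices   # re-select the most promising samples
```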
arXiv Detail & Related papers (2020-04-13T03:40:44Z)