Graph Learning with Loss-Guided Training
- URL: http://arxiv.org/abs/2006.00460v1
- Date: Sun, 31 May 2020 08:03:06 GMT
- Title: Graph Learning with Loss-Guided Training
- Authors: Eliav Buchnik, Edith Cohen
- Abstract summary: We explore loss-guided training in a new domain of node embedding methods pioneered by DeepWalk.
Our empirical evaluation on a rich collection of datasets shows significant acceleration over the baseline static methods, both in terms of total training performed and overall computation.
- Score: 16.815638149823744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classically, ML models trained with stochastic gradient descent (SGD) are
designed to minimize the average loss per example and use a distribution of
training examples that remains static in the course of training. Research
in recent years demonstrated, empirically and theoretically, that significant
acceleration is possible by methods that dynamically adjust the training
distribution in the course of training so that training is more focused on
examples with higher loss. We explore loss-guided training in a new
domain of node embedding methods pioneered by DeepWalk. These methods
work with an implicit and large set of positive training examples that are
generated using random walks on the input graph and are therefore not amenable
to typical example selection methods. We propose computationally efficient
methods that allow for loss-guided training in this framework. Our empirical
evaluation on a rich collection of datasets shows significant acceleration over
the baseline static methods, both in terms of total training performed and
overall computation.
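As a rough illustration of the idea (not the authors' exact algorithm), the sketch below trains DeepWalk-style skip-gram embeddings with negative sampling, but seeds each random walk from nodes sampled in proportion to a running per-node loss estimate, so training effort concentrates where the loss is high. The EMA loss tracker, the loss-proportional seeding, and names such as `loss_guided_epoch` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def random_walk(adj, start, length):
    """Uniform random walk over an adjacency-list graph {node: [neighbors]}."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = adj[walk[-1]]
        if not nbrs:
            break
        walk.append(int(rng.choice(nbrs)))
    return walk

def loss_guided_epoch(adj, emb, ctx, node_loss, walk_len=10, window=2,
                      n_neg=5, lr=0.025, alpha=0.9):
    """One loss-guided pass (illustrative): walks are seeded from nodes sampled
    in proportion to a running per-node loss estimate, so training concentrates
    on nodes whose embeddings currently fit the objective poorly."""
    nodes = np.array(list(adj))
    probs = node_loss[nodes] / node_loss[nodes].sum()
    for start in rng.choice(nodes, size=len(nodes), p=probs):
        walk = random_walk(adj, int(start), walk_len)
        for i, u in enumerate(walk):
            for v in walk[max(0, i - window):i + window + 1]:
                if u == v:
                    continue
                # skip-gram with negative sampling on the co-occurring pair (u, v)
                pos = sigmoid(emb[u] @ ctx[v])
                loss = -np.log(pos + 1e-12)
                grad_u = (pos - 1.0) * ctx[v]
                ctx[v] -= lr * (pos - 1.0) * emb[u]
                for w in rng.integers(0, len(emb), size=n_neg):
                    neg = sigmoid(emb[u] @ ctx[w])
                    loss -= np.log(1.0 - neg + 1e-12)
                    grad_u += neg * ctx[w]
                    ctx[w] -= lr * neg * emb[u]
                emb[u] -= lr * grad_u
                # running loss estimate that guides seeding in the next pass
                node_loss[u] = alpha * node_loss[u] + (1.0 - alpha) * loss

# usage (illustrative; nodes are assumed to be integers 0..n-1):
#   emb = rng.normal(scale=0.1, size=(n, d)); ctx = rng.normal(scale=0.1, size=(n, d))
#   node_loss = np.ones(n)   # uniform seeding on the first pass
#   loss_guided_epoch(adj, emb, ctx, node_loss)
```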
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that requires no prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - Tackling Interference Induced by Data Training Loops in A/B Tests: A Weighted Training Approach [6.028247638616059]
We introduce a novel approach called weighted training.
This approach entails training a model to predict the probability of each data point appearing in either the treatment or control data.
We demonstrate that this approach achieves the least variance among all estimators that do not cause shifts in the training distributions.
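A minimal sketch of this two-step recipe, assuming the predicted probabilities are turned into inverse-probability example weights for the downstream model (the paper derives its own weighting); `weighted_training` and the scikit-learn models are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, SGDRegressor

def weighted_training(X_treat, y_treat, X_ctrl, y_ctrl):
    """Sketch of the recipe described above (weighting rule assumed).
    Step 1: fit a classifier predicting whether a point came from treatment data.
    Step 2: train the downstream model on treatment data with per-example weights
            derived from those predicted probabilities."""
    X = np.vstack([X_treat, X_ctrl])
    z = np.concatenate([np.ones(len(X_treat)), np.zeros(len(X_ctrl))])
    propensity = LogisticRegression(max_iter=1000).fit(X, z)
    p_treat = propensity.predict_proba(X_treat)[:, 1]
    # Assumed choice: inverse-probability weights so the weighted treatment sample
    # resembles overall traffic; the paper derives its own minimum-variance weights.
    weights = 1.0 / np.clip(p_treat, 1e-3, None)
    model = SGDRegressor(max_iter=1000).fit(X_treat, y_treat, sample_weight=weights)
    return propensity, model
```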
arXiv Detail & Related papers (2023-10-26T15:52:34Z) - A Data-Centric Approach for Improving Adversarial Training Through the
Lens of Out-of-Distribution Detection [0.4893345190925178]
We propose detecting and removing hard samples directly from the training procedure rather than applying complicated algorithms to mitigate their effects.
Our results on SVHN and CIFAR-10 datasets show the effectiveness of this method in improving the adversarial training without adding too much computational cost.
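A minimal sketch of the removal step, with a generic per-sample hardness score standing in for the paper's out-of-distribution detector (the reference-model helper named in the comment is hypothetical):

```python
import numpy as np

def drop_hard_samples(hardness, X, y, drop_frac=0.05):
    """Remove the hardest (most OOD-like) fraction of the training set before
    adversarial training. `hardness` stands in for an OOD detector: any
    per-sample score where larger means harder (e.g. the loss of a clean
    reference model) works for this illustration."""
    cutoff = np.quantile(hardness, 1.0 - drop_frac)
    keep = hardness < cutoff
    return X[keep], y[keep]

# usage (illustrative):
#   hardness = per_sample_loss_of_reference_model(X, y)   # hypothetical helper
#   X_kept, y_kept = drop_hard_samples(hardness, X, y)
#   ...then run standard adversarial training (e.g. PGD-based) on (X_kept, y_kept)
```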
arXiv Detail & Related papers (2023-01-25T08:13:50Z) - Learning from Data with Noisy Labels Using Temporal Self-Ensemble [11.245833546360386]
Deep neural networks (DNNs) have an enormous capacity to memorize noisy labels.
Current state-of-the-art methods present a co-training scheme that trains dual networks using samples associated with small losses.
We propose a simple yet effective robust training scheme that operates by training only a single network.
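One way such a single-network scheme can look (an illustrative temporal-ensembling variant, not necessarily the paper's exact construction) is to keep an exponential moving average of each sample's predictions across epochs and down-weight samples whose given label the ensemble distrusts:

```python
import numpy as np

class TemporalEnsemble:
    """Illustrative temporal self-ensemble: an exponential moving average of each
    sample's predicted class probabilities across epochs. Samples whose given
    label the ensemble assigns low probability are down-weighted as likely noisy."""
    def __init__(self, n_samples, n_classes, momentum=0.9):
        self.ema = np.full((n_samples, n_classes), 1.0 / n_classes)
        self.momentum = momentum

    def update(self, indices, probs):
        m = self.momentum
        self.ema[indices] = m * self.ema[indices] + (1.0 - m) * probs

    def sample_weights(self, indices, labels):
        # trust a sample in proportion to the ensemble's belief in its label
        return self.ema[indices, labels]

# usage inside a training loop (illustrative):
#   probs = softmax(model(x_batch))                     # single network, no co-training
#   ensemble.update(batch_idx, probs)
#   w = ensemble.sample_weights(batch_idx, y_batch)
#   loss = np.mean(w * cross_entropy(probs, y_batch))
```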
arXiv Detail & Related papers (2022-07-21T08:16:31Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models that perform inference directly from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - Practical Convex Formulation of Robust One-hidden-layer Neural Network
Training [12.71266194474117]
We show that the training of a one-hidden-layer, scalar-output fully-connected ReLU neural network can be reformulated as a finite-dimensional convex program.
We derive a convex optimization approach to efficiently solve the "adversarial training" problem.
Our method can be applied to binary classification and regression, and provides an alternative to the current adversarial training methods.
arXiv Detail & Related papers (2021-05-25T22:06:27Z) - An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
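A hedged sketch of such a comparison, using cosine similarity over embeddings as the retrieval method and a plain gradient dot product as the gradient-based proxy (the paper evaluates several concrete methods):

```python
import numpy as np
from scipy.stats import spearmanr

def attribution_agreement(train_emb, test_emb, train_grads, test_grads):
    """For each test point, rank training instances by (1) a retrieval method
    (cosine similarity of embeddings) and (2) a gradient-based proxy (dot product
    of per-example loss gradients), then report the mean Spearman correlation
    between the two rankings."""
    def unit(a):
        return a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    retrieval = unit(test_emb) @ unit(train_emb).T    # (n_test, n_train) scores
    gradient = test_grads @ train_grads.T             # (n_test, n_train) scores
    corrs = [spearmanr(r, g)[0] for r, g in zip(retrieval, gradient)]
    return float(np.mean(corrs))
```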
arXiv Detail & Related papers (2021-04-09T01:03:17Z) - Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z) - Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
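A toy illustration of the underlying idea under strong assumptions (squared loss, full-batch gradient descent, and a fixed linearization around the pre-trained weights, none of which are claimed to match the paper's estimator):

```python
import numpy as np

def predict_steps_to_loss(jacobian, residual, lr, target_loss, max_steps=100_000):
    """Toy version of the idea: for a model linearized around its pre-trained
    weights, f(w) ~ f(w0) + J (w - w0), gradient descent on squared loss makes the
    residual evolve as r_{t+1} = (I - lr * J J^T) r_t. The loss curve, and hence
    the number of steps needed to reach `target_loss`, follows without training."""
    kernel = jacobian @ jacobian.T                 # empirical NTK on the training set
    eigvals, eigvecs = np.linalg.eigh(kernel)
    r = eigvecs.T @ residual                       # residual in the kernel eigenbasis
    for step in range(max_steps):
        if 0.5 * np.mean(r ** 2) <= target_loss:
            return step
        r = (1.0 - lr * eigvals) * r               # one linearized GD step, per mode
    return max_steps
```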
arXiv Detail & Related papers (2020-08-28T04:29:54Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
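For reference, a generic extragradient-style extrapolation step, the kind of update such a unified framework covers (an illustration, not the paper's specific scheme):

```python
def extrapolation_sgd_step(w, grad_fn, lr, extrap_lr):
    """One extragradient-style extrapolation update (illustrative):
    1) take a trial step to a look-ahead point,
    2) update the original iterate with the gradient evaluated at that point.
    `w` is a parameter vector (e.g. a NumPy array); `grad_fn(w)` returns a
    stochastic gradient at w, e.g. computed on one large batch."""
    w_look = w - extrap_lr * grad_fn(w)   # extrapolated (look-ahead) point
    return w - lr * grad_fn(w_look)       # correction applied to the original w
```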
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Estimating Training Data Influence by Tracing Gradient Descent [21.94989239842377]
TracIn computes the influence of a training example on a prediction made by the model.
TracIn is simple to implement; all it needs is the ability to compute gradients of the loss function.
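The checkpoint form of this idea reduces to summing learning-rate-weighted gradient dot products over saved checkpoints; the sketch below instantiates it for a toy logistic-regression model (the logistic-regression choice is purely illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logreg_grad(w, x, y):
    """Per-example gradient of the logistic loss for a linear model."""
    return (sigmoid(w @ x) - y) * x

def tracin_influence(checkpoints, lrs, x_train, y_train, x_test, y_test):
    """Checkpoint-based TracIn estimate: the influence of a training example on a
    test example is the sum, over saved checkpoints, of the learning rate times
    the dot product of their per-example loss gradients at that checkpoint."""
    return sum(
        lr * logreg_grad(w, x_train, y_train) @ logreg_grad(w, x_test, y_test)
        for w, lr in zip(checkpoints, lrs)
    )
```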
arXiv Detail & Related papers (2020-02-19T22:40:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.