PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
- URL: http://arxiv.org/abs/2302.14772v1
- Date: Tue, 28 Feb 2023 17:14:24 GMT
- Title: PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
- Authors: Shun Lu, Yu Hu, Longxing Yang, Zihao Sun, Jilin Mei, Jianchao Tan,
Chengru Song
- Abstract summary: One-shot NAS methods train a supernet and then inherit the pre-trained weights to evaluate sub-models.
Large gradient variance occurs during supernet training, which degrades the supernet ranking consistency.
We propose to explicitly minimize the gradient variance of the supernet training by jointly optimizing the sampling distributions of PAth and DAta.
- Score: 8.737995937682271
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Based on the weight-sharing mechanism, one-shot NAS methods train a supernet
and then inherit the pre-trained weights to evaluate sub-models, largely
reducing the search cost. However, several works have pointed out that the
shared weights suffer from different gradient descent directions during
training. We further find that large gradient variance occurs during
supernet training, which degrades the supernet ranking consistency. To mitigate
this issue, we propose to explicitly minimize the gradient variance of the
supernet training by jointly optimizing the sampling distributions of PAth and
DAta (PA&DA). We theoretically derive the relationship between the gradient
variance and the sampling distributions, and reveal that the optimal sampling
probability is proportional to the normalized gradient norm of path and
training data. Hence, we use the normalized gradient norm as the importance
indicator for path and training data, and adopt an importance sampling strategy
for the supernet training. Our method only requires negligible computation cost
for optimizing the sampling distributions of path and data, but achieves lower
gradient variance during supernet training and better generalization
performance for the supernet, resulting in a more consistent NAS. We conduct
comprehensive comparisons with other improved approaches in various search
spaces. Results show that our method surpasses others with more reliable
ranking performance and higher accuracy of searched architectures, showing the
effectiveness of our method. Code is available at
https://github.com/ShunLu91/PA-DA.
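The abstract's core idea, sampling paths and data with probability proportional to their normalized gradient norms, can be sketched as follows. This is a minimal illustration of importance sampling, not the paper's actual implementation; the function names and the unbiasedness-correction weights are assumptions.

```python
import numpy as np

def importance_distribution(grad_norms):
    """Sampling probabilities proportional to normalized gradient norms,
    as the abstract states the optimal distribution should be."""
    g = np.asarray(grad_norms, dtype=float)
    return g / g.sum()

def sample_with_weights(rng, probs, batch_size):
    """Draw indices under the importance distribution; the 1/(N * p_i)
    weights keep the resulting gradient estimate unbiased."""
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    weights = 1.0 / (len(probs) * probs[idx])
    return idx, weights

rng = np.random.default_rng(0)
# a path (or sample) with a 3x larger gradient norm is drawn 3x as often
probs = importance_distribution([3.0, 1.0, 1.0, 1.0])
idx, w = sample_with_weights(rng, probs, batch_size=2)
```

In the paper's setting the same machinery is applied twice, once over candidate paths of the supernet and once over training data, with the distributions updated from gradient norms observed during training.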
Related papers
- Stable Target Field for Reduced Variance Score Estimation in Diffusion
Models [5.9115407007859755]
Diffusion models generate samples by reversing a fixed forward diffusion process.
We argue that the source of such variance lies in the handling of intermediate noise-variance scales.
We propose to remedy the problem by incorporating a reference batch which we use to calculate weighted conditional scores as more stable training targets.
arXiv Detail & Related papers (2023-02-01T18:57:01Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep learning based classification models.
One of the most widely-used approaches for tackling imbalanced data is re-weighting.
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view.
arXiv Detail & Related papers (2022-08-05T01:23:54Z)
- KL Guided Domain Adaptation [88.19298405363452]
Domain adaptation is an important problem and often needed for real-world applications.
A common approach in the domain adaptation literature is to learn a representation of the input that has the same distributions over the source and the target domain.
We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples.
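The claim that the KL term can be estimated from minibatch samples can be illustrated with a simple Monte Carlo estimator: draw samples from q and average log q(z) - log p(z). The Gaussian parameterization below is an assumption for illustration, not the paper's representation network.

```python
import numpy as np

def mc_kl_gaussian(rng, mu_q, sigma_q, mu_p, sigma_p, n_samples=10000):
    """Monte Carlo estimate of KL(q || p) for univariate Gaussians,
    using only samples drawn from q (as a minibatch would provide)."""
    z = rng.normal(mu_q, sigma_q, size=n_samples)
    log_q = -0.5 * ((z - mu_q) / sigma_q) ** 2 - np.log(sigma_q) - 0.5 * np.log(2 * np.pi)
    log_p = -0.5 * ((z - mu_p) / sigma_p) ** 2 - np.log(sigma_p) - 0.5 * np.log(2 * np.pi)
    return float(np.mean(log_q - log_p))

rng = np.random.default_rng(1)
# KL(N(0,1) || N(1,1)) has closed form 0.5; the estimate converges to it
estimate = mc_kl_gaussian(rng, mu_q=0.0, sigma_q=1.0, mu_p=1.0, sigma_p=1.0)
```

The estimator is unbiased, so its error shrinks as 1/sqrt(n_samples), which is what makes minibatch estimation of the KL term practical.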
arXiv Detail & Related papers (2021-06-14T22:24:23Z)
- On the Importance of Sampling in Learning Graph Convolutional Networks [13.713485304798368]
Graph Convolutional Networks (GCNs) have achieved impressive empirical advancement across a wide variety of graph-related applications.
Despite their success, training GCNs on large graphs suffers from computational and memory issues.
We describe and analyze a general doubly variance reduction schema that can accelerate any sampling method under the memory budget.
arXiv Detail & Related papers (2021-03-03T21:31:23Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
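The "individual importance weight per sample in the mini-batch" idea can be sketched as a small modification to momentum SGD. The exponential-of-loss weighting and the parameter names below are assumptions for illustration, not ABSGD's exact formulation.

```python
import numpy as np

def per_sample_weights(losses, lam=1.0):
    """Mini-batch importance weights ∝ exp(loss / λ), self-normalized,
    so harder (higher-loss or minority-class) samples get more weight."""
    scaled = np.asarray(losses, dtype=float) / lam
    scaled -= scaled.max()  # subtract max for numerical stability
    w = np.exp(scaled)
    return w / w.sum()

def weighted_momentum_step(params, per_sample_grads, losses,
                           velocity, lr=0.1, beta=0.9, lam=1.0):
    """One momentum-SGD step with a weighted mini-batch gradient."""
    w = per_sample_weights(losses, lam)
    grad = (w[:, None] * per_sample_grads).sum(axis=0)
    velocity = beta * velocity + grad
    return params - lr * velocity, velocity

params, velocity = np.zeros(2), np.zeros(2)
grads = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
params, velocity = weighted_momentum_step(params, grads, losses=[1.0, 2.0, 3.0],
                                          velocity=velocity)
```

Because the reweighting only touches how the mini-batch gradient is averaged, it composes with any loss function, which matches the summary's claim that the method combines with other robust losses at no extra cost.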
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolutional Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z)
- Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data [87.61504710345528]
We propose two strategies for freeing a neural network from tuning with OoD data, while improving its OoD detection performance.
We specifically propose to decompose confidence scoring as well as a modified input pre-processing method.
Our further analysis on a larger scale image dataset shows that the two types of distribution shifts, specifically semantic shift and non-semantic shift, present a significant difference.
arXiv Detail & Related papers (2020-02-26T04:18:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.