PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
- URL: http://arxiv.org/abs/2302.14772v1
- Date: Tue, 28 Feb 2023 17:14:24 GMT
- Title: PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
- Authors: Shun Lu, Yu Hu, Longxing Yang, Zihao Sun, Jilin Mei, Jianchao Tan,
Chengru Song
- Abstract summary: One-shot NAS methods train a supernet and then inherit the pre-trained weights to evaluate sub-models.
Large gradient variance occurs during supernet training, which degrades the supernet ranking consistency.
We propose to explicitly minimize the gradient variance of the supernet training by jointly optimizing the sampling distributions of PAth and DAta.
- Score: 8.737995937682271
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Based on the weight-sharing mechanism, one-shot NAS methods train a supernet
and then inherit the pre-trained weights to evaluate sub-models, largely
reducing the search cost. However, several works have pointed out that the
shared weights suffer from different gradient descent directions during
training. We further find that large gradient variance occurs during
supernet training, which degrades the supernet ranking consistency. To mitigate
this issue, we propose to explicitly minimize the gradient variance of the
supernet training by jointly optimizing the sampling distributions of PAth and
DAta (PA&DA). We theoretically derive the relationship between the gradient
variance and the sampling distributions, and reveal that the optimal sampling
probability is proportional to the normalized gradient norm of path and
training data. Hence, we use the normalized gradient norm as the importance
indicator for path and training data, and adopt an importance sampling strategy
for the supernet training. Our method only requires negligible computation cost
for optimizing the sampling distributions of path and data, but achieves lower
gradient variance during supernet training and better generalization
performance for the supernet, resulting in a more consistent NAS. We conduct
comprehensive comparisons with other improved approaches in various search
spaces. Results show that our method surpasses others with more reliable
ranking performance and higher accuracy of searched architectures,
demonstrating its effectiveness. Code is available at
https://github.com/ShunLu91/PA-DA.
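The claim that the optimal sampling probability is proportional to the normalized gradient norm matches the standard importance-sampling variance argument. The short derivation below reconstructs that general result as a sketch of the reasoning, not the paper's exact proof; here g_i denotes the gradient contributed by path or training example i.

```latex
% Unbiased importance-sampling estimator of the average gradient:
% sample index i with probability p_i and use \hat{g} = g_i / (N p_i).
\[
  \mathbb{E}\big[\hat{g}\big] = \frac{1}{N}\sum_{i=1}^{N} g_i, \qquad
  \mathbb{E}\big[\|\hat{g}\|^2\big] = \frac{1}{N^2}\sum_{i=1}^{N}\frac{\|g_i\|^2}{p_i}.
\]
% Since the mean is fixed, minimizing the variance is equivalent to minimizing
% the second moment subject to \sum_i p_i = 1, which yields
\[
  p_i^{*} = \frac{\|g_i\|}{\sum_{j=1}^{N}\|g_j\|}.
\]
```

The sketch below illustrates how such gradient-norm-proportional sampling of paths and training data could be wired into supernet training. The toy supernet, the per-example-loss proxy for data gradient norms, and all hyper-parameters are assumptions of this sketch rather than the authors' implementation, which is available at the repository above.

```python
# Minimal sketch of gradient-norm-based importance sampling of paths and data
# for supernet training. Illustrative only; see https://github.com/ShunLu91/PA-DA
# for the authors' code.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
np.random.seed(0)

N, D, C, NUM_OPS, BATCH = 512, 16, 4, 3, 32
train_x = torch.randn(N, D)
train_y = torch.randint(0, C, (N,))

class ToySupernet(nn.Module):
    """One edge with NUM_OPS weight-sharing candidate operations and a shared head."""
    def __init__(self):
        super().__init__()
        self.ops = nn.ModuleList(nn.Linear(D, 32) for _ in range(NUM_OPS))
        self.head = nn.Linear(32, C)

    def forward(self, x, op_idx):
        return self.head(F.relu(self.ops[op_idx](x)))

supernet = ToySupernet()
opt = torch.optim.SGD(supernet.parameters(), lr=0.05, momentum=0.9)

def normalize(scores, eps=1e-12):
    """Turn non-negative importance scores into a valid sampling distribution."""
    scores = np.asarray(scores, dtype=np.float64) + eps
    return scores / scores.sum()

path_scores = np.ones(NUM_OPS)  # importance indicator per candidate operation (path)
data_scores = np.ones(N)        # importance indicator per training example

for step in range(200):
    # Sample a path and a mini-batch, each with probability proportional to
    # its normalized importance score (recent gradient-norm estimates).
    p_path = normalize(path_scores)
    op_idx = int(np.random.choice(NUM_OPS, p=p_path))
    p_data = normalize(data_scores)
    idx = np.random.choice(N, size=BATCH, replace=False, p=p_data)

    logits = supernet(train_x[idx], op_idx)
    per_example = F.cross_entropy(logits, train_y[idx], reduction="none")
    # Importance-sampling weights 1/(N * p_i) keep the mini-batch gradient an
    # unbiased estimate of the full-batch gradient under non-uniform sampling.
    w = torch.as_tensor(1.0 / (N * p_data[idx]), dtype=per_example.dtype)
    loss = (w * per_example).mean()

    opt.zero_grad()
    loss.backward()

    # Refresh the importance indicators. The path score uses the gradient norm
    # of the sampled operation; the data score uses the per-example loss as a
    # cheap proxy for the per-example gradient norm (an assumption here).
    path_scores[op_idx] = sum(p.grad.norm().item()
                              for p in supernet.ops[op_idx].parameters())
    data_scores[idx] = per_example.detach().numpy() + 1e-3

    opt.step()
```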
Related papers
- A Bayesian Approach to Data Point Selection [24.98069363998565]
Data point selection (DPS) is becoming a critical topic in deep learning.
Existing approaches to DPS are predominantly based on a bi-level optimisation (BLO) formulation.
We propose a novel Bayesian approach to DPS.
arXiv Detail & Related papers (2024-11-06T09:04:13Z)
- The Sampling-Gaussian for stereo matching [7.9898209414259425]
The soft-argmax operation is widely adopted in neural network-based stereo matching methods.
Previous methods have failed to effectively improve accuracy and even compromise the efficiency of the network.
We propose a novel supervision method for stereo matching, Sampling-Gaussian.
arXiv Detail & Related papers (2024-10-09T03:57:13Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep learning based classification models.
One of the most widely-used approaches for tackling imbalanced data is re-weighting.
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view.
arXiv Detail & Related papers (2022-08-05T01:23:54Z)
- KL Guided Domain Adaptation [88.19298405363452]
Domain adaptation is an important problem and often needed for real-world applications.
A common approach in the domain adaptation literature is to learn a representation of the input that has the same distributions over the source and the target domain.
We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples.
arXiv Detail & Related papers (2021-06-14T22:24:23Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost; a generic sketch of this per-sample weighting appears after this list.
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolutional Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs), where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z)
- Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data [87.61504710345528]
We propose two strategies for freeing a neural network from tuning with OoD data, while improving its OoD detection performance.
Specifically, we propose a decomposed confidence scoring function as well as a modified input pre-processing method.
Our further analysis on a larger scale image dataset shows that the two types of distribution shifts, specifically semantic shift and non-semantic shift, present a significant difference.
arXiv Detail & Related papers (2020-02-26T04:18:25Z)
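The ABSGD entry above describes assigning an individual importance weight to each sample in the mini-batch within momentum SGD. The sketch below illustrates that generic idea only; the toy model, data, and the softmax-of-loss weighting rule are illustrative assumptions, not the paper's exact formulation.

```python
# Generic sketch of per-sample importance weighting inside momentum SGD.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
x, y = torch.randn(256, 10), torch.randint(0, 2, (256,))
tau = 1.0  # temperature: smaller tau up-weights hard (high-loss) examples more

for step in range(100):
    idx = torch.randperm(256)[:32]
    per_example = F.cross_entropy(model(x[idx]), y[idx], reduction="none")
    # Individual importance weight per sample in the mini-batch, normalized so
    # the weighted loss stays on the same scale as the plain mean loss.
    w = torch.softmax(per_example.detach() / tau, dim=0) * len(idx)
    loss = (w * per_example).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()  # momentum SGD step on the re-weighted mini-batch loss
```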