Fast Certified Robust Training via Better Initialization and Shorter Warmup
- URL: http://arxiv.org/abs/2103.17268v2
- Date: Thu, 1 Apr 2021 17:35:36 GMT
- Title: Fast Certified Robust Training via Better Initialization and Shorter Warmup
- Authors: Zhouxing Shi, Yihan Wang, Huan Zhang, Jinfeng Yi, Cho-Jui Hsieh
- Abstract summary: We propose a new IBP initialization and principled regularizers during the warmup stage to stabilize certified bounds.
We find that batch normalization (BN) is a crucial architectural element to build best-performing networks for certified training.
- Score: 95.81628508228623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, bound propagation based certified adversarial defenses have been proposed for training neural networks with certifiable robustness guarantees. Although state-of-the-art (SOTA) methods, including interval bound propagation (IBP) and CROWN-IBP, have per-batch training complexity similar to standard neural network training, they usually need a long warmup schedule with hundreds or thousands of epochs to reach SOTA performance and are thus still quite costly to train. In this paper, we discover that the weight initialization adopted by prior works, such as Xavier or orthogonal initialization, which was originally designed for standard network training, results in very loose certified bounds at initialization, forcing a long warmup schedule. We also find that IBP based training leads to a significant imbalance in ReLU activation states, which can hamper model performance. Based on these findings, we derive a new IBP initialization as well as principled regularizers for the warmup stage that stabilize certified bounds during initialization and warmup, significantly shortening the warmup schedule and improving the balance of ReLU activation states. Additionally, we find that batch normalization (BN) is a crucial architectural element for building best-performing networks for certified training, because it helps stabilize bound variance and balance ReLU activation states. With our proposed initialization, regularizers and architectural changes combined, we obtain 65.03% verified error on CIFAR-10 ($\epsilon=\frac{8}{255}$) and 82.13% verified error on TinyImageNet ($\epsilon=\frac{1}{255}$) using very short training schedules (160 and 80 total epochs, respectively), outperforming literature SOTA models trained for hundreds or thousands of epochs.
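Code sketch (not from the paper): the abstract's key observation, that Xavier-style initialization yields very loose IBP bounds at initialization while a 1/n-scaled initialization keeps them tight, can be illustrated with a few lines of interval arithmetic. The layer sizes, epsilon, and exact scaling constant below are illustrative assumptions, not the authors' settings.

    # Minimal sketch (not the authors' code): propagate an L_inf interval through
    # a freshly initialized MLP with interval bound propagation (IBP) and compare
    # how wide the output bounds are under two weight initializations.
    import numpy as np

    rng = np.random.default_rng(0)

    def ibp_linear(lb, ub, W, b):
        # Exact interval propagation through x -> W x + b.
        center, radius = (ub + lb) / 2.0, (ub - lb) / 2.0
        new_center = W @ center + b
        new_radius = np.abs(W) @ radius  # |W| controls how much the box widens
        return new_center - new_radius, new_center + new_radius

    def ibp_relu(lb, ub):
        # ReLU is monotone, so the interval is simply clipped at zero.
        return np.maximum(lb, 0.0), np.maximum(ub, 0.0)

    def mean_output_width(init, widths=(784, 512, 512, 512, 10), eps=8 / 255, trials=10):
        # Average width of the output interval for a randomly initialized network.
        total = 0.0
        layers = list(zip(widths[:-1], widths[1:]))
        for _ in range(trials):
            x = rng.uniform(0.0, 1.0, widths[0])
            lb, ub = x - eps, x + eps
            for i, (n_in, n_out) in enumerate(layers):
                lb, ub = ibp_linear(lb, ub, init(n_in, n_out), np.zeros(n_out))
                if i < len(layers) - 1:  # no ReLU after the output layer
                    lb, ub = ibp_relu(lb, ub)
            total += float(np.mean(ub - lb))
        return total / trials

    # Xavier/Glorot-style Gaussian initialization (designed for standard training).
    xavier = lambda n_in, n_out: rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), (n_out, n_in))
    # 1/n-scaled Gaussian chosen so each row of |W| sums to roughly 1 in expectation,
    # keeping the interval width about constant from layer to layer (the exact
    # constant used in the paper may differ; this one is only illustrative).
    ibp_style = lambda n_in, n_out: rng.normal(0.0, np.sqrt(np.pi / 2.0) / n_in, (n_out, n_in))

    print("Xavier init,    mean output bound width:", mean_output_width(xavier))
    print("IBP-style init, mean output bound width:", mean_output_width(ibp_style))

Under the Xavier-style scaling the printed output-interval width grows by orders of magnitude across layers, whereas the 1/n-type scaling keeps it roughly constant; this is the intuition behind the paper's IBP initialization and the resulting shorter warmup.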
Related papers
- SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection [49.43407207482008]
SpacTor is a new training procedure consisting of a hybrid objective combining span corruption (SC) and replaced token detection (RTD).
In our experiments with encoder-decoder architectures (T5) on a variety of NLP tasks, SpacTor-T5 yields the same downstream performance as standard SC pre-training.
arXiv Detail & Related papers (2024-01-24T00:36:13Z)
- Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation [67.77048565738728]
Continual learning involves learning a sequence of tasks and balancing their knowledge appropriately.
We propose Adaptive Balance of BN (AdaB$^2$N), which appropriately incorporates a Bayesian-based strategy to adapt task-wise contributions.
Our approach achieves significant performance gains across a wide range of benchmarks.
arXiv Detail & Related papers (2023-10-13T04:50:40Z)
- TAPS: Connecting Certified and Adversarial Training [6.688598900034783]
We propose TAPS, an (unsound) certified training method that combines IBP and PGD training to yield precise, although not necessarily sound, worst-case loss approximations.
TAPS achieves a new state-of-the-art in many settings, e.g., reaching a certified accuracy of 22% on TinyImageNet.
arXiv Detail & Related papers (2023-05-08T09:32:05Z)
- On the Convergence of Certified Robust Training with Interval Bound Propagation [147.77638840942447]
We present a theoretical analysis on the convergence of IBP training.
We show that when using IBP training to train a randomly initialized two-layer ReLU neural network with logistic loss, gradient descent can linearly converge to zero robust training error.
arXiv Detail & Related papers (2022-03-16T21:49:13Z)
- When to Prune? A Policy towards Early Structural Pruning [27.91996628143805]
We propose a policy that prunes as early as possible during training without hurting performance.
Our method yields a $1.4\%$ top-1 accuracy boost over state-of-the-art pruning counterparts and cuts GPU training cost by $2.4\times$.
arXiv Detail & Related papers (2021-10-22T18:39:22Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients.
FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- Pruning Filters while Training for Efficiently Optimizing Deep Learning Networks [6.269700080380206]
Pruning techniques have been proposed that remove less significant weights in deep networks.
We propose a dynamic pruning-while-training procedure, wherein we prune filters of a deep network during training itself.
Results indicate that pruning while training yields a compressed network with almost no accuracy loss after pruning 50% of the filters.
arXiv Detail & Related papers (2020-03-05T18:05:17Z)
- Picking Winning Tickets Before Training by Preserving Gradient Flow [9.67608102763644]
We argue that efficient training requires preserving the gradient flow through the network.
We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet.
arXiv Detail & Related papers (2020-02-18T05:14:47Z)
- Activation Density driven Energy-Efficient Pruning in Training [2.222917681321253]
We propose a novel pruning method that prunes a network in real time during training.
We obtain exceedingly sparse networks with accuracy comparable to the baseline network.
arXiv Detail & Related papers (2020-02-07T18:34:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.