Fast Certified Robust Training via Better Initialization and Shorter Warmup
- URL: http://arxiv.org/abs/2103.17268v2
- Date: Thu, 1 Apr 2021 17:35:36 GMT
- Title: Fast Certified Robust Training via Better Initialization and Shorter Warmup
- Authors: Zhouxing Shi, Yihan Wang, Huan Zhang, Jinfeng Yi, Cho-Jui Hsieh
- Abstract summary: We propose a new IBP initialization and principled regularizers during the warmup stage to stabilize certified bounds.
We find that batch normalization (BN) is a crucial architectural element to build best-performing networks for certified training.
- Score: 95.81628508228623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, bound propagation based certified adversarial defenses have been proposed for training neural networks with certifiable robustness guarantees. Although state-of-the-art (SOTA) methods, including interval bound propagation (IBP) and CROWN-IBP, have per-batch training complexity similar to standard neural network training, they usually need a long warmup schedule with hundreds or thousands of epochs to reach SOTA performance and are thus still quite costly to train. In this paper, we discover that the weight initialization adopted by prior works, such as Xavier or orthogonal initialization, which was originally designed for standard network training, results in very loose certified bounds at initialization, forcing a long warmup schedule. We also find that IBP based training leads to a significant imbalance in ReLU activation states, which can hamper model performance. Based on these findings, we derive a new IBP initialization as well as principled regularizers for the warmup stage that stabilize certified bounds during initialization and warmup, significantly shortening the warmup schedule and improving the balance of ReLU activation states. Additionally, we find that batch normalization (BN) is a crucial architectural element for building best-performing networks for certified training, because it helps stabilize bound variance and balance ReLU activation states. With our proposed initialization, regularizers and architectural changes combined, we obtain 65.03% verified error on CIFAR-10 ($\epsilon=\frac{8}{255}$) and 82.13% verified error on TinyImageNet ($\epsilon=\frac{1}{255}$) using very short training schedules (160 and 80 total epochs, respectively), outperforming literature SOTA models trained for hundreds or thousands of epochs.
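Code sketch (not from the paper): the abstract's key observation, that Xavier-style initialization yields very loose IBP bounds at initialization while a 1/n-scaled initialization keeps them tight, can be illustrated with a few lines of interval arithmetic. The layer sizes, epsilon, and exact scaling constant below are illustrative assumptions, not the authors' settings.

    # Minimal sketch (not the authors' code): propagate an L_inf interval through
    # a freshly initialized MLP with interval bound propagation (IBP) and compare
    # how wide the output bounds are under two weight initializations.
    import numpy as np

    rng = np.random.default_rng(0)

    def ibp_linear(lb, ub, W, b):
        # Exact interval propagation through x -> W x + b.
        center, radius = (ub + lb) / 2.0, (ub - lb) / 2.0
        new_center = W @ center + b
        new_radius = np.abs(W) @ radius  # |W| controls how much the box widens
        return new_center - new_radius, new_center + new_radius

    def ibp_relu(lb, ub):
        # ReLU is monotone, so the interval is simply clipped at zero.
        return np.maximum(lb, 0.0), np.maximum(ub, 0.0)

    def mean_output_width(init, widths=(784, 512, 512, 512, 10), eps=8 / 255, trials=10):
        # Average width of the output interval for a randomly initialized network.
        total = 0.0
        layers = list(zip(widths[:-1], widths[1:]))
        for _ in range(trials):
            x = rng.uniform(0.0, 1.0, widths[0])
            lb, ub = x - eps, x + eps
            for i, (n_in, n_out) in enumerate(layers):
                lb, ub = ibp_linear(lb, ub, init(n_in, n_out), np.zeros(n_out))
                if i < len(layers) - 1:  # no ReLU after the output layer
                    lb, ub = ibp_relu(lb, ub)
            total += float(np.mean(ub - lb))
        return total / trials

    # Xavier/Glorot-style Gaussian initialization (designed for standard training).
    xavier = lambda n_in, n_out: rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), (n_out, n_in))
    # 1/n-scaled Gaussian chosen so each row of |W| sums to roughly 1 in expectation,
    # keeping the interval width about constant from layer to layer (the exact
    # constant used in the paper may differ; this one is only illustrative).
    ibp_style = lambda n_in, n_out: rng.normal(0.0, np.sqrt(np.pi / 2.0) / n_in, (n_out, n_in))

    print("Xavier init,    mean output bound width:", mean_output_width(xavier))
    print("IBP-style init, mean output bound width:", mean_output_width(ibp_style))

Under the Xavier-style scaling the printed output-interval width grows by orders of magnitude across layers, whereas the 1/n-type scaling keeps it roughly constant; this is the intuition behind the paper's IBP initialization and the resulting shorter warmup.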
Related papers
- SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection [49.43407207482008]
SpacTor is a new training procedure consisting of a hybrid objective combining span corruption (SC) and replaced token detection (RTD).
In our experiments with encoder-decoder architectures (T5) on a variety of NLP tasks, SpacTor-T5 yields the same downstream performance as standard SC pre-training.
arXiv Detail & Related papers (2024-01-24T00:36:13Z)
- Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation [67.77048565738728]
Continual learning involves learning a sequence of tasks and balancing their knowledge appropriately.
We propose Adaptive Balance of BN (AdaB$^2$N), which appropriately incorporates a Bayesian-based strategy to adapt task-wise contributions.
Our approach achieves significant performance gains across a wide range of benchmarks.
arXiv Detail & Related papers (2023-10-13T04:50:40Z)
- TAPS: Connecting Certified and Adversarial Training [6.688598900034783]
We propose TAPS, an (unsound) certified training method that combines IBP and PGD training to yield precise, although not necessarily sound, worst-case loss approximations.
TAPS achieves a new state-of-the-art in many settings, e.g., reaching a certified accuracy of 22% on TinyImageNet.
arXiv Detail & Related papers (2023-05-08T09:32:05Z)
- On the Convergence of Certified Robust Training with Interval Bound Propagation [147.77638840942447]
We present a theoretical analysis on the convergence of IBP training.
We show that when using IBP training to train a randomly initialized two-layer ReLU neural network with logistic loss, gradient descent can linearly converge to zero robust training error.
arXiv Detail & Related papers (2022-03-16T21:49:13Z)
- When to Prune? A Policy towards Early Structural Pruning [27.91996628143805]
We propose a policy that prunes as early as possible during training without hurting performance.
Our method yields a $1.4\%$ top-1 accuracy boost over state-of-the-art pruning counterparts and cuts GPU training cost by $2.4\times$.
arXiv Detail & Related papers (2021-10-22T18:39:22Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients.
FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- Pruning Filters while Training for Efficiently Optimizing Deep Learning Networks [6.269700080380206]
Pruning techniques have been proposed that remove less significant weights in deep networks.
We propose a dynamic pruning-while-training procedure, wherein we prune filters of a deep network during training itself.
Results indicate that pruning while training yields a compressed network with almost no accuracy loss after pruning 50% of the filters.
arXiv Detail & Related papers (2020-03-05T18:05:17Z)
- Picking Winning Tickets Before Training by Preserving Gradient Flow [9.67608102763644]
We argue that efficient training requires preserving the gradient flow through the network.
We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet.
arXiv Detail & Related papers (2020-02-18T05:14:47Z)
- Activation Density driven Energy-Efficient Pruning in Training [2.222917681321253]
We propose a novel pruning method that prunes a network in real time during training.
We obtain exceedingly sparse networks with accuracy comparable to the baseline network.
arXiv Detail & Related papers (2020-02-07T18:34:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.