Adversarial Training of Two-Layer Polynomial and ReLU Activation Networks via Convex Optimization
- URL: http://arxiv.org/abs/2405.14033v2
- Date: Wed, 16 Oct 2024 17:52:37 GMT
- Title: Adversarial Training of Two-Layer Polynomial and ReLU Activation Networks via Convex Optimization
- Authors: Daniel Kuelbs, Sanjay Lall, Mert Pilanci
- Abstract summary: Training neural networks which are robust to adversarial attacks remains an important problem in deep learning.
We reformulate the training problems for two-layer ReLU and polynomial activation networks as convex programs.
We demonstrate the practical utility of convex adversarial training on large-scale problems.
- Score: 40.68266398473983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training neural networks which are robust to adversarial attacks remains an important problem in deep learning, especially as heavily overparameterized models are adopted in safety-critical settings. Drawing from recent work which reformulates the training problems for two-layer ReLU and polynomial activation networks as convex programs, we devise a convex semidefinite program (SDP) for adversarial training of two-layer polynomial activation networks and prove that the convex SDP achieves the same globally optimal solution as its nonconvex counterpart. The convex SDP is observed to improve robust test accuracy against $\ell_\infty$ attacks relative to the original convex training formulation on multiple datasets. Additionally, we present scalable implementations of adversarial training for two-layer polynomial and ReLU networks which are compatible with standard machine learning libraries and GPU acceleration. Leveraging these implementations, we retrain the final two fully connected layers of a Pre-Activation ResNet-18 model on the CIFAR-10 dataset with both polynomial and ReLU activations. The two `robustified' models achieve significantly higher robust test accuracies against $\ell_\infty$ attacks than a Pre-Activation ResNet-18 model trained with sharpness-aware minimization, demonstrating the practical utility of convex adversarial training on large-scale problems.
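For context, the sketch below shows the general shape of a convex SDP for training a two-layer network with a degree-two polynomial activation sigma(t) = a*t^2 + b*t + c, written in cvxpy. The lifted block structure, trace constraint, squared loss, and all dimensions and coefficients are illustrative assumptions based on the convex formulation the abstract builds on, not the paper's exact program; the adversarial variant described above additionally robustifies the loss over an $\ell_\infty$ ball around each training input.
```python
# Minimal sketch (not the paper's exact program): convex SDP training of a
# two-layer network with polynomial activation sigma(t) = a*t^2 + b*t + c.
# Block structure, trace constraint, loss, and hyperparameters are assumptions.
import cvxpy as cp
import numpy as np

n, d = 50, 10                # number of samples, input dimension (toy sizes)
a, b, c = 1.0, 1.0, 0.0      # activation coefficients (assumed)
beta = 0.1                   # regularization strength (assumed)

rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))

def lifted_block(d):
    # One PSD block Z = [[Z1, z2], [z2^T, z4]] with the trace constraint tr(Z1) = z4.
    Z = cp.Variable((d + 1, d + 1), PSD=True)
    Z1, z2, z4 = Z[:d, :d], Z[:d, d], Z[d, d]
    return Z1, z2, z4, [cp.trace(Z1) == z4]

Z1p, z2p, z4p, cons_p = lifted_block(d)   # "positive" lifted variables
Z1m, z2m, z4m, cons_m = lifted_block(d)   # "negative" lifted variables

# The network output is linear in the lifted variables, so the problem is convex.
y_hat = cp.hstack([
    a * (X[i] @ (Z1p - Z1m) @ X[i]) + b * ((z2p - z2m) @ X[i]) + c * (z4p - z4m)
    for i in range(n)
])

objective = cp.sum_squares(y_hat - y) / n + beta * (z4p + z4m)
prob = cp.Problem(cp.Minimize(objective), cons_p + cons_m)
prob.solve(solver=cp.SCS)
print("optimal objective value:", prob.value)
```
In the referenced line of work, the weights of a nonconvex two-layer polynomial activation network can be recovered from a decomposition of the optimal lifted matrices.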
Related papers
- Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z) - Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning [81.0108753452546]
We propose Dynamic Reversible Dual-Residual Networks, or Dr$2$Net, to finetune a pretrained model with substantially reduced memory consumption.
Dr$2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible.
We show that Dr$2$Net can reach comparable performance to conventional finetuning but with significantly less memory usage.
arXiv Detail & Related papers (2024-01-08T18:59:31Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) training objectives are highly nonconvex.
In this paper, we examine convex reformulations of neural network training based on Lasso models.
We show that all stationary points of the nonconvex objective can be characterized as global optima of subsampled convex programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Fixing the NTK: From Neural Network Linearizations to Exact Convex
Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z) - Training Large Scale Polynomial CNNs for E2E Inference over Homomorphic
Encryption [33.35896071292604]
Training large-scale CNNs that during inference can be run under Homomorphic Encryption (HE) is challenging.
We provide a novel training method for large CNNs such as ResNet-152 and ConvNeXt models.
arXiv Detail & Related papers (2023-04-26T20:41:37Z) - Distributed Adversarial Training to Robustify Deep Neural Networks at
Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, adversarial training (AT) has been shown to be an effective approach for improving model robustness (a minimal single-machine sketch of the AT objective appears after this list).
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - Learning in Feedback-driven Recurrent Spiking Neural Networks using
full-FORCE Training [4.124948554183487]
We propose a supervised training procedure for RSNNs, where a second network is introduced only during training.
The proposed training procedure consists of generating targets for both recurrent and readout layers.
We demonstrate the improved performance and noise robustness of the proposed full-FORCE training procedure on modeling eight dynamical systems.
arXiv Detail & Related papers (2022-05-26T19:01:19Z) - Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model
Classes and Cone Decompositions [41.337814204665364]
We develop algorithms for convex optimization of two-layer neural networks with ReLU activation functions.
We show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem.
arXiv Detail & Related papers (2022-02-02T23:50:53Z) - Neural Spectrahedra and Semidefinite Lifts: Global Convex Optimization
of Polynomial Activation Neural Networks in Fully Polynomial-Time [31.94590517036704]
We develop exact convex optimization formulations for two-layer neural networks with second-degree polynomial activations.
We show that the semidefinite lift is exact, so globally optimal training can be performed in fully polynomial time with respect to the input dimension and sample size for all input data.
The proposed approach is significantly faster than the standard backpropagation procedure and obtains better test accuracy.
arXiv Detail & Related papers (2021-01-07T08:43:01Z) - A Practical Layer-Parallel Training Algorithm for Residual Networks [41.267919563145604]
Gradient-based algorithms for training ResNets typically require a forward pass of the input data, followed by back-propagating the objective gradient to update parameters.
We propose a novel serial-parallel hybrid training strategy to enable the use of data augmentation, together with downsampling filters to reduce the communication cost.
arXiv Detail & Related papers (2020-09-03T06:03:30Z)
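For reference, below is a minimal single-machine sketch of the PGD-based $\ell_\infty$ adversarial training objective discussed in the abstract and in the Distributed Adversarial Training entry above. The toy model, data, attack radius, and step sizes are placeholder assumptions and do not reproduce either paper's actual setup.
```python
# Hedged sketch: standard l_inf adversarial training with a PGD inner attack.
# Model, data, and hyperparameters are placeholders for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Projected gradient descent attack within an l_inf ball of radius eps."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # ascent step on the loss
            delta.clamp_(-eps, eps)              # project back onto the l_inf ball
            delta.grad.zero_()
    return (x + delta).detach()

# Toy classifier and data standing in for a real network and dataset.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.rand(16, 3, 32, 32)
y = torch.randint(0, 10, (16,))

for step in range(5):
    x_adv = pgd_linf(model, x, y)                       # inner maximization
    opt.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)             # outer minimization
    loss.backward()
    opt.step()
```
In practice the inner attack and outer update run over mini-batches of a real dataset; the convex SDP sketched after the abstract replaces this alternating nonconvex procedure with a single convex program.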
This list is automatically generated from the titles and abstracts of the papers in this site.