Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs
- URL: http://arxiv.org/abs/2309.15096v1
- Date: Tue, 26 Sep 2023 17:42:52 GMT
- Title: Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs
- Authors: Rajat Vadiraj Dwaraknath, Tolga Ergen, Mert Pilanci
- Abstract summary: We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
- Score: 63.768739279562105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, theoretical analyses of deep neural networks have broadly focused
on two directions: 1) Providing insight into neural network training by SGD in
the limit of infinite hidden-layer width and infinitesimally small learning
rate (also known as gradient flow) via the Neural Tangent Kernel (NTK), and 2)
Globally optimizing the regularized training objective via cone-constrained
convex reformulations of ReLU networks. The latter research direction also
yielded an alternative formulation of the ReLU network, called a gated ReLU
network, that is globally optimizable via efficient unconstrained convex
programs. In this work, we interpret the convex program for this gated ReLU
network as a Multiple Kernel Learning (MKL) model with a weighted data masking
feature map and establish a connection to the NTK. Specifically, we show that
for a particular choice of mask weights that do not depend on the learning
targets, this kernel is equivalent to the NTK of the gated ReLU network on the
training data. A consequence of this lack of dependence on the targets is that
the NTK cannot perform better than the optimal MKL kernel on the training set.
By using iterative reweighting, we improve the weights induced by the NTK to
obtain the optimal MKL kernel which is equivalent to the solution of the exact
convex reformulation of the gated ReLU network. We also provide several
numerical simulations corroborating our theory. Additionally, we provide an
analysis of the prediction error of the resulting optimal kernel via
consistency results for the group lasso.
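
The two main ingredients of the abstract, the weighted data-masking feature map behind the MKL view and the iterative reweighting that moves from the NTK-induced weights toward the optimal MKL kernel, can be illustrated with a small numerical sketch. The snippet below is an illustration under stated assumptions, not the authors' implementation: the random gates G, the uniform initial weights standing in for a target-independent (NTK-like) weighting, the kernel ridge inner solve, the helper mkl_kernel, and the group-lasso-style weight update are all hypothetical choices made for readability.

```python
# Minimal sketch (not the paper's code): build the masked-data MKL kernel for a
# gated ReLU model and refine the mask weights by iterative reweighting.
# The gates G, uniform initial weights, ridge parameter, and update rule are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 40, 5, 64                      # samples, input dim, number of gates
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Each gate g_i induces a binary mask D_i = diag(1[X g_i >= 0]).
G = rng.standard_normal((d, m))
masks = (X @ G >= 0).astype(float)       # column i holds diag(D_i)

def mkl_kernel(lam):
    """K(lambda) = sum_i lambda_i * D_i X X^T D_i."""
    base = X @ X.T
    K = np.zeros((n, n))
    for i in range(m):
        K += lam[i] * np.outer(masks[:, i], masks[:, i]) * base
    return K

# Uniform, target-independent weights stand in for an NTK-like weighting
# (the paper's exact NTK-induced weights are not reproduced here).
lam = np.full(m, 1.0 / m)
ridge = 1e-3

# Iterative reweighting: fit kernel ridge regression, then make each weight
# proportional to the norm of the per-mask block u_i = lambda_i * X^T D_i alpha.
for _ in range(20):
    alpha = np.linalg.solve(mkl_kernel(lam) + ridge * np.eye(n), y)
    block_norms = np.array([
        lam[i] * np.linalg.norm(X.T @ (masks[:, i] * alpha)) for i in range(m)
    ])
    lam = block_norms / (block_norms.sum() + 1e-12)

alpha = np.linalg.solve(mkl_kernel(lam) + ridge * np.eye(n), y)
print("training residual:", np.linalg.norm(mkl_kernel(lam) @ alpha - y))
```

This mirrors the observation in the abstract: because the NTK-induced weights do not depend on the targets, reweighting them can only match or improve the training-set fit, with the optimum corresponding to the exact convex reformulation of the gated ReLU network.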
Related papers
- Equidistribution-based training of Free Knot Splines and ReLU Neural Networks [0.0]
We consider the problem of one-dimensional function approximation using shallow neural networks (NN) with a rectified linear unit (ReLU) activation function.
We show that the ill-conditioning of these networks degrades rapidly as the width of the network increases.
We leverage the theory of optimal piecewise linear interpolants to improve the training procedure for a ReLU NN.
arXiv Detail & Related papers (2024-07-02T10:51:36Z)
- Reverse Engineering Deep ReLU Networks: An Optimization-based Algorithm [0.0]
We present a novel method for reconstructing deep ReLU networks by leveraging convex optimization techniques and a sampling-based approach.
Our research contributes to the growing body of work on reverse engineering deep ReLU networks and paves the way for new advancements in neural network interpretability and security.
arXiv Detail & Related papers (2023-12-07T20:15:06Z)
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
- Learning k-Level Structured Sparse Neural Networks Using Group Envelope Regularization [4.0554893636822]
We introduce a novel approach to deploy large-scale Deep Neural Networks on constrained resources.
The method speeds up inference and aims to reduce memory demand and power consumption.
arXiv Detail & Related papers (2022-12-25T15:40:05Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions [41.337814204665364]
We develop algorithms for convex optimization of two-layer neural networks with ReLU activation functions.
We show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem.
arXiv Detail & Related papers (2022-02-02T23:50:53Z)
- LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose LocalDrop, a new approach to regularizing neural networks via the local Rademacher complexity.
A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs) has been developed based on the proposed upper bound of the local Rademacher complexity.
arXiv Detail & Related papers (2021-03-01T03:10:11Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- A Revision of Neural Tangent Kernel-based Approaches for Neural Networks [34.75076385561115]
We use the neural tangent kernel to show that networks can fit any finite training sample perfectly.
A simple and analytic kernel function is derived and shown to be equivalent to a fully-trained network.
Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.
arXiv Detail & Related papers (2020-07-02T05:07:55Z)
- The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z)
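
For context on what such a cone-constrained program looks like, here is a hedged sketch in the standard two-layer ReLU setting, with data matrix X, targets y, activation-pattern masks D_i, and regularization parameter beta; the notation and exact form are drawn from the broader convex-reformulation literature rather than quoted from this entry's abstract.

```latex
% Sketch of a cone-constrained convex reformulation of two-layer ReLU training
% (assumed notation: X \in \mathbb{R}^{n \times d} data, y targets,
%  D_i activation-pattern masks, \beta > 0 regularization, P patterns).
\min_{\{v_i, w_i\}_{i=1}^{P}} \;
  \frac{1}{2}\Bigl\| \sum_{i=1}^{P} D_i X (v_i - w_i) - y \Bigr\|_2^2
  + \beta \sum_{i=1}^{P} \bigl( \|v_i\|_2 + \|w_i\|_2 \bigr)
\quad \text{s.t.} \quad
  (2 D_i - I_n) X v_i \ge 0, \;\; (2 D_i - I_n) X w_i \ge 0 \quad \forall i .
```

Dropping the cone constraints yields the unconstrained gated ReLU program, a group lasso over the masked blocks D_i X, which is the object that the main paper above reinterprets as a Multiple Kernel Learning model.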