Dual Convexified Convolutional Neural Networks
- URL: http://arxiv.org/abs/2205.14056v1
- Date: Fri, 27 May 2022 15:45:08 GMT
- Title: Dual Convexified Convolutional Neural Networks
- Authors: Site Bai, Chuyang Ke, Jean Honorio
- Abstract summary: We propose the framework of dual convexified convolutional neural networks (DCCNNs).
In this framework, we first introduce a primal learning problem motivated by convexified convolutional neural networks (CCNNs).
We then construct the dual convex training program through careful analysis of the Karush-Kuhn-Tucker (KKT) conditions and Fenchel conjugates.
- Score: 27.0231994885228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose the framework of dual convexified convolutional neural networks
(DCCNNs). In this framework, we first introduce a primal learning problem
motivated by convexified convolutional neural networks (CCNNs), and then
construct the dual convex training program through careful analysis of the
Karush-Kuhn-Tucker (KKT) conditions and Fenchel conjugates. Our approach
reduces the memory overhead of constructing a large kernel matrix and
eliminates the ambiguity of factorizing the matrix. Due to the low-rank
structure in CCNNs and the related subdifferential of nuclear norms, there is
no closed-form expression to recover the primal solution from the dual
solution. To overcome this, we propose a novel weight recovery
algorithm, which takes the dual solution and the kernel information as the
input, and recovers the linear and convolutional weights of a CCNN.
Furthermore, our recovery algorithm exploits the low-rank structure and imposes
a small number of filters indirectly, which reduces the parameter size. As a
result, DCCNNs inherit all the statistical benefits of CCNNs, while enjoying a
more formal and efficient workflow.
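The abstract does not spell out the recovery procedure; as a rough illustration only, the sketch below (NumPy) shows one generic way a low-rank primal weight matrix could be recovered from a dual solution and a kernel factorization via a truncated SVD. All names, shapes, and steps (recover_weights, dual_vars, kernel_factor, rank) are hypothetical and are not taken from the paper.

```python
import numpy as np

def recover_weights(dual_vars, kernel_factor, rank):
    """Hypothetical sketch of low-rank weight recovery (not the paper's algorithm).

    dual_vars     : (n, c) dual solution, one column per output/class
    kernel_factor : (n, m) factor Q of a kernel matrix K = Q @ Q.T
    rank          : number of filters to keep (low-rank constraint)
    """
    # Form a candidate primal parameter matrix by pushing the dual
    # variables through the kernel factor.
    A = kernel_factor.T @ dual_vars            # (m, c)

    # Truncated SVD keeps only `rank` components, mimicking the low-rank
    # structure that nuclear-norm regularization induces in CCNN-style models.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    conv_filters = U[:, :rank] * s[:rank]      # (m, rank): filter-like factors
    linear_weights = Vt[:rank, :]              # (rank, c): linear output weights
    return conv_filters, linear_weights

# Toy usage with random data (purely illustrative).
rng = np.random.default_rng(0)
n, m, c = 50, 20, 3                            # samples, patch features, classes
dual_vars = rng.standard_normal((n, c))
kernel_factor = rng.standard_normal((n, m))
filters, weights = recover_weights(dual_vars, kernel_factor, rank=2)
print(filters.shape, weights.shape)            # (20, 2) (2, 3)
```

The product filters @ weights gives a rank-2 approximation of A; in the actual DCCNN algorithm the recovery is driven by the KKT conditions and kernel information rather than a plain SVD.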
Related papers
- Compositional Curvature Bounds for Deep Neural Networks [7.373617024876726]
A key challenge that threatens the widespread use of neural networks in safety-critical applications is their vulnerability to adversarial attacks.
We study the second-order behavior of continuously differentiable deep neural networks, focusing on robustness against adversarial perturbations.
We introduce a novel algorithm to analytically compute provable upper bounds on the second derivative of neural networks.
arXiv Detail & Related papers (2024-06-07T17:50:15Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are widely used for a variety of learning tasks.
In this paper, we examine convex Lasso-based reformulations of the neural network training problem.
We show that all stationary points of the non-convex training objective can be characterized as global optima of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Fixing the NTK: From Neural Network Linearizations to Exact Convex
Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z) - Compacting Binary Neural Networks by Sparse Kernel Selection [58.84313343190488]
This paper is motivated by a previously revealed phenomenon that the binary kernels in successful BNNs are nearly power-law distributed.
We develop the Permutation Straight-Through Estimator (PSTE) that is able to not only optimize the selection process end-to-end but also maintain the non-repetitive occupancy of selected codewords.
Experiments verify that our method reduces both the model size and bit-wise computational costs, and achieves accuracy improvements compared with state-of-the-art BNNs under comparable budgets.
arXiv Detail & Related papers (2023-03-25T13:53:02Z) - On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee [21.818773423324235]
This paper focuses on two model compression techniques: low-rank approximation and weight pruning.
A holistic framework is proposed for model compression from a novel perspective of nonconvex optimization.
arXiv Detail & Related papers (2023-03-13T02:14:42Z) - A Recursively Recurrent Neural Network (R2N2) Architecture for Learning
Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z) - Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model
Classes and Cone Decompositions [41.337814204665364]
We develop algorithms for convex optimization of two-layer neural networks with ReLU activation functions.
We show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem.
arXiv Detail & Related papers (2022-02-02T23:50:53Z) - Kronecker-factored Quasi-Newton Methods for Convolutional Neural
Networks [10.175972095073282]
KF-QN-CNN is a new Kronecker-factored quasi-Newton method for training convolutional neural networks (CNNs).
KF-QN-CNN consistently exhibited superior performance in all of our tests.
arXiv Detail & Related papers (2021-02-12T19:40:34Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Implicit Convex Regularizers of CNN Architectures: Convex Optimization
of Two- and Three-Layer Networks in Polynomial Time [70.15611146583068]
We study training of Convolutional Neural Networks (CNNs) with ReLU activations.
We introduce exact convex optimization formulations with polynomial complexity with respect to the number of data samples, the number of neurons, and the data dimension.
arXiv Detail & Related papers (2020-06-26T04:47:20Z)