Dual Convexified Convolutional Neural Networks
- URL: http://arxiv.org/abs/2205.14056v1
- Date: Fri, 27 May 2022 15:45:08 GMT
- Title: Dual Convexified Convolutional Neural Networks
- Authors: Site Bai, Chuyang Ke, Jean Honorio
- Abstract summary: We propose the framework of dual convexified convolutional neural networks (DCCNNs).
In this framework, we first introduce a primal learning problem motivated by convexified convolutional neural networks (CCNNs).
We then construct the dual convex training program through careful analysis of the Karush-Kuhn-Tucker (KKT) conditions and Fenchel conjugates.
- Score: 27.0231994885228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose the framework of dual convexified convolutional neural networks
(DCCNNs). In this framework, we first introduce a primal learning problem
motivated by convexified convolutional neural networks (CCNNs), and then
construct the dual convex training program through careful analysis of the
Karush-Kuhn-Tucker (KKT) conditions and Fenchel conjugates. Our approach
reduces the memory overhead of constructing a large kernel matrix and
eliminates the ambiguity of factorizing the matrix. Due to the low-rank
structure in CCNNs and the related subdifferential of nuclear norms, there is
no closed-form expression to recover the primal solution from the dual
solution. To overcome this, we propose a novel weight recovery
algorithm, which takes the dual solution and the kernel information as the
input, and recovers the linear and convolutional weights of a CCNN.
Furthermore, our recovery algorithm exploits the low-rank structure and imposes
a small number of filters indirectly, which reduces the parameter size. As a
result, DCCNNs inherit all the statistical benefits of CCNNs, while enjoying a
more formal and efficient workflow.
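The abstract does not spell out the recovery procedure; as a rough illustration only, the sketch below (NumPy) shows one generic way a low-rank primal weight matrix could be recovered from a dual solution and a kernel factorization via a truncated SVD. All names, shapes, and steps (recover_weights, dual_vars, kernel_factor, rank) are hypothetical and are not taken from the paper.

```python
import numpy as np

def recover_weights(dual_vars, kernel_factor, rank):
    """Hypothetical sketch of low-rank weight recovery (not the paper's algorithm).

    dual_vars     : (n, c) dual solution, one column per output/class
    kernel_factor : (n, m) factor Q of a kernel matrix K = Q @ Q.T
    rank          : number of filters to keep (low-rank constraint)
    """
    # Form a candidate primal parameter matrix by pushing the dual
    # variables through the kernel factor.
    A = kernel_factor.T @ dual_vars            # (m, c)

    # Truncated SVD keeps only `rank` components, mimicking the low-rank
    # structure that nuclear-norm regularization induces in CCNN-style models.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    conv_filters = U[:, :rank] * s[:rank]      # (m, rank): filter-like factors
    linear_weights = Vt[:rank, :]              # (rank, c): linear output weights
    return conv_filters, linear_weights

# Toy usage with random data (purely illustrative).
rng = np.random.default_rng(0)
n, m, c = 50, 20, 3                            # samples, patch features, classes
dual_vars = rng.standard_normal((n, c))
kernel_factor = rng.standard_normal((n, m))
filters, weights = recover_weights(dual_vars, kernel_factor, rank=2)
print(filters.shape, weights.shape)            # (20, 2) (2, 3)
```

The product filters @ weights gives a rank-2 approximation of A; in the actual DCCNN algorithm the recovery is driven by the KKT conditions and kernel information rather than a plain SVD.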
Related papers
- Compositional Curvature Bounds for Deep Neural Networks [7.373617024876726]
A key challenge that threatens the widespread use of neural networks in safety-critical applications is their vulnerability to adversarial attacks.
We study the second-order behavior of continuously differentiable deep neural networks, focusing on robustness against adversarial perturbations.
We introduce a novel algorithm to analytically compute provable upper bounds on the second derivative of neural networks.
arXiv Detail & Related papers (2024-06-07T17:50:15Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are widely used for a variety of learning tasks.
In this paper, we examine convex Lasso-based reformulations of the neural network training problem.
We show that all stationary points of the non-convex training objective can be characterized as global optima of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Fixing the NTK: From Neural Network Linearizations to Exact Convex
Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z) - Compacting Binary Neural Networks by Sparse Kernel Selection [58.84313343190488]
This paper is motivated by a previously revealed phenomenon that the binary kernels in successful BNNs are nearly power-law distributed.
We develop the Permutation Straight-Through Estimator (PSTE) that is able to not only optimize the selection process end-to-end but also maintain the non-repetitive occupancy of selected codewords.
Experiments verify that our method reduces both the model size and bit-wise computational costs, and achieves accuracy improvements compared with state-of-the-art BNNs under comparable budgets.
arXiv Detail & Related papers (2023-03-25T13:53:02Z) - On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee [21.818773423324235]
This paper focuses on two model compression techniques: low-rank approximation and weight pruning.
A holistic framework is proposed for model compression from a novel perspective of nonconvex optimization.
arXiv Detail & Related papers (2023-03-13T02:14:42Z) - A Recursively Recurrent Neural Network (R2N2) Architecture for Learning
Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z) - Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model
Classes and Cone Decompositions [41.337814204665364]
We develop algorithms for convex optimization of two-layer neural networks with ReLU activation functions.
We show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem.
arXiv Detail & Related papers (2022-02-02T23:50:53Z) - Kronecker-factored Quasi-Newton Methods for Convolutional Neural
Networks [10.175972095073282]
KF-QN-CNN is a new Kronecker-factored quasi-Newton method for training convolutional neural networks (CNNs).
KF-QN-CNN consistently exhibited superior performance in all of our tests.
arXiv Detail & Related papers (2021-02-12T19:40:34Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Implicit Convex Regularizers of CNN Architectures: Convex Optimization
of Two- and Three-Layer Networks in Polynomial Time [70.15611146583068]
We study training of Convolutional Neural Networks (CNNs) with ReLU activations.
We introduce exact convex optimization formulations with polynomial complexity with respect to the number of data samples, the number of neurons, and the data dimension.
arXiv Detail & Related papers (2020-06-26T04:47:20Z)