0/1 Deep Neural Networks via Block Coordinate Descent
- URL: http://arxiv.org/abs/2206.09379v2
- Date: Thu, 31 Aug 2023 12:22:15 GMT
- Title: 0/1 Deep Neural Networks via Block Coordinate Descent
- Authors: Hui Zhang, Shenglong Zhou, Geoffrey Ye Li, Naihua Xiu
- Abstract summary: The step function is one of the simplest and most natural activation functions for deep neural networks (DNNs)
As it outputs 1 for positive inputs and 0 otherwise, its intrinsic characteristics (e.g., discontinuity and the absence of useful subgradient information) have impeded its development for decades.
- Score: 40.11141921215105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The step function is one of the simplest and most natural activation
functions for deep neural networks (DNNs). As it outputs 1 for positive inputs
and 0 otherwise, its intrinsic characteristics (e.g., discontinuity and the
absence of useful subgradient information) have impeded its development for
several decades. Although there is an impressive body of work on designing DNNs
with continuous activation functions that can be viewed as surrogates of the
step function, the step function itself still possesses some advantageous
properties, such as complete robustness to outliers and the capacity to attain
the best learning-theoretic guarantee of predictive accuracy. Hence, in this
paper, we aim to train DNNs with the step function used as the activation
function (dubbed 0/1 DNNs). We first reformulate 0/1 DNNs as an unconstrained
optimization problem and then solve it by a block coordinate descent (BCD)
method. Moreover, we derive closed-form solutions for the sub-problems of BCD
as well as its convergence properties. Furthermore, we integrate
$\ell_{2,0}$-regularization into the 0/1 DNN to accelerate the training process
and compress the network scale. As a result, the proposed algorithm achieves
desirable performance on classifying the MNIST, FashionMNIST, Cifar10, and
Cifar100 datasets.
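To make the setup concrete, here is a minimal NumPy sketch of the 0/1 (step) activation, a forward pass that uses it, and a hard-thresholding step in the spirit of $\ell_{2,0}$-regularization. The names (`step`, `forward`, `l20_prune`) are illustrative only, and the sketch does not reproduce the paper's BCD updates, whose closed-form sub-problem solutions are derived in the paper itself.

```python
import numpy as np

def step(z):
    """0/1 (Heaviside) activation: 1 for positive entries, 0 otherwise."""
    return (z > 0).astype(z.dtype)

def forward(weights, biases, x):
    """Forward pass of a toy 0/1 DNN: step activation on hidden layers,
    linear output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = step(W @ a + b)
    return weights[-1] @ a + biases[-1]

def l20_prune(W, k):
    """Keep only the k rows (neurons) of W with the largest l2 norm and zero
    out the rest -- a hard-thresholding step corresponding to an l_{2,0}
    constraint on the rows of W."""
    norms = np.linalg.norm(W, axis=1)
    keep = np.argsort(norms)[-k:]
    W_pruned = np.zeros_like(W)
    W_pruned[keep] = W[keep]
    return W_pruned

# Toy usage: a two-layer network on random data.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 8)), rng.standard_normal((4, 16))]
biases = [rng.standard_normal(16), rng.standard_normal(4)]
x = rng.standard_normal(8)
print(forward(weights, biases, x))       # real-valued outputs of the linear last layer
weights[0] = l20_prune(weights[0], k=8)  # compress the first layer to 8 active neurons
```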
Related papers
- OPAF: Optimized Secure Two-Party Computation Protocols for Nonlinear Activation Functions in Recurrent Neural Network [8.825150825838769]
This paper pays special attention to the implementation of non-linear functions in the semi-honest model with two-party settings.
We propose a novel and efficient protocol for the exponential function using a divide-and-conquer strategy.
Next, we take advantage of the symmetry of sigmoid and Tanh and fine-tune the inputs to reduce the 2PC building blocks (these symmetry identities are sketched after this list).
arXiv Detail & Related papers (2024-03-01T02:49:40Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z) - Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data.
We propose the Graph-adaptive Rectified Linear Unit (GReLU), a new parametric activation function that incorporates neighborhood information in a novel and efficient way.
We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
arXiv Detail & Related papers (2022-02-13T10:54:59Z) - Convergence proof for stochastic gradient descent in the training of
deep neural networks with ReLU activation for constant target functions [1.7149364927872015]
Stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs).
In this work we study SGD type optimization methods in the training of fully-connected feedforward DNNs with rectified linear unit (ReLU) activation.
arXiv Detail & Related papers (2021-12-13T11:45:36Z) - Dynamic Binary Neural Network by learning channel-wise thresholds [9.432747511001246]
We propose a dynamic BNN (DyBNN) incorporating dynamic learnable channel-wise thresholds for the Sign function and shift parameters for PReLU (a sketch of the channel-wise thresholding appears after this list).
The DyBNN built on two ReActNet backbones (MobileNetV1 and ResNet18) achieves 71.2% and 67.4% top-1 accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-10-08T17:41:36Z) - Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z) - An Integer Programming Approach to Deep Neural Networks with Binary
Activation Functions [0.0]
We study deep neural networks with binary activation functions (BDNN).
We show that the BDNN can be reformulated as a mixed-integer linear program that can be solved to global optimality by classical integer-programming solvers (a generic big-M encoding of a binary-activated neuron is sketched after this list).
arXiv Detail & Related papers (2020-07-07T10:28:20Z) - Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested on four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement, and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
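Regarding the OPAF entry above: sigmoid and Tanh satisfy sigmoid(-x) = 1 - sigmoid(x) and tanh(-x) = -tanh(x), so a protocol only needs to evaluate them on non-negative inputs and can map results back via these identities. The snippet below merely checks the identities in the clear; it is not a two-party protocol.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)
# Evaluate only on |x|, then map back using the symmetries.
sig_via_symmetry = np.where(x >= 0, sigmoid(np.abs(x)), 1.0 - sigmoid(np.abs(x)))
tanh_via_symmetry = np.sign(x) * np.tanh(np.abs(x))

assert np.allclose(sig_via_symmetry, sigmoid(x))
assert np.allclose(tanh_via_symmetry, np.tanh(x))
```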
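Regarding the DyBNN entry above: learnable channel-wise thresholds amount to shifting each channel of a feature map before applying the sign function. A hypothetical minimal sketch (not the ReActNet-based implementation from the paper) is:

```python
import numpy as np

def channelwise_sign(feat, thresholds):
    """Binarize a feature map to {-1, +1} with one learnable threshold per channel.
    feat: (channels, height, width); thresholds: (channels,)."""
    shifted = feat - thresholds[:, None, None]
    return np.where(shifted >= 0, 1.0, -1.0)

rng = np.random.default_rng(1)
feat = rng.standard_normal((3, 4, 4))     # toy feature map with 3 channels
thresholds = np.array([0.0, 0.1, -0.2])   # per-channel thresholds (learned in DyBNN)
print(channelwise_sign(feat, thresholds))
```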
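Regarding the BDNN entry above: a single binary-activated neuron with $u = 1$ if $w^\top x + b > 0$ and $u = 0$ otherwise can be encoded with generic big-M constraints, assuming valid bounds $L \le w^\top x + b \le U$ and a small tolerance $\varepsilon > 0$; this is a standard encoding, not necessarily the exact formulation used in the cited paper:

$$ w^\top x + b \le U\,u, \qquad w^\top x + b \ge L\,(1-u) + \varepsilon\,u, \qquad u \in \{0,1\}. $$

Here $u = 0$ forces $w^\top x + b \le 0$, while $u = 1$ forces $w^\top x + b \ge \varepsilon$.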
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.