0/1 Deep Neural Networks via Block Coordinate Descent
- URL: http://arxiv.org/abs/2206.09379v2
- Date: Thu, 31 Aug 2023 12:22:15 GMT
- Title: 0/1 Deep Neural Networks via Block Coordinate Descent
- Authors: Hui Zhang, Shenglong Zhou, Geoffrey Ye Li, Naihua Xiu
- Abstract summary: The step function is one of the simplest and most natural activation functions for deep neural networks (DNNs).
As it counts 1 for positive variables and 0 for others, its intrinsic characteristics (e.g., discontinuity and no viable information of subgradients) have impeded its development for decades.
- Score: 40.11141921215105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   The step function is one of the simplest and most natural activation
functions for deep neural networks (DNNs). As it counts 1 for positive
variables and 0 for others, its intrinsic characteristics (e.g., discontinuity
and no viable information of subgradients) have impeded its development for several
decades. Although there is an impressive body of work on designing DNNs with
continuous activation functions that can be deemed surrogates of the step
function, the step function still possesses some advantageous properties, such
as complete robustness to outliers and being capable of attaining the best
learning-theoretic guarantee of predictive accuracy. Hence, in this paper, we
aim to train DNNs with the step function used as an activation function (dubbed
0/1 DNNs). We first reformulate 0/1 DNNs as an unconstrained optimization
problem and then solve it by a block coordinate descent (BCD) method. Moreover,
we acquire closed-form solutions for sub-problems of BCD as well as its
convergence properties. Furthermore, we also integrate
$\ell_{2,0}$-regularization into 0/1 DNN to accelerate the training process and
compress the network scale. As a result, the proposed algorithm achieves
desirable performance on classifying the MNIST, FashionMNIST, Cifar10, and
Cifar100 datasets.
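
To make the two central objects of the abstract concrete, here is a minimal sketch of a 0/1 (step) activation and an $\ell_{2,0}$ count. It is not the authors' BCD algorithm. The names `step`, `dense`, `forward`, and `l20_norm` are hypothetical, the fully connected architecture with a linear output layer is an assumption, and so is the row-wise convention for $\ell_{2,0}$ (number of weight-matrix rows with nonzero Euclidean norm).

```python
import math


def step(z):
    """0/1 step activation: 1 for strictly positive input, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


def dense(x, W, b):
    """Affine map W x + b, with W stored as a list of rows (one per output neuron)."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def forward(x, layers):
    """Forward pass: step activation on hidden layers, linear output layer.

    `layers` is a list of (W, b) pairs; this layout is an assumption made
    for illustration, not the formulation used in the paper.
    """
    h = x
    for W, b in layers[:-1]:
        h = [step(z) for z in dense(h, W, b)]
    W, b = layers[-1]
    return dense(h, W, b)


def l20_norm(W):
    """Count the rows of W whose Euclidean norm is nonzero (assumed convention)."""
    return sum(1 for row in W if math.sqrt(sum(v * v for v in row)) > 0.0)


# Tiny example: 2 inputs, 2 hidden step neurons, 1 linear output.
layers = [
    ([[1.0, -1.0], [0.0, 0.0]], [0.0, -0.5]),  # second hidden neuron has a zero row
    ([[2.0, 3.0]], [0.1]),
]
output = forward([3.0, 1.0], layers)
nonzero_rows = l20_norm(layers[0][0])
```

Under this convention, a row that is entirely zero corresponds to a hidden neuron that ignores its inputs, which is why penalizing the count of nonzero rows tends to shrink the network. The step function has zero derivative wherever it is differentiable, so gradient-based training gives no signal here; that is the motivation the abstract gives for a block coordinate descent approach.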
 
      
        Related papers
        - OPAF: Optimized Secure Two-Party Computation Protocols for Nonlinear Activation Functions in Recurrent Neural Network [8.825150825838769]
 This paper pays special attention to the implementation of non-linear functions in semi-honest model with two-party settings.
We propose a novel and efficient protocol for exponential function by using a divide-and-conquer strategy.
Next, we take advantage of the symmetry of sigmoid and Tanh, and fine-tune the inputs to reduce the 2PC building blocks.
 arXiv  Detail & Related papers  (2024-03-01T02:49:40Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
 We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
 arXiv  Detail & Related papers  (2023-05-30T19:37:44Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
 BNNs neglect the intrinsic bilinear relationship of real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
 arXiv  Detail & Related papers  (2022-09-04T06:45:33Z)
- Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
 Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data.
We propose Graph-adaptive Rectified Linear Unit (GReLU) which is a new parametric activation function incorporating the neighborhood information in a novel and efficient way.
We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
 arXiv  Detail & Related papers  (2022-02-13T10:54:59Z)
- Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions [1.7149364927872015]
 Stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs).
In this work we study SGD type optimization methods in the training of fully-connected feedforward DNNs with rectified linear unit (ReLU) activation.
 arXiv  Detail & Related papers  (2021-12-13T11:45:36Z)
- Dynamic Binary Neural Network by learning channel-wise thresholds [9.432747511001246]
 We propose a dynamic BNN (DyBNN) incorporating dynamic learnable channel-wise thresholds of Sign function and shift parameters of PReLU.
The DyBNN based on two backbones of ReActNet (MobileNetV1 and ResNet18) achieve 71.2% and 67.4% top1-accuracy on ImageNet dataset.
 arXiv  Detail & Related papers  (2021-10-08T17:41:36Z)
- Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
 We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
 arXiv  Detail & Related papers  (2021-02-07T14:19:07Z)
- An Integer Programming Approach to Deep Neural Networks with Binary Activation Functions [0.0]
 We study deep neural networks with binary activation functions (BDNN).
We show that the BDNN can be reformulated as a mixed-integer linear program which can be solved to global optimality by classical programming solvers.
 arXiv  Detail & Related papers  (2020-07-07T10:28:20Z)
- Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
 Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with directly using methods, and outperformed all state-of-the-art algorithms tested in our experiments.
 arXiv  Detail & Related papers  (2020-02-04T20:00:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.