Zero Stability Well Predicts Performance of Convolutional Neural
Networks
- URL: http://arxiv.org/abs/2206.13100v1
- Date: Mon, 27 Jun 2022 08:07:08 GMT
- Title: Zero Stability Well Predicts Performance of Convolutional Neural
Networks
- Authors: Liangming Chen, Long Jin, Mingsheng Shang
- Abstract summary: We find that if a discrete solver of an ordinary differential equation is zero stable, the CNN corresponding to that solver performs well.
Based on this preliminary observation, we provide a higher-order discretization to construct CNNs and then propose a zero-stable network (ZeroSNet).
To guarantee the zero stability of ZeroSNet, we first deduce a structure that meets the consistency conditions and then give a zero-stable region for a training-free parameter.
- Score: 6.965550605588623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The question of what kind of convolutional neural network (CNN) structure
performs well is fascinating. In this work, we take one more step toward the
answer by connecting zero stability and model performance. Specifically, we
find that if a discrete solver of an ordinary differential equation is zero
stable, the CNN corresponding to that solver performs well. We first give the
interpretation of zero stability in the context of deep learning and then
investigate the performance of existing first- and second-order CNNs under
different zero-stable circumstances. Based on the preliminary observation, we
provide a higher-order discretization to construct CNNs and then propose a
zero-stable network (ZeroSNet). To guarantee the zero stability of ZeroSNet, we
first deduce a structure that meets the consistency conditions and then give a
zero-stable region for a training-free parameter. By analyzing the roots of a
characteristic equation, we theoretically obtain the optimal coefficients of
feature maps. Empirically, we present our results from three aspects: we
provide extensive empirical evidence across different depths and datasets
showing that the moduli of the characteristic equation's roots are the key to
the performance of CNNs that require historical features; our experiments show
that ZeroSNet outperforms existing CNNs that are based on high-order
discretization; and ZeroSNets show better robustness against noise on the input.
The source code is available at \url{https://github.com/LongJin-lab/ZeroSNet}.
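To make the stability criterion above concrete, here is a minimal sketch, not taken from the authors' repository: it assumes a hypothetical three-step update x_{n+1} = a0*x_n + a1*x_{n-1} + a2*x_{n-2} + h*f(x_n), a linear-multistep-style residual block that reuses historical feature maps, and checks zero stability through the root condition on the associated characteristic polynomial. The names is_zero_stable, MultistepBlock, and the coefficients a0, a1, a2 are illustrative assumptions, not the paper's API.

```python
# A minimal sketch (assumptions, not the ZeroSNet implementation):
# the update x_{n+1} = a0*x_n + a1*x_{n-1} + a2*x_{n-2} + h*f(x_n) is a
# linear-multistep-style residual step; its characteristic polynomial is
# rho(z) = z^3 - a0*z^2 - a1*z - a2. Zero stability (the root condition)
# requires every root to satisfy |z| <= 1, with roots on the unit circle simple.
import numpy as np
import torch.nn as nn


def is_zero_stable(a0, a1, a2, tol=1e-6):
    """Check the root condition for rho(z) = z^3 - a0*z^2 - a1*z - a2."""
    roots = np.roots([1.0, -a0, -a1, -a2])
    if np.any(np.abs(roots) > 1.0 + tol):
        return False  # some root lies outside the unit circle
    # Roots on the unit circle must be simple; the loose tolerance absorbs the
    # numerical splitting of repeated roots.
    boundary = roots[np.abs(np.abs(roots) - 1.0) <= tol]
    for i, r in enumerate(boundary):
        if any(abs(r - s) <= tol for s in boundary[i + 1:]):
            return False
    return True


class MultistepBlock(nn.Module):
    """Hypothetical block combining the three most recent feature maps."""

    def __init__(self, channels, a0, a1, a2, h=1.0):
        super().__init__()
        self.a0, self.a1, self.a2, self.h = a0, a1, a2, h
        self.f = nn.Sequential(  # stand-in for the residual branch f(x_n)
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x_n, x_nm1, x_nm2):
        return (self.a0 * x_n + self.a1 * x_nm1 + self.a2 * x_nm2
                + self.h * self.f(x_n))


# In this toy form, consistency requires a0 + a1 + a2 = 1; the root condition
# then separates usable coefficient choices from divergent ones.
print(is_zero_stable(1.0, 0.0, 0.0))   # plain residual step: roots {0, 0, 1} -> True
print(is_zero_stable(2.0, -1.0, 0.0))  # double root at z = 1 -> False
```

In this reading, the weights on historical feature maps play the role of the multistep coefficients, and the training-free parameter mentioned in the abstract would be constrained so that the roots stay inside this stable region.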
Related papers
- LinSATNet: The Positive Linear Satisfiability Neural Networks [116.65291739666303]
This paper studies how to introduce positive linear satisfiability constraints into neural networks.
We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.
arXiv Detail & Related papers (2024-07-18T22:05:21Z)
- PICNN: A Pathway towards Interpretable Convolutional Neural Networks [12.31424771480963]
We introduce a novel pathway to alleviate the entanglement between filters and image classes.
We use the Bernoulli sampling to generate the filter-cluster assignment matrix from a learnable filter-class correspondence matrix.
We evaluate the effectiveness of our method on ten widely used network architectures.
arXiv Detail & Related papers (2023-12-19T11:36:03Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Do deep neural networks have an inbuilt Occam's razor? [1.1470070927586016]
We show that structured data, combined with an intrinsic Occam's-razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of functions with complexity, is a key to the success of DNNs.
arXiv Detail & Related papers (2023-04-13T16:58:21Z)
- Interpreting Bias in the Neural Networks: A Peek Into Representational Similarity [0.0]
We investigate the performance and internal representational structure of convolution-based neural networks trained on biased data.
We specifically study similarities in representations, using Centered Kernel Alignment (CKA) for different objective functions.
We note that without progressive representational similarities among the layers of a neural network, the performance is less likely to be robust.
arXiv Detail & Related papers (2022-11-14T22:17:14Z)
- What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z)
- 0/1 Deep Neural Networks via Block Coordinate Descent [40.11141921215105]
The step function is one of the simplest and most natural activation functions for deep neural networks (DNNs).
Since it outputs 1 for positive inputs and 0 otherwise, its intrinsic characteristics (e.g., discontinuity and the lack of viable subgradient information) have impeded its development for decades.
arXiv Detail & Related papers (2022-06-19T11:12:30Z)
- Stability of Neural Networks on Manifolds to Relative Perturbations [118.84154142918214]
Graph Neural Networks (GNNs) show impressive performance in many practical scenarios.
GNNs can scale to large graphs, but this is at odds with existing stability bounds, which grow with the number of nodes.
arXiv Detail & Related papers (2021-10-10T04:37:19Z)
- Training Stable Graph Neural Networks Through Constrained Learning [116.03137405192356]
Graph Neural Networks (GNNs) rely on graph convolutions to learn features from network data.
GNNs are stable to different types of perturbations of the underlying graph, a property that they inherit from graph filters.
We propose a novel constrained learning approach by imposing a constraint on the stability condition of the GNN within a perturbation of choice.
arXiv Detail & Related papers (2021-10-07T15:54:42Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.