Deep Networks from the Principle of Rate Reduction
- URL: http://arxiv.org/abs/2010.14765v1
- Date: Tue, 27 Oct 2020 06:01:43 GMT
- Title: Deep Networks from the Principle of Rate Reduction
- Authors: Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma
- Abstract summary: This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification.
We show that the basic iterative gradient ascent scheme for optimizing the rate reduction of learned features naturally leads to a multi-layer deep network, one iteration per layer.
All components of this "white box" network have precise optimization, statistical, and geometric interpretation.
- Score: 32.87280757001462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work attempts to interpret modern deep (convolutional) networks from the
principles of rate reduction and (shift) invariant classification. We show that
the basic iterative gradient ascent scheme for optimizing the rate reduction of
learned features naturally leads to a multi-layer deep network, one iteration
per layer. The layered architectures, linear and nonlinear operators, and even
parameters of the network are all explicitly constructed layer-by-layer in a
forward propagation fashion by emulating the gradient scheme. All components of
this "white box" network have precise optimization, statistical, and geometric
interpretation. This principled framework also reveals and justifies the role
of multi-channel lifting and sparse coding in the early stages of deep networks.
Moreover, all linear operators of the so-derived network naturally become
multi-channel convolutions when we enforce classification to be rigorously
shift-invariant. The derivation also indicates that such a convolutional
network is significantly more efficient to construct and learn in the spectral
domain. Our preliminary simulations and experiments indicate that the
so-constructed deep network can already learn a good discriminative
representation even without any back-propagation training.
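To make the layer-by-layer construction concrete, here is a minimal numpy sketch of one such rate-reduction gradient-ascent step, where one step plays the role of one "layer"; the step size eta, the distortion parameter eps, and the sphere normalization are illustrative choices rather than the paper's exact settings.

```python
import numpy as np

def rate_reduction_layer(Z, Pi, eps=0.1, eta=0.5):
    """One gradient-ascent step on the rate-reduction objective (illustrative).

    Z   : (d, m) feature matrix, one column per sample, columns on the unit sphere
    Pi  : list of (m, m) diagonal 0/1 class-membership matrices
    eps : distortion parameter of the coding rate
    eta : learning rate; one such step corresponds to one network layer
    """
    d, m = Z.shape
    I = np.eye(d)

    # Expansion operator: gradient of the coding rate of the whole data set
    alpha = d / (m * eps**2)
    E = alpha * np.linalg.inv(I + alpha * Z @ Z.T)
    grad = E @ Z

    # Compression operators: gradients of the per-class coding rates
    for Pi_j in Pi:
        m_j = np.trace(Pi_j)
        alpha_j = d / (m_j * eps**2)
        C_j = alpha_j * np.linalg.inv(I + alpha_j * Z @ Pi_j @ Z.T)
        grad -= (m_j / m) * C_j @ Z @ Pi_j

    # Ascent step followed by projection back onto the unit sphere
    Z_next = Z + eta * grad
    return Z_next / np.linalg.norm(Z_next, axis=0, keepdims=True)
```

The linear operators E and C_j above are the quantities the paper interprets as layer parameters; per the abstract, they become multi-channel convolutions once classification is required to be rigorously shift-invariant.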
Related papers
- Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks [0.2302001830524133]
We instantiate a regularized form of the gradient clipping algorithm and prove that it can converge to the global minima of deep neural network loss functions.
We present empirical evidence that our theoretically founded regularized gradient clipping algorithm is also competitive with state-of-the-art deep learning methods.
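As a rough illustration only (the exact regularizer in the paper is not reproduced here), the sketch below shows norm-based gradient clipping whose rescaling factor is kept bounded below by a floor delta, one natural way to regularize the standard clipping rule so that updates never collapse entirely; the function name and default values are assumptions.

```python
import numpy as np

def regularized_clip(grad, threshold=1.0, delta=0.1):
    """Illustrative regularized gradient clipping (not the paper's exact rule).

    Standard clipping rescales the gradient by min(1, threshold / ||grad||);
    the floor `delta` prevents the rescaling factor from shrinking to zero
    on very large gradients, which is the flavor of regularization alluded to.
    """
    norm = np.linalg.norm(grad)
    scale = min(1.0, max(delta, threshold / (norm + 1e-12)))
    return scale * grad
```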
arXiv Detail & Related papers (2024-04-12T17:37:42Z) - Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration [62.41329042683779]
We propose a high-accuracy rotation equivariant proximal network that embeds rotation symmetry priors into the deep unfolding framework.
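For context, deep unfolding turns an iterative proximal-gradient solver into a network with one iteration per layer; below is a generic sketch of one unfolded iteration for a restoration problem y ≈ A x, with the learned proximal operator (rotation equivariant in the paper above) left as an abstract callable. Names and the step size are illustrative.

```python
import numpy as np

def unfolded_step(x, y, A, prox, step=0.1):
    """One generic deep-unfolding iteration (proximal gradient step).

    x    : current image estimate (flattened)
    y    : degraded observation, y ~ A @ x
    A    : linear degradation operator, a matrix here for simplicity
    prox : learned proximal operator; the paper constrains it to be rotation
           equivariant, which this sketch does not model
    """
    grad = A.T @ (A @ x - y)        # gradient of the data-fidelity term
    return prox(x - step * grad)    # learned prior applied as a proximal map
```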
arXiv Detail & Related papers (2023-12-25T11:53:06Z) - Deep Residual Compensation Convolutional Network without Backpropagation [0.0]
We introduce a residual compensation convolutional network, which is the first PCANet-like network trained with hundreds of layers.
To correct the classification errors, we train each layer with new labels derived from the residual information of all its preceding layers.
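The label-correction idea can be pictured with a boosting-style sketch; this is an assumed reading of the mechanism, not the paper's exact procedure, and the least-squares classifier per stage is an illustrative choice.

```python
import numpy as np

def fit_residual_stages(feature_maps, Y):
    """Stage-wise training on residual labels (illustrative assumption).

    feature_maps : list of per-stage feature matrices, feature_maps[k] is (m, d_k)
    Y            : (m, c) one-hot label matrix
    Each stage fits a classifier to the residual left by all preceding stages.
    """
    residual = Y.astype(float).copy()
    accumulated = np.zeros_like(residual)
    stages = []
    for F_k in feature_maps:
        W_k, *_ = np.linalg.lstsq(F_k, residual, rcond=None)  # fit the current residual
        accumulated += F_k @ W_k                               # running prediction
        residual = Y - accumulated                             # labels for the next stage
        stages.append(W_k)
    return stages, accumulated
```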
Our experiments show that our deep network outperforms all existing PCANet-like networks and is competitive with several traditional gradient-based models.
arXiv Detail & Related papers (2023-01-27T11:45:09Z) - Simple initialization and parametrization of sinusoidal networks via
their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
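The adjustable bandwidth is easiest to see in a single layer: scaling the pre-activation by a frequency factor omega0 (a SIREN-style convention, used here only for illustration) controls how high-frequency the functions the network, and hence its tangent kernel, can represent.

```python
import numpy as np

def sinusoidal_layer(x, W, b, omega0=30.0):
    """A single sinusoidal layer, y = sin(omega0 * (W @ x + b)).

    omega0 acts as the bandwidth knob: larger values let the layer pass higher
    frequencies, smaller values give a narrower low-pass behaviour, matching
    the kernel picture described above. The default 30.0 is illustrative.
    """
    return np.sin(omega0 * (W @ x + b))
```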
arXiv Detail & Related papers (2022-11-26T07:41:48Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
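One direct way to observe the phenomenon empirically is to track the numerical rank of the intermediate feature matrix after each layer; the tolerance below is an arbitrary illustrative choice.

```python
import numpy as np

def effective_rank(features, rel_tol=1e-3):
    """Numerical rank of a (samples x dims) feature matrix.

    Singular values below rel_tol times the largest singular value are treated
    as zero. Evaluating this after each layer of a trained network is a simple
    probe for the rank diminishing that the paper studies.
    """
    s = np.linalg.svd(features, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))
```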
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Redundancy in Deep Linear Neural Networks [0.0]
Conventional wisdom states that deep linear neural networks benefit from expressiveness and optimization advantages over a single linear layer.
This paper suggests that, in practice, the training process of deep linear fully-connected networks using conventional optimizers is convex in the same manner as a single linear fully-connected layer.
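The representational side of this redundancy is elementary to check: composing linear layers is itself a single linear map, so any deep linear network collapses to one layer, as the small sketch below verifies (the paper's convexity claim about the training dynamics is the stronger, separate statement).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)

# Three stacked linear layers with no nonlinearity in between ...
W1, W2, W3 = (rng.standard_normal((16, 16)) for _ in range(3))
deep_out = W3 @ (W2 @ (W1 @ x))

# ... act exactly like one layer with the collapsed weight matrix.
W_collapsed = W3 @ W2 @ W1
assert np.allclose(deep_out, W_collapsed @ x)
```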
arXiv Detail & Related papers (2022-06-09T13:21:00Z) - The Principles of Deep Learning Theory [19.33681537640272]
This book develops an effective theory approach to understanding deep neural networks of practical relevance.
We explain how these effectively-deep networks learn nontrivial representations from training.
We show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks.
arXiv Detail & Related papers (2021-06-18T15:00:00Z) - Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
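The sparse-coding building block referred to above can be illustrated with one ISTA iteration, a standard sparse-coding solver used here only as an example; it is not the paper's structure-learning algorithm.

```python
import numpy as np

def ista_step(z, x, D, lam=0.1, step=0.1):
    """One ISTA iteration for min_z 0.5 * ||x - D @ z||^2 + lam * ||z||_1.

    D : (d, k) dictionary, x : (d,) input signal, z : (k,) current sparse code.
    Soft-thresholding keeps only the strongest responses, yielding the kind of
    sparse codes the efficient coding principle favours.
    """
    grad = D.T @ (D @ z - x)                 # gradient of the quadratic term
    u = z - step * grad                      # gradient descent step
    return np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)  # soft threshold
```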
arXiv Detail & Related papers (2021-05-27T12:27:24Z) - ReduNet: A White-box Deep Network from the Principle of Maximizing Rate
Reduction [32.489371527159236]
This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation.
We show that for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all the subsets.
We show that the basic iterative gradient ascent scheme for optimizing the rate reduction objective naturally leads to a multi-layer deep network, named ReduNet, that shares common characteristics of modern deep networks.
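The rate-reduction objective itself can be written down directly; the sketch below computes the coding-rate difference Delta R = R(Z) - Rc(Z) with the usual log-det rate estimates from this line of work (treat the exact scaling constants as an assumption of this sketch).

```python
import numpy as np

def rate_reduction(Z, Pi, eps=0.1):
    """Coding-rate difference Delta R = R(Z) - Rc(Z) for features Z of shape (d, m).

    R(Z)  : rate to code all features together up to distortion eps
    Rc(Z) : rate when each class, given by diagonal masks Pi, is coded separately
    A large difference means classes are compact individually but spread out jointly.
    """
    d, m = Z.shape
    I = np.eye(d)
    logdet = lambda M: np.linalg.slogdet(M)[1]

    R = 0.5 * logdet(I + (d / (m * eps**2)) * Z @ Z.T)
    Rc = 0.0
    for Pi_j in Pi:
        m_j = np.trace(Pi_j)
        Rc += (m_j / (2 * m)) * logdet(I + (d / (m_j * eps**2)) * Z @ Pi_j @ Z.T)
    return R - Rc
```

Maximizing this quantity by gradient ascent is precisely the iteration sketched after the abstract above.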
arXiv Detail & Related papers (2021-05-21T16:29:57Z) - Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z) - Dynamic Hierarchical Mimicking Towards Consistent Optimization
Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
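A minimal way to picture the side-branch mechanism is DSN-style deep supervision, used here as an assumed simplification of the paper's "delicately designed" branches: an auxiliary classifier is attached to an intermediate feature map and its loss is added to the main objective so that both heads optimize a consistent target.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WithSideBranch(nn.Module):
    """Tiny backbone with one auxiliary head on an intermediate layer (sketch)."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(8))
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(64, num_classes)                # main classifier
        self.side_head = nn.Linear(32 * 8 * 8, num_classes)   # auxiliary classifier

    def forward(self, x):
        h1 = self.stage1(x)
        h2 = self.stage2(h1)
        return self.head(h2.flatten(1)), self.side_head(h1.flatten(1))

def total_loss(main_logits, side_logits, targets, side_weight=0.3):
    """Main loss plus a weighted auxiliary loss; the 0.3 weight is an arbitrary choice."""
    return (F.cross_entropy(main_logits, targets)
            + side_weight * F.cross_entropy(side_logits, targets))
```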
arXiv Detail & Related papers (2020-03-24T09:56:13Z)