Towards Better Orthogonality Regularization with Disentangled Norm in Training Deep CNNs
- URL: http://arxiv.org/abs/2306.09939v1
- Date: Fri, 16 Jun 2023 16:19:59 GMT
- Title: Towards Better Orthogonality Regularization with Disentangled Norm in Training Deep CNNs
- Authors: Changhao Wu, Shenan Zhang, Fangsong Long, Ziliang Yin, Tuo Leng
- Abstract summary: We propose a novel measure for achieving better orthogonality among filters, which disentangles diagonal and correlation information from the residual.
We conduct experiments with our kernel orthogonality regularization toolkit on ResNet and WideResNet on CIFAR-10 and CIFAR-100.
- Score: 0.37498611358320727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Orthogonality regularization has been developed to prevent training
instability and feature redundancy in deep CNNs. Among existing proposals, kernel
orthogonality regularization enforces orthogonality by minimizing the residual
between the Gram matrix formed by the convolutional filters and the identity
matrix.
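For concreteness, the penalty described above can be written as ||W W^T - I||_F^2, with each convolutional filter flattened into one row of W. The following is a minimal PyTorch-style sketch; the framework choice and the name `kernel_orth_loss` are illustrative and not taken from the paper.

    import torch

    def kernel_orth_loss(conv_weight: torch.Tensor) -> torch.Tensor:
        """Conventional kernel orthogonality penalty ||W W^T - I||_F^2,
        with each convolutional filter flattened into one row of W."""
        out_ch = conv_weight.shape[0]
        w = conv_weight.reshape(out_ch, -1)                       # (out_channels, in_channels*kH*kW)
        gram = w @ w.t()                                          # Gram matrix of the filters
        eye = torch.eye(out_ch, device=w.device, dtype=w.dtype)
        return ((gram - eye) ** 2).sum()                          # squared Frobenius norm of the residual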
We propose a novel measure for achieving better orthogonality among filters,
which disentangles the diagonal and correlation information from the residual.
Under the principle of imposing strict orthogonality between filters, a model
equipped with this measure surpasses previous regularization methods in
near-orthogonality. Moreover, we observe that strict filter orthogonality
benefits relatively shallow models, but as model depth increases, the
performance gains from strict kernel orthogonality decrease sharply.
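The abstract does not give the exact form of the disentangled measure, so the sketch below only illustrates the underlying idea: splitting the residual G - I into a diagonal component (per-filter norms pushed towards 1) and an off-diagonal component (pairwise filter correlations pushed towards 0) that can be weighted separately. The weights `lambda_diag` and `lambda_corr` are hypothetical.

    import torch

    def disentangled_orth_loss(conv_weight, lambda_diag=1.0, lambda_corr=1.0):
        """Illustrative split of the residual G - I into a diagonal term
        (filter norms) and an off-diagonal term (filter correlations)."""
        out_ch = conv_weight.shape[0]
        w = conv_weight.reshape(out_ch, -1)
        gram = w @ w.t()
        residual = gram - torch.eye(out_ch, device=w.device, dtype=w.dtype)
        diag_part = torch.diagonal(residual)                      # deviation of each ||w_i||^2 from 1
        corr_part = residual - torch.diag_embed(diag_part)        # inter-filter correlations only
        return lambda_diag * (diag_part ** 2).sum() + lambda_corr * (corr_part ** 2).sum()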
Furthermore, motivated by the observed conflict between strict kernel
orthogonality and growing model capacity, we propose a relaxation theory for
kernel orthogonality regularization. The relaxed kernel orthogonality achieves
enhanced performance on models with increased capacity, shedding light on the
burden that strict kernel orthogonality places on deep model performance.
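The precise relaxation is not spelled out in the abstract. One natural way to relax such a penalty, shown purely as an assumption here, is to tolerate residual entries within a margin `eps` and penalize only the excess, so that capacity-rich models are not forced into strict orthogonality.

    import torch

    def relaxed_orth_loss(conv_weight, eps=0.1):
        """Hypothetical relaxed variant: residual entries within a tolerance
        `eps` are left unpenalized; the paper's actual relaxation may differ."""
        out_ch = conv_weight.shape[0]
        w = conv_weight.reshape(out_ch, -1)
        gram = w @ w.t()
        residual = gram - torch.eye(out_ch, device=w.device, dtype=w.dtype)
        excess = torch.clamp(residual.abs() - eps, min=0.0)       # penalize only beyond the margin
        return (excess ** 2).sum()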
We conduct extensive experiments with our kernel orthogonality regularization
toolkit on ResNet and WideResNet on CIFAR-10 and CIFAR-100. We observe
state-of-the-art gains in model performance from the toolkit, which includes
both strict and relaxed orthogonality regularization, and obtain more robust
models with expressive features. These experiments demonstrate the efficacy of
our toolkit and provide insights into the often overlooked challenges posed by
strict orthogonality, in particular the burden it places on capacity-rich
models.
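As a usage illustration only, such a penalty is typically summed over all convolutional layers and added to the cross-entropy objective; the coefficient `gamma` and the torchvision ResNet-18 stand-in below are assumptions, not the paper's exact setup.

    import torch
    import torch.nn as nn
    import torchvision

    def orth_penalty(w: torch.Tensor) -> torch.Tensor:
        """Strict kernel orthogonality penalty (same form as the sketch above)."""
        w = w.reshape(w.shape[0], -1)
        eye = torch.eye(w.shape[0], device=w.device, dtype=w.dtype)
        return ((w @ w.t() - eye) ** 2).sum()

    def training_loss(model, logits, targets, gamma=1e-4):
        """Cross-entropy plus the orthogonality term summed over every conv layer;
        `gamma` is an illustrative weight, not a value reported in the paper."""
        ce = nn.functional.cross_entropy(logits, targets)
        reg = sum(orth_penalty(m.weight) for m in model.modules() if isinstance(m, nn.Conv2d))
        return ce + gamma * reg

    # Stand-in for the CIFAR-scale ResNet / WideResNet variants used in the paper.
    model = torchvision.models.resnet18(num_classes=10)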
Related papers
- Efficient Algorithms for Regularized Nonnegative Scale-invariant Low-rank Approximation Models [3.6034001987137767]
We show that scale-invariance inherent to low-rank approximation models causes an implicit regularization with both unexpected beneficial and detrimental effects.
We derive a generic Majorization Minimization algorithm that handles many regularized nonnegative low-rank approximations.
We showcase our contributions on sparse Nonnegative Matrix Factorization, ridge-regularized Canonical Polyadic decomposition and sparse Nonnegative Tucker Decomposition.
arXiv Detail & Related papers (2024-03-27T12:49:14Z)
- Low-resolution Prior Equilibrium Network for CT Reconstruction [3.5639148953570836]
We present a novel deep learning-based CT reconstruction model, where the low-resolution image is introduced to obtain an effective regularization term for improving the network's robustness.
Experimental results on both sparse-view and limited-angle reconstruction problems are provided, demonstrating that our end-to-end low-resolution prior equilibrium model outperforms other state-of-the-art methods in terms of noise reduction, contrast-to-noise ratio, and preservation of edge details.
arXiv Detail & Related papers (2024-01-28T13:59:58Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy even with zero exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Generalizing and Improving Jacobian and Hessian Regularization [1.926971915834451]
We generalize previous efforts by extending the target matrix from zero to any matrix that admits efficient matrix-vector products.
The proposed paradigm allows us to construct novel regularization terms that enforce symmetry or diagonality on square Jacobian and Hessian matrices.
We introduce Lanczos-based spectral norm minimization to tackle this difficulty.
arXiv Detail & Related papers (2022-12-01T07:01:59Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Multi-View Spectral Clustering Tailored Tensor Low-Rank Representation [105.33409035876691]
This paper explores the problem of multi-view spectral clustering (MVSC) based on tensor low-rank modeling.
We design a novel structured tensor low-rank norm tailored to MVSC.
We show that the proposed method outperforms state-of-the-art methods to a significant extent.
arXiv Detail & Related papers (2020-04-30T11:52:12Z)
- Controllable Orthogonalization in Training DNNs [96.1365404059924]
Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1.
This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI).
We show that our method improves the performance of image classification networks by effectively controlling the orthogonality to provide an optimal tradeoff between optimization benefits and representational capacity reduction.
We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of a network, similar to spectral normalization.
arXiv Detail & Related papers (2020-04-02T10:14:27Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
- Self-Orthogonality Module: A Network Architecture Plug-in for Learning Orthogonal Filters [28.54654866641997]
We introduce an implicit self-regularization into OR to push the mean and variance of filter angles in a network towards 90 degrees and 0 simultaneously.
Our regularization can be implemented as an architectural plug-in and integrated with an arbitrary network.
arXiv Detail & Related papers (2020-01-05T17:31:07Z)