Towards Better Orthogonality Regularization with Disentangled Norm in
Training Deep CNNs
- URL: http://arxiv.org/abs/2306.09939v1
- Date: Fri, 16 Jun 2023 16:19:59 GMT
- Title: Towards Better Orthogonality Regularization with Disentangled Norm in
Training Deep CNNs
- Authors: Changhao Wu, Shenan Zhang, Fangsong Long, Ziliang Yin, Tuo Leng
- Abstract summary: We propose a novel measure for achieving better orthogonality among filters, which disentangles diagonal and correlation information from the residual.
We conduct experiments with our kernel orthogonality regularization toolkit on ResNet and WideResNet on CIFAR-10 and CIFAR-100.
- Score: 0.37498611358320727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Orthogonality regularization has been developed to prevent training
instability and feature redundancy in deep CNNs. Among existing proposals, kernel
orthogonality regularization enforces orthogonality by minimizing the residual
between the Gram matrix formed by convolutional filters and the orthogonality
(identity) matrix.
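As a concrete illustration of this baseline, the following is a minimal PyTorch-style sketch (not the authors' released code) of the kernel orthogonality penalty: the convolutional filters are flattened into the rows of a matrix W, and the squared Frobenius norm of the residual between the Gram matrix W Wᵀ and the identity is added to the task loss.

```python
# Minimal sketch of the kernel orthogonality penalty described above;
# an illustration only, not the authors' implementation.
import torch

def kernel_orthogonality_loss(weight: torch.Tensor) -> torch.Tensor:
    """weight: conv kernel of shape (out_channels, in_channels, kH, kW)."""
    out_channels = weight.shape[0]
    W = weight.reshape(out_channels, -1)      # one flattened filter per row
    gram = W @ W.t()                          # (out_channels, out_channels)
    eye = torch.eye(out_channels, device=W.device, dtype=W.dtype)
    return ((gram - eye) ** 2).sum()          # squared Frobenius norm of residual

# Typical usage:
#   total_loss = task_loss + lambda_orth * kernel_orthogonality_loss(conv.weight)
```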
We propose a novel measure for achieving better orthogonality among filters,
which disentangles diagonal and correlation information from the residual. A
model equipped with this measure, under the principle of imposing strict
orthogonality between filters, surpasses previous regularization methods in
achieving near-orthogonality. Moreover, we observe benefits from improved strict
filter orthogonality in relatively shallow models, but as model depth increases,
the performance gains from strict kernel orthogonality decrease sharply.
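The paper gives the exact form of the disentangled measure; the hedged sketch below only illustrates the idea stated in the abstract: the residual Gram − I is split into a diagonal part (per-filter norm deviation) and an off-diagonal correlation part, which can then be weighted or monitored separately. The function name and the separate weights are illustrative assumptions, not the paper's formula.

```python
# Hedged sketch of the disentanglement idea: split the residual of the Gram
# matrix into diagonal (filter-norm) and off-diagonal (filter-correlation)
# parts. The weighting scheme here is an assumption for illustration.
import torch

def disentangled_orthogonality_loss(weight: torch.Tensor,
                                    diag_weight: float = 1.0,
                                    corr_weight: float = 1.0) -> torch.Tensor:
    out_channels = weight.shape[0]
    W = weight.reshape(out_channels, -1)
    residual = W @ W.t() - torch.eye(out_channels, device=W.device, dtype=W.dtype)
    diag_part = torch.diagonal(residual)           # how far each ||filter||^2 is from 1
    corr_part = residual - torch.diag(diag_part)   # pairwise filter correlations
    return diag_weight * (diag_part ** 2).sum() + corr_weight * (corr_part ** 2).sum()
```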
Furthermore, based on the observation of the potential conflict between
strict kernel orthogonality and growing model capacity, we propose a relaxation
theory on kernel orthogonality regularization. The relaxed kernel orthogonality
achieves enhanced performance on models with increased capacity, shedding light
on the burden of strict kernel orthogonality on deep model performance.
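One simple way to realize such a relaxation (again a hypothetical sketch, not the paper's relaxed norm) is to tolerate small filter correlations and penalize only off-diagonal entries whose magnitude exceeds a margin eps, so that filters in capacity-rich models are required to be near-orthogonal rather than strictly orthogonal. The margin formulation and its default value are assumptions.

```python
# Hypothetical illustration of relaxed kernel orthogonality: correlations
# within a margin eps are tolerated; only violations beyond the margin are
# penalized. The margin mechanism is an assumption, not the paper's method.
import torch
import torch.nn.functional as F

def relaxed_orthogonality_loss(weight: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    out_channels = weight.shape[0]
    W = weight.reshape(out_channels, -1)
    residual = W @ W.t() - torch.eye(out_channels, device=W.device, dtype=W.dtype)
    slack = F.relu(residual.abs() - eps)   # zero inside the tolerance band
    return (slack ** 2).sum()
```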
We conduct extensive experiments with our kernel orthogonality regularization
toolkit on ResNet and WideResNet on CIFAR-10 and CIFAR-100. We observe
state-of-the-art gains in model performance from the toolkit, which includes
both strict and relaxed orthogonality regularization, and obtain more robust
models with expressive features. These experiments demonstrate the efficacy of
our toolkit and provide insights into the often overlooked challenges posed by
strict orthogonality, addressing the burden it places on capacity-rich models.
Related papers
- A theoretical framework for overfitting in energy-based modeling [5.1337384597700995]
We investigate the impact of limited data on training pairwise energy-based models for inverse problems aimed at identifying interaction networks.
We dissect training trajectories across the eigenbasis of the coupling matrix, exploiting the independent evolution of eigenmodes.
We show that finite data corrections can be accurately modeled through random matrix theory calculations.
arXiv Detail & Related papers (2025-01-31T14:21:02Z)
- Diagonal Over-parameterization in Reproducing Kernel Hilbert Spaces as an Adaptive Feature Model: Generalization and Adaptivity [11.644182973599788]
The diagonal adaptive kernel model learns kernel eigenvalues and output coefficients simultaneously during training.
We show that the adaptivity comes from learning the right eigenvalues during training.
arXiv Detail & Related papers (2025-01-15T09:20:02Z)
- Efficient Algorithms for Regularized Nonnegative Scale-invariant Low-rank Approximation Models [3.6034001987137767]
We study the role of regularization functions in low-rank approximation models.
We propose a generic Majorization-Minimization (MM) algorithm capable of handling $\ell_p^p$-regularized nonnegative low-rank approximations.
arXiv Detail & Related papers (2024-03-27T12:49:14Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy, even with a zero exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Multi-View Spectral Clustering Tailored Tensor Low-Rank Representation [105.33409035876691]
This paper explores the problem of multi-view spectral clustering (MVSC) based on tensor low-rank modeling.
We design a novel structured tensor low-rank norm tailored to MVSC.
We show that the proposed method outperforms state-of-the-art methods to a significant extent.
arXiv Detail & Related papers (2020-04-30T11:52:12Z)
- Controllable Orthogonalization in Training DNNs [96.1365404059924]
Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1.
This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI)
We show that our method improves the performance of image classification networks by effectively controlling the orthogonality to provide an optimal tradeoff between optimization benefits and representational capacity reduction.
We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of a network, similar to spectral normalization.
arXiv Detail & Related papers (2020-04-02T10:14:27Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
- Self-Orthogonality Module: A Network Architecture Plug-in for Learning
Orthogonal Filters [28.54654866641997]
We introduce an implicit self-regularization into orthogonality regularization (OR) to push the mean and variance of filter angles in a network towards 90° and 0 simultaneously.
Our regularization can be implemented as an architectural plug-in and integrated with an arbitrary network.
arXiv Detail & Related papers (2020-01-05T17:31:07Z)