Towards Better Orthogonality Regularization with Disentangled Norm in
Training Deep CNNs
- URL: http://arxiv.org/abs/2306.09939v1
- Date: Fri, 16 Jun 2023 16:19:59 GMT
- Title: Towards Better Orthogonality Regularization with Disentangled Norm in
Training Deep CNNs
- Authors: Changhao Wu, Shenan Zhang, Fangsong Long, Ziliang Yin, Tuo Leng
- Abstract summary: We propose a novel measure for achieving better orthogonality among filters, which disentangles diagonal and correlation information from the residual.
We conduct experiments with our kernel orthogonality regularization toolkit on ResNet and WideResNet on CIFAR-10 and CIFAR-100.
- Score: 0.37498611358320727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Orthogonality regularization has been developed to prevent training
instability and feature redundancy in deep CNNs. Among existing proposals, kernel
orthogonality regularization enforces orthogonality by minimizing the residual
between the Gram matrix formed by convolutional filters and the orthogonality
(identity) matrix.
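As a concrete illustration of this baseline, the following is a minimal PyTorch-style sketch (not the authors' released code) of the kernel orthogonality penalty: the convolutional filters are flattened into the rows of a matrix W, and the squared Frobenius norm of the residual between the Gram matrix W Wᵀ and the identity is added to the task loss.

```python
# Minimal sketch of the kernel orthogonality penalty described above;
# an illustration only, not the authors' implementation.
import torch

def kernel_orthogonality_loss(weight: torch.Tensor) -> torch.Tensor:
    """weight: conv kernel of shape (out_channels, in_channels, kH, kW)."""
    out_channels = weight.shape[0]
    W = weight.reshape(out_channels, -1)      # one flattened filter per row
    gram = W @ W.t()                          # (out_channels, out_channels)
    eye = torch.eye(out_channels, device=W.device, dtype=W.dtype)
    return ((gram - eye) ** 2).sum()          # squared Frobenius norm of residual

# Typical usage:
#   total_loss = task_loss + lambda_orth * kernel_orthogonality_loss(conv.weight)
```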
We propose a novel measure for achieving better orthogonality among filters,
which disentangles diagonal and correlation information from the residual. A
model equipped with this measure, under the principle of imposing strict
orthogonality between filters, surpasses previous regularization methods in
achieving near-orthogonality. Moreover, we observe benefits from improved strict
filter orthogonality in relatively shallow models, but as model depth increases,
the performance gains from strict kernel orthogonality decrease sharply.
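The paper gives the exact form of the disentangled measure; the hedged sketch below only illustrates the idea stated in the abstract: the residual Gram − I is split into a diagonal part (per-filter norm deviation) and an off-diagonal correlation part, which can then be weighted or monitored separately. The function name and the separate weights are illustrative assumptions, not the paper's formula.

```python
# Hedged sketch of the disentanglement idea: split the residual of the Gram
# matrix into diagonal (filter-norm) and off-diagonal (filter-correlation)
# parts. The weighting scheme here is an assumption for illustration.
import torch

def disentangled_orthogonality_loss(weight: torch.Tensor,
                                    diag_weight: float = 1.0,
                                    corr_weight: float = 1.0) -> torch.Tensor:
    out_channels = weight.shape[0]
    W = weight.reshape(out_channels, -1)
    residual = W @ W.t() - torch.eye(out_channels, device=W.device, dtype=W.dtype)
    diag_part = torch.diagonal(residual)           # how far each ||filter||^2 is from 1
    corr_part = residual - torch.diag(diag_part)   # pairwise filter correlations
    return diag_weight * (diag_part ** 2).sum() + corr_weight * (corr_part ** 2).sum()
```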
Furthermore, based on the observation of the potential conflict between
strict kernel orthogonality and growing model capacity, we propose a relaxation
theory on kernel orthogonality regularization. The relaxed kernel orthogonality
achieves enhanced performance on models with increased capacity, shedding light
on the burden of strict kernel orthogonality on deep model performance.
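One simple way to realize such a relaxation (again a hypothetical sketch, not the paper's relaxed norm) is to tolerate small filter correlations and penalize only off-diagonal entries whose magnitude exceeds a margin eps, so that filters in capacity-rich models are required to be near-orthogonal rather than strictly orthogonal. The margin formulation and its default value are assumptions.

```python
# Hypothetical illustration of relaxed kernel orthogonality: correlations
# within a margin eps are tolerated; only violations beyond the margin are
# penalized. The margin mechanism is an assumption, not the paper's method.
import torch
import torch.nn.functional as F

def relaxed_orthogonality_loss(weight: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    out_channels = weight.shape[0]
    W = weight.reshape(out_channels, -1)
    residual = W @ W.t() - torch.eye(out_channels, device=W.device, dtype=W.dtype)
    slack = F.relu(residual.abs() - eps)   # zero inside the tolerance band
    return (slack ** 2).sum()
```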
We conduct extensive experiments with our kernel orthogonality regularization
toolkit on ResNet and WideResNet on CIFAR-10 and CIFAR-100. We observe
state-of-the-art gains in model performance from the toolkit, which includes
both strict and relaxed orthogonality regularization, and obtain more robust
models with expressive features. These experiments demonstrate the efficacy of
our toolkit and provide insights into the often overlooked challenges posed by
strict orthogonality, addressing the burden it places on capacity-rich models.
Related papers
- A theoretical framework for overfitting in energy-based modeling [5.1337384597700995]
We investigate the impact of limited data on training pairwise energy-based models for inverse problems aimed at identifying interaction networks.
We dissect training trajectories across the eigenbasis of the coupling matrix, exploiting the independent evolution of eigenmodes.
We show that finite data corrections can be accurately modeled through random matrix theory calculations.
arXiv Detail & Related papers (2025-01-31T14:21:02Z)
- Diagonal Over-parameterization in Reproducing Kernel Hilbert Spaces as an Adaptive Feature Model: Generalization and Adaptivity [11.644182973599788]
The diagonal adaptive kernel model learns kernel eigenvalues and output coefficients simultaneously during training.
We show that the adaptivity comes from learning the right eigenvalues during training.
arXiv Detail & Related papers (2025-01-15T09:20:02Z)
- Efficient Algorithms for Regularized Nonnegative Scale-invariant Low-rank Approximation Models [3.6034001987137767]
We study the role of regularization functions in low-rank approximation models.
We propose a generic Majorization-Minimization (MM) algorithm capable of handling $\ell_p^p$-regularized nonnegative low-rank approximations.
arXiv Detail & Related papers (2024-03-27T12:49:14Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy, even with a zero exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Multi-View Spectral Clustering Tailored Tensor Low-Rank Representation [105.33409035876691]
This paper explores the problem of multi-view spectral clustering (MVSC) based on tensor low-rank modeling.
We design a novel structured tensor low-rank norm tailored to MVSC.
We show that the proposed method outperforms state-of-the-art methods to a significant extent.
arXiv Detail & Related papers (2020-04-30T11:52:12Z)
- Controllable Orthogonalization in Training DNNs [96.1365404059924]
Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1.
This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI)
We show that our method improves the performance of image classification networks by effectively controlling the orthogonality to provide an optimal tradeoff between optimization benefits and representational capacity reduction.
We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of a network, similar to spectral normalization.
arXiv Detail & Related papers (2020-04-02T10:14:27Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
- Self-Orthogonality Module: A Network Architecture Plug-in for Learning
Orthogonal Filters [28.54654866641997]
We introduce an implicit self-regularization into orthogonality regularization (OR) to push the mean and variance of filter angles in a network towards 90° and 0 simultaneously.
Our regularization can be implemented as an architectural plug-in and integrated with an arbitrary network.
arXiv Detail & Related papers (2020-01-05T17:31:07Z)