Self-Orthogonality Module: A Network Architecture Plug-in for Learning
Orthogonal Filters
- URL: http://arxiv.org/abs/2001.01275v2
- Date: Fri, 17 Jan 2020 18:10:58 GMT
- Title: Self-Orthogonality Module: A Network Architecture Plug-in for Learning
Orthogonal Filters
- Authors: Ziming Zhang, Wenchi Ma, Yuanwei Wu, Guanghui Wang
- Abstract summary: We introduce an implicit self-regularization into OR to push the mean and variance of filter angles in a network towards 90 degrees and 0 simultaneously.
Our regularization can be implemented as an architectural plug-in and integrated with an arbitrary network.
- Score: 28.54654866641997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the empirical impact of orthogonality
regularization (OR) in deep learning, either solo or collaboratively. Recent
works on OR have reported promising accuracy improvements. In our ablation
study, however, we do not observe such significant improvement from existing OR
techniques compared with the conventional training based on weight decay,
dropout, and batch normalization. To identify the real gain from OR, inspired
by the locality sensitive hashing (LSH) in angle estimation, we propose to
introduce an implicit self-regularization into OR to push the mean and variance
of filter angles in a network towards 90 degrees and 0 simultaneously to achieve (near)
orthogonality among the filters, without using any other explicit
regularization. Our regularization can be implemented as an architectural
plug-in and integrated with an arbitrary network. We reveal that OR helps
stabilize the training process and leads to faster convergence and better
generalization.
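The angle statistics in the abstract lend themselves to a compact illustration. Below is a minimal sketch, not the authors' plug-in itself: it computes pairwise angles between flattened filters of a convolutional layer and forms an explicit penalty that pushes the mean angle towards 90 degrees and the angle variance towards 0; the paper's module realizes this implicitly as an architectural plug-in, and the coefficient in the usage lines is a hypothetical choice.
```python
import torch
import torch.nn.functional as F

def filter_angle_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Penalty pushing pairwise filter angles toward mean 90 deg and variance 0.

    weight: conv weight of shape (out_channels, in_channels, kH, kW).
    Returns a scalar that can be added to the task loss.
    """
    # Flatten each filter into a row vector and normalize to unit length.
    w = F.normalize(weight.flatten(start_dim=1), dim=1)       # (n, d)
    cos = (w @ w.t()).clamp(-1 + 1e-7, 1 - 1e-7)              # pairwise cosines
    n = cos.size(0)
    iu = torch.triu_indices(n, n, offset=1)                   # distinct filter pairs only
    angles = torch.rad2deg(torch.acos(cos[iu[0], iu[1]]))     # pairwise angles in degrees
    # Mean toward 90 degrees, variance toward 0.
    return (angles.mean() - 90.0) ** 2 + angles.var()

# Usage sketch (hypothetical layer and coefficient):
conv = torch.nn.Conv2d(16, 32, kernel_size=3)
loss = 0.01 * filter_angle_penalty(conv.weight)  # add to the task loss
loss.backward()
```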
Related papers
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
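For the linear-interpolation entry above, a minimal sketch of the kind of update the analysis concerns, assuming a lookahead-style scheme in which an anchor point is linearly interpolated toward the result of a few inner optimizer steps; the inner optimizer, `k`, `lam`, and `loss_fn` are illustrative choices, not the paper's exact algorithm.
```python
import copy
import torch

def lookahead_step(model, loss_fn, data_batches, k=5, lam=0.5, lr=0.1):
    """One outer step: run k inner SGD steps, then interpolate back toward the anchor."""
    anchor = copy.deepcopy(model.state_dict())          # anchor weights
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for batch in data_batches[:k]:                      # k inner steps
        opt.zero_grad()
        loss_fn(model, batch).backward()
        opt.step()
    with torch.no_grad():                               # linear interpolation:
        for name, p in model.named_parameters():        # theta <- (1-lam)*anchor + lam*theta
            p.copy_((1 - lam) * anchor[name] + lam * p)
```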
- Group Orthogonalization Regularization For Vision Models Adaptation and Robustness [31.43307762723943]
We propose a computationally efficient regularization technique that encourages orthonormality between groups of filters within the same layer.
Our experiments show that when incorporated into recent adaptation methods for diffusion models and vision transformers (ViTs), this regularization improves performance on downstream tasks.
arXiv Detail & Related papers (2023-06-16T17:53:16Z)
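For the group orthogonalization entry above, a minimal sketch of a group-wise soft orthogonality penalty, assuming the layer's filters are partitioned into equal-size groups and each group's Gram matrix is pushed toward the identity; the group size and coefficient are illustrative, not the paper's exact formulation.
```python
import torch

def group_orthogonality_penalty(weight: torch.Tensor, group_size: int = 8) -> torch.Tensor:
    """||W_g W_g^T - I||_F^2 summed over groups of flattened filters."""
    w = weight.flatten(start_dim=1)                    # (out_channels, d)
    penalty = weight.new_zeros(())
    for g in w.split(group_size, dim=0):               # consecutive groups of filters
        gram = g @ g.t()                               # (group, group) Gram matrix
        eye = torch.eye(gram.size(0), device=gram.device, dtype=gram.dtype)
        penalty = penalty + ((gram - eye) ** 2).sum()
    return penalty

# Usage sketch: add to the task loss with a small hypothetical coefficient.
conv = torch.nn.Conv2d(16, 32, kernel_size=3)
reg = 1e-4 * group_orthogonality_penalty(conv.weight, group_size=8)
```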
- Towards Better Orthogonality Regularization with Disentangled Norm in Training Deep CNNs [0.37498611358320727]
We propose a novel measure for achieving better orthogonality among filters, which disentangles diagonal and correlation information from the residual.
We conduct experiments with our kernel orthogonality regularization toolkit on ResNet and WideResNet on CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2023-06-16T16:19:59Z)
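For the disentangled-norm entry above, a minimal sketch of one way to separate the diagonal and correlation (off-diagonal) parts of the Gram residual W W^T - I so they can be weighted independently; the split and the two coefficients are assumptions for illustration, not the paper's exact measure.
```python
import torch

def disentangled_orthogonality_penalty(weight, diag_coef=1.0, corr_coef=1.0):
    """Penalize norm errors (diagonal) and correlations (off-diagonal) separately."""
    w = weight.flatten(start_dim=1)                       # (n, d) flattened filters
    residual = w @ w.t() - torch.eye(w.size(0), device=w.device, dtype=w.dtype)
    diag_term = residual.diagonal().pow(2).sum()          # filter norms deviating from 1
    corr_term = residual.pow(2).sum() - diag_term         # cross-filter correlations
    return diag_coef * diag_term + corr_coef * corr_term
```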
- Orthogonal SVD Covariance Conditioning and Latent Disentanglement [65.67315418971688]
Inserting an SVD meta-layer into neural networks tends to make the covariance matrix ill-conditioned.
We propose Nearest Orthogonal Gradient (NOG) and Optimal Learning Rate (OLR) to address this issue.
Experiments on visual recognition demonstrate that our methods can simultaneously improve covariance conditioning and generalization.
arXiv Detail & Related papers (2022-12-11T20:31:31Z)
- Understanding the Covariance Structure of Convolutional Filters [86.0964031294896]
Recent ViT-inspired convolutional networks such as ConvMixer and ConvNeXt use large-kernel depthwise convolutions with notable structure.
We first observe that such learned filters have highly-structured covariance matrices, and we find that covariances calculated from small networks may be used to effectively initialize a variety of larger networks.
arXiv Detail & Related papers (2022-10-07T15:59:13Z)
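For the covariance-structure entry above, a minimal sketch of the initialization idea as summarized: estimate the covariance of flattened filters taken from a small trained network and sample a larger network's filters from a zero-mean Gaussian with that covariance; the shapes, jitter, and random stand-in filters in the usage line are assumptions.
```python
import torch

def covariance_init(trained_filters: torch.Tensor, num_new_filters: int) -> torch.Tensor:
    """Sample new k x k filters from N(0, Sigma), with Sigma estimated from trained filters.

    trained_filters: tensor of shape (n, k, k) taken from a small trained network.
    Returns: tensor of shape (num_new_filters, k, k).
    """
    n, k, _ = trained_filters.shape
    flat = trained_filters.reshape(n, k * k)
    flat = flat - flat.mean(dim=0, keepdim=True)
    cov = flat.t() @ flat / (n - 1)                          # (k*k, k*k) sample covariance
    cov = cov + 1e-4 * torch.eye(k * k)                      # small jitter for positive definiteness
    dist = torch.distributions.MultivariateNormal(torch.zeros(k * k), covariance_matrix=cov)
    return dist.sample((num_new_filters,)).reshape(num_new_filters, k, k)

# Usage sketch with random stand-in "trained" filters:
new_filters = covariance_init(torch.randn(256, 7, 7), num_new_filters=512)
```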
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z)
- Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which, together with a discriminator, forms a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
arXiv Detail & Related papers (2021-12-15T09:50:25Z)
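For the SE-GAN entry above, a minimal sketch of the self-ensembling part only, assuming (as in typical mean-teacher setups, which the summary does not spell out) that the teacher's weights are an exponential moving average of the student's; the decay value and stand-in networks are illustrative, and the discriminator is omitted.
```python
import torch

@torch.no_grad()
def update_teacher(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.999):
    """Teacher <- decay * teacher + (1 - decay) * student (exponential moving average)."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1 - decay)

# Usage sketch with hypothetical stand-ins for segmentation networks of identical architecture:
student = torch.nn.Conv2d(3, 19, kernel_size=1)
teacher = torch.nn.Conv2d(3, 19, kernel_size=1)
teacher.load_state_dict(student.state_dict())      # start from the same weights
update_teacher(teacher, student)                    # call after each student update
```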
- Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation [64.92152574895111]
We propose a simple Orthogonal Jacobian Regularization (OroJaR) to encourage deep generative models to learn disentangled representations.
Our method is effective in disentangled and controllable image generation, and performs favorably against the state-of-the-art methods.
arXiv Detail & Related papers (2021-08-17T15:01:46Z)
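For the OroJaR entry above, a minimal sketch of an orthogonal-Jacobian-style penalty: compute the Jacobian of the generator output with respect to the latent code and penalize the off-diagonal entries of J^T J, i.e. interactions between latent dimensions; the toy generator and latent size are hypothetical stand-ins, not the paper's models.
```python
import torch

def orojar_penalty(generator: torch.nn.Module, z: torch.Tensor) -> torch.Tensor:
    """Penalize off-diagonal entries of J^T J, where J = d generator(z) / d z."""
    # Jacobian of the flattened output w.r.t. a single latent vector z of shape (latent_dim,).
    jac = torch.autograd.functional.jacobian(
        lambda v: generator(v).flatten(), z, create_graph=True)   # (out_dim, latent_dim)
    jtj = jac.t() @ jac                                           # (latent_dim, latent_dim)
    off_diag = jtj - torch.diag(torch.diagonal(jtj))              # zero out the diagonal
    return (off_diag ** 2).sum()

# Usage sketch with a hypothetical toy generator:
gen = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(), torch.nn.Linear(32, 64))
penalty = orojar_penalty(gen, torch.randn(8))
```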
- Sparsity Aware Normalization for GANs [32.76828505875087]
Generative adversarial networks (GANs) are known to benefit from regularization or normalization of their critic (discriminator) network during training.
In this paper, we analyze the popular spectral normalization scheme, find a significant drawback and introduce sparsity aware normalization (SAN), a new alternative approach for stabilizing GAN training.
arXiv Detail & Related papers (2021-03-03T15:05:18Z)
- Learning Sparse Filters in Deep Convolutional Neural Networks with a l1/l2 Pseudo-Norm [5.3791844634527495]
Deep neural networks (DNNs) have proven to be efficient for numerous tasks, but come at a high memory and computation cost.
Recent research has shown that their structure can be more compact without compromising their performance.
We present a sparsity-inducing regularization term based on the ratio l1/l2 pseudo-norm defined on the filter coefficients.
arXiv Detail & Related papers (2020-07-20T11:56:12Z)
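For the l1/l2 pseudo-norm entry above, a minimal sketch of the ratio penalty as described: for each filter, the l1 norm of its coefficients divided by their l2 norm, summed over filters; the coefficient in the usage line is illustrative.
```python
import torch

def l1_over_l2_penalty(weight: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Sum over filters of ||w||_1 / ||w||_2, a scale-invariant sparsity measure."""
    w = weight.flatten(start_dim=1)                 # (out_channels, d) flattened filters
    l1 = w.abs().sum(dim=1)
    l2 = w.pow(2).sum(dim=1).sqrt()
    return (l1 / (l2 + eps)).sum()

# Usage sketch: add to the task loss with a small hypothetical weight.
conv = torch.nn.Conv2d(16, 32, kernel_size=3)
reg = 1e-3 * l1_over_l2_penalty(conv.weight)
```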
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.