Related papers: Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality

URL: http://arxiv.org/abs/2207.02119v1
Date: Tue, 5 Jul 2022 15:39:29 GMT
Title: Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality
Authors: Yue Song, Nicu Sebe, Wei Wang
Abstract summary: Nearest Orthogonal Gradient (NOG) and Optimal Learning Rate (OLR) are proposed. Experiments on visual recognition demonstrate that our methods can simultaneously improve the covariance conditioning and generalization.
Score: 65.67315418971688
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Inserting an SVD meta-layer into neural networks tends to make the covariance ill-conditioned, which can harm the model's training stability and generalization ability. In this paper, we systematically study how to improve the covariance conditioning by enforcing orthogonality on the Pre-SVD layer. Existing orthogonal treatments of the weights are investigated first. These techniques improve the conditioning but hurt performance. To avoid this side effect, we propose the Nearest Orthogonal Gradient (NOG) and Optimal Learning Rate (OLR). The effectiveness of our methods is validated in two applications: decorrelated Batch Normalization (BN) and Global Covariance Pooling (GCP). Extensive experiments on visual recognition demonstrate that our methods can simultaneously improve the covariance conditioning and generalization. Moreover, combining them with orthogonal weights can further boost performance.
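To make the two quantities in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that "covariance conditioning" refers to the ratio of the largest to the smallest eigenvalue of a covariance matrix, and that a "nearest orthogonal" matrix can be taken in the Frobenius-norm sense, where the standard solution is the orthogonal factor of the SVD. The function names `condition_number` and `nearest_orthogonal` are hypothetical and chosen only for illustration.

```python
import numpy as np


def condition_number(cov):
    """Ratio of largest to smallest eigenvalue of a symmetric PSD matrix."""
    eigvals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    return eigvals[-1] / eigvals[0]


def nearest_orthogonal(mat):
    """Closest orthogonal matrix to `mat` in Frobenius norm (square, full rank).

    Standard result: if mat = U S V^T, the minimizer is U V^T.
    """
    u, _, vt = np.linalg.svd(mat, full_matrices=False)
    return u @ vt


rng = np.random.default_rng(0)

# Toy feature matrix (assumed shape: samples x channels) and its covariance.
features = rng.standard_normal((256, 4))
features -= features.mean(axis=0)
cov = features.T @ features / features.shape[0]
kappa = condition_number(cov)

# Toy stand-in for a gradient of a square Pre-SVD weight matrix.
grad = rng.standard_normal((4, 4))
grad_orth = nearest_orthogonal(grad)
```

A larger `kappa` indicates a worse-conditioned covariance. How the paper uses the orthogonalized gradient inside training, and how OLR selects the step size, is not covered by this sketch.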

Related papers

Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence [131.41894248194995]
We propose context-oriented decomposition adaptation (CorDA), a novel method that initializes adapters in a task-aware manner. Thanks to the task awareness, our method enables two optional adaptation modes, knowledge-preserved mode (KPM) and instruction-previewed mode (IPM).
arXiv Detail & Related papers (2025-06-16T07:55:14Z)
A General Adaptive Dual-level Weighting Mechanism for Remote Sensing Pansharpening [11.791358860917189]
Deep learning methods for remote sensing pansharpening have advanced rapidly. Many existing methods struggle to fully leverage feature heterogeneity and redundancy. We introduce a general adaptive dual-level weighting mechanism (ADWM) to address these challenges.
arXiv Detail & Related papers (2025-03-17T14:24:00Z)
Orthogonal SVD Covariance Conditioning and Latent Disentanglement [65.67315418971688]
Inserting an SVD meta-layer into neural networks tends to make the covariance ill-conditioned. We propose Nearest Orthogonal Gradient (NOG) and Optimal Learning Rate (OLR). Experiments on visual recognition demonstrate that our methods can simultaneously improve covariance conditioning and generalization.
arXiv Detail & Related papers (2022-12-11T20:31:31Z)
Enhancing Adversarial Training with Second-Order Statistics of Weights [23.90998469971413]
We show that treating model weights as random variables allows for enhancing adversarial training through Second-Order Statistics Optimization (S$^2$O). We conduct an extensive set of experiments, which show that S$^2$O not only improves the robustness and generalization of the trained neural networks when used in isolation, but also integrates easily into state-of-the-art adversarial training techniques.
arXiv Detail & Related papers (2022-03-11T15:40:57Z)
Heterogeneous Calibration: A post-hoc model-agnostic framework for improved generalization [8.815439276597818]
We introduce the notion of heterogeneous calibration, which applies a post-hoc model-agnostic transformation to model outputs to improve AUC performance on binary classification tasks. We refer to simple patterns as heterogeneous partitions of the feature space and show theoretically that perfectly calibrating each partition separately optimizes AUC. While the theoretical optimality of this framework holds for any model, we focus on deep neural networks (DNNs) and test the simplest instantiation of this paradigm on a variety of open-source datasets.
arXiv Detail & Related papers (2022-02-10T05:08:50Z)
Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework by a simple yet effective technique, FeatDistLoss. Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z)
Stratified Learning: A General-Purpose Statistical Method for Improved Learning under Covariate Shift [1.1470070927586016]
We propose a simple, statistically principled, and theoretically justified method to improve supervised learning when the training set is not representative. We build upon a well-established methodology in causal inference, and show that the effects of covariate shift can be reduced or eliminated by conditioning on propensity scores. We demonstrate the effectiveness of our general-purpose method on two contemporary research questions in cosmology, outperforming state-of-the-art importance weighting methods.
arXiv Detail & Related papers (2021-06-21T15:53:20Z)
Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning method based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem. CoGD is introduced to solve bilinear problems when one variable is under a sparsity constraint. It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z)
Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling? [59.820507600960745]
We propose a new GCP meta-layer that uses SVD in the forward pass and Padé approximants in the backward propagation to compute the gradients. The proposed meta-layer has been integrated into different CNN models and achieves state-of-the-art performance on both large-scale and fine-grained datasets.
arXiv Detail & Related papers (2021-05-06T08:03:45Z)
Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem. We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent. Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.