Generalizing and Improving Jacobian and Hessian Regularization
- URL: http://arxiv.org/abs/2212.00311v1
- Date: Thu, 1 Dec 2022 07:01:59 GMT
- Title: Generalizing and Improving Jacobian and Hessian Regularization
- Authors: Chenwei Cui, Zehao Yan, Guangshen Liu, Liangfu Lu
- Abstract summary: We generalize previous efforts by extending the target matrix from zero to any matrix that admits efficient matrix-vector products.
The proposed paradigm allows us to construct novel regularization terms that enforce symmetry or diagonality on square Jacobian and Hessian matrices.
We introduce Lanczos-based spectral norm minimization to tackle the main difficulty for such regularization: its high computational complexity.
- Score: 1.926971915834451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Jacobian and Hessian regularization aim to reduce the magnitude of the first
and second-order partial derivatives with respect to neural network inputs, and
they are predominantly used to ensure the adversarial robustness of image
classifiers. In this work, we generalize previous efforts by extending the
target matrix from zero to any matrix that admits efficient matrix-vector
products. The proposed paradigm allows us to construct novel regularization
terms that enforce symmetry or diagonality on square Jacobian and Hessian
matrices. On the other hand, the major challenge for Jacobian and Hessian
regularization has been high computational complexity. We introduce
Lanczos-based spectral norm minimization to tackle this difficulty. This
technique uses a parallelized implementation of the Lanczos algorithm and is
capable of effective and stable regularization of large Jacobian and Hessian
matrices. Theoretical justifications and empirical evidence are provided for
the proposed paradigm and technique. We carry out exploratory experiments to
validate the effectiveness of our novel regularization terms. We also conduct
comparative experiments to evaluate Lanczos-based spectral norm minimization
against prior methods. Results show that the proposed methodologies are
advantageous for a wide range of tasks.
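To make both the paradigm and the technique concrete, the following is a minimal sketch (our illustration, not the authors' released code) of Lanczos-based estimation of the spectral norm of the residual D = J - M using only matrix-vector products; choosing the target M = J^T yields the symmetry-enforcing regularizer, since ||J - J^T||_2 = 0 exactly when a square Jacobian J is symmetric. In a real network, the products D v and D^T v would come from forward- and reverse-mode automatic differentiation rather than from explicit matrices.

```python
# A minimal sketch, assuming explicit matvec callbacks; in practice D v and
# D^T v would be Jacobian-vector and vector-Jacobian products from autodiff.
import numpy as np

def lanczos_spectral_norm(matvec, rmatvec, dim, num_iter=20, seed=0):
    """Estimate ||D||_2 by running Lanczos on the symmetric operator D^T D."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(dim)
    q /= np.linalg.norm(q)
    q_prev = np.zeros(dim)
    alphas, betas, beta = [], [], 0.0
    for _ in range(num_iter):
        w = rmatvec(matvec(q))                  # w = D^T (D q): two products
        alpha = q @ w
        w -= alpha * q + beta * q_prev          # three-term Lanczos recurrence
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        betas.append(beta)
        if beta < 1e-10:                        # exact invariant subspace found
            break
        q_prev, q = q, w / beta
    # The top eigenvalue of the small tridiagonal matrix T approximates
    # lambda_max(D^T D), so its square root approximates ||D||_2.
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    return np.sqrt(np.linalg.eigvalsh(T)[-1])

# Toy check: with target M = J^T, the penalty ||J - J^T||_2 vanishes exactly
# when the square Jacobian J is symmetric.
J = np.array([[2.0, 1.0], [0.0, 2.0]])
D = J - J.T
est = lanczos_spectral_norm(lambda v: D @ v, lambda v: D.T @ v, dim=2)
print(est, np.linalg.norm(D, 2))                # both are 1.0 here
```

A plain power iteration on D^T D would also work, but Lanczos typically resolves the extreme eigenvalues in far fewer matrix-vector products, which is what makes regularizing large Jacobian and Hessian matrices affordable.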
Related papers
- Regularized Projection Matrix Approximation with Applications to Community Detection [1.3761665705201904]
This paper introduces a regularized projection matrix approximation framework designed to recover cluster information from the affinity matrix.
We investigate three distinct penalty functions, each specifically tailored to address bounded, positive, and sparse scenarios.
Numerical experiments conducted on both synthetic and real-world datasets reveal that our regularized projection matrix approximation approach significantly outperforms state-of-the-art methods in clustering performance (the unregularized base problem is sketched below).
arXiv Detail & Related papers (2024-05-26T15:18:22Z)
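For reference alongside the entry above, the unregularized core of this problem has a classical closed-form answer; the sketch below (our illustration on a toy affinity matrix) computes the nearest rank-k orthogonal projection in Frobenius norm, the base problem that the paper's bounded, positive, and sparse penalties then refine.

```python
# A sketch of the unregularized base problem: the closest rank-k orthogonal
# projection to a symmetric affinity matrix A (in Frobenius norm) is U U^T,
# where U holds the top-k eigenvectors of A.
import numpy as np

def nearest_projection(A, k):
    """argmin_P ||A - P||_F over rank-k orthogonal projection matrices P."""
    w, V = np.linalg.eigh((A + A.T) / 2)    # symmetrize, then eigendecompose
    U = V[:, np.argsort(w)[-k:]]            # top-k eigenvectors
    return U @ U.T

# Toy affinity with two perfect clusters of size 3; the recovered projection
# is block-diagonal, exposing the cluster structure.
A = np.kron(np.eye(2), np.ones((3, 3)))
print(np.round(nearest_projection(A, k=2), 2))
```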
- The Inductive Bias of Flatness Regularization for Deep Matrix Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of minimum-Hessian-trace solutions in deep linear networks.
We show that for all depths greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters; a toy numerical check of this norm follows below.
arXiv Detail & Related papers (2023-06-22T23:14:57Z)
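A toy numerical check of the quantity in the entry above (our code, not the paper's): for a deep linear network the end-to-end map is a single matrix product, and its Schatten 1-norm is simply the sum of its singular values.

```python
# Schatten 1-norm (nuclear norm) of the end-to-end matrix of a depth-3
# linear network; per the result summarized above, flatness regularization
# implicitly minimizes this quantity under RIP.
import numpy as np

rng = np.random.default_rng(0)
W1, W2, W3 = (rng.standard_normal((4, 4)) for _ in range(3))
end_to_end = W3 @ W2 @ W1                    # the network's end-to-end map
schatten_1 = np.linalg.svd(end_to_end, compute_uv=False).sum()
print(schatten_1)
```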
- Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix.
Comprehensive experimental results on six commonly-used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z)
- Riemannian statistics meets random matrix theory: towards learning from high-dimensional covariance matrices [2.352645870795664]
This paper addresses the absence of any practical method of computing the normalising factors associated with Riemannian Gaussian distributions on spaces of high-dimensional covariance matrices.
It shows that the missing method emerges from an unexpected new connection with random matrix theory.
Numerical experiments demonstrate how the resulting approximation overcomes the difficulties that have impeded applications to real-world datasets.
arXiv Detail & Related papers (2022-03-01T03:16:50Z)
- Learning a Compressive Sensing Matrix with Structural Constraints via Maximum Mean Discrepancy Optimization [17.104994036477308]
We introduce a learning-based algorithm to obtain a measurement matrix for compressive sensing-related recovery problems.
The recent success of metrics such as the maximum mean discrepancy (MMD) in neural network-related topics motivates a machine learning-based solution to the problem (a generic MMD estimator is sketched below).
arXiv Detail & Related papers (2021-10-14T08:35:54Z)
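Since the entry above hinges on the maximum mean discrepancy, here is a generic biased Gaussian-kernel MMD estimator in its standard textbook form; the paper's actual training objective, kernel, and data model may differ.

```python
# Biased estimate of squared MMD between two samples under a Gaussian kernel.
import numpy as np

def gaussian_mmd2(X, Y, bandwidth=1.0):
    def kernel(A, B):
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2.0 * kernel(X, Y).mean()

rng = np.random.default_rng(0)
same = gaussian_mmd2(rng.standard_normal((200, 3)), rng.standard_normal((200, 3)))
shifted = gaussian_mmd2(rng.standard_normal((200, 3)), rng.standard_normal((200, 3)) + 2.0)
print(same, shifted)   # near zero for matched distributions, clearly positive otherwise
```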
- Adversarially-Trained Nonnegative Matrix Factorization [77.34726150561087]
We consider an adversarially-trained version of nonnegative matrix factorization.
In our formulation, an attacker adds an arbitrary matrix of bounded norm to the given data matrix.
We design efficient algorithms inspired by adversarial training to optimize for dictionary and coefficient matrices.
arXiv Detail & Related papers (2021-04-10T13:13:17Z)
- On the Efficient Implementation of the Matrix Exponentiated Gradient Algorithm for Low-Rank Matrix Optimization [26.858608065417663]
Convex optimization over the spectrahedron has important applications in machine learning, signal processing and statistics.
We propose efficient implementations of the matrix exponentiated gradient (MEG) algorithm, tailored for optimization with low-rank matrices, which use only a single low-rank SVD per iteration.
We also provide efficiently-computable certificates for the correct convergence of our methods (a dense baseline of the MEG update is sketched below).
arXiv Detail & Related papers (2020-12-18T19:14:51Z)
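For context on the entry above, the sketch below is a dense baseline of one matrix exponentiated gradient step over the spectrahedron, with the matrix logarithm and exponential computed via a full eigendecomposition; this full decomposition is exactly the per-iteration cost that the paper's low-rank SVD implementation avoids. The toy objective is our choice for illustration.

```python
# Dense baseline of the MEG update over {X : X PSD, tr(X) = 1}; the matrix
# log/exp go through a full eigendecomposition, the cost the paper's
# low-rank variant avoids.
import numpy as np

def meg_step(X, grad, lr=0.1):
    w, V = np.linalg.eigh(X)
    log_X = (V * np.log(w)) @ V.T            # matrix logarithm of X
    Y = log_X - lr * (grad + grad.T) / 2     # gradient step in the dual space
    w2, V2 = np.linalg.eigh(Y)
    exp_Y = (V2 * np.exp(w2)) @ V2.T         # matrix exponential of Y
    return exp_Y / np.trace(exp_Y)           # renormalize to unit trace

# Toy usage: minimizing <C, X> over the spectrahedron drives the objective
# toward the smallest eigenvalue of C.
rng = np.random.default_rng(0)
C = rng.standard_normal((4, 4)); C = (C + C.T) / 2
X = np.eye(4) / 4
for _ in range(50):
    X = meg_step(X, C)
print(np.trace(C @ X), np.linalg.eigvalsh(C).min())   # nearly equal
```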
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Controllable Orthogonalization in Training DNNs [96.1365404059924]
Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1.
This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI).
We show that our method improves the performance of image classification networks by effectively controlling the orthogonality to provide an optimal tradeoff between optimization benefits and representational capacity reduction.
We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of a network, similar to spectral normalization; a generic Newton-Schulz sketch of the iteration follows below.
arXiv Detail & Related papers (2020-04-02T10:14:27Z)
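The entry above refers to orthogonalization by Newton's iteration; below is a generic Newton-Schulz sketch of that iteration (our illustration of the underlying scheme; ONI's exact formulation, pre-conditioning, and integration into training may differ).

```python
# Newton-Schulz iteration: drives all singular values of Y toward 1, i.e.
# converges to the orthogonal polar factor of W, using only matrix products.
import numpy as np

def newton_schulz_orthogonalize(W, num_iter=15):
    Y = W / np.linalg.norm(W, 2)             # pre-scale: singular values in (0, 1]
    I = np.eye(W.shape[0])
    for _ in range(num_iter):
        Y = 0.5 * Y @ (3.0 * I - Y.T @ Y)    # sigma <- sigma * (3 - sigma^2) / 2
    return Y

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 5))
Q = newton_schulz_orthogonalize(W)
print(np.linalg.norm(Q.T @ Q - np.eye(5)))   # ~0: Q is numerically orthogonal
```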
- Accurate Optimization of Weighted Nuclear Norm for Non-Rigid Structure from Motion [15.641335104467982]
We show that more accurate results can be achieved with 2nd order methods.
Our main result shows how to construct bilinear formulations for a general class of regularizers.
We show experimentally, on a number of structure from motion problems, that our approach outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-03-23T13:52:16Z)
- Optimal Iterative Sketching with the Subsampled Randomized Hadamard Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rates for Haar and randomized Hadamard matrices are identical, and asymptotically improve upon those of random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction (a minimal iterative-sketching loop is sketched below).
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
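To ground the final entry, here is a minimal iterative-sketching loop for least squares using a subsampled randomized Hadamard transform (our illustration of the standard iterative Hessian sketch; the dimensions, step scheme, and iteration count are toy choices).

```python
# Iterative sketching for min_x ||A x - b||_2: factor the sketched Hessian
# once, then refine with exact gradients.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
n, d, m = 256, 10, 64                        # n must be a power of 2 here
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# SRHT: random sign flips, Hadamard transform, uniform row subsampling.
signs = rng.choice([-1.0, 1.0], size=n)
H = hadamard(n) / np.sqrt(n)                 # orthonormal Hadamard matrix
rows = rng.choice(n, size=m, replace=False)
SA = np.sqrt(n / m) * (H * signs)[rows] @ A  # S A with S = sqrt(n/m) P H D

x = np.zeros(d)
sketched_hessian = SA.T @ SA                 # cheap stand-in for A^T A
for _ in range(20):
    gradient = A.T @ (A @ x - b)
    x -= np.linalg.solve(sketched_hessian, gradient)

x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_exact))           # small: iterates reach the LS solution
```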
This list is automatically generated from the titles and abstracts of the papers on this site.