A Novel Sparse Regularizer
- URL: http://arxiv.org/abs/2301.07285v5
- Date: Thu, 20 Apr 2023 19:37:05 GMT
- Title: A Novel Sparse Regularizer
- Authors: Hovig Tigran Bayandorian
- Abstract summary: This paper introduces a regularizer based on minimizing a novel measure of entropy applied to the model during optimization.
It is differentiable, simple and fast to compute, scale-invariant, requires a trivial amount of additional memory, and can easily be parallelized.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: $L_p$-norm regularization schemes such as $L_0$, $L_1$, and $L_2$-norm
regularization and $L_p$-norm-based regularization techniques such as weight
decay, LASSO, and elastic net compute a quantity which depends on model weights
considered in isolation from one another. This paper introduces a regularizer
based on minimizing a novel measure of entropy applied to the model during
optimization. In contrast with $L_p$-norm-based regularization, this
regularizer is concerned with the spatial arrangement of weights within a
weight matrix. This novel regularizer is an additive term for the loss function
and is differentiable, simple and fast to compute, scale-invariant, requires a
trivial amount of additional memory, and can easily be parallelized.
Empirically, this method yields approximately an order-of-magnitude
improvement in the number of nonzero model parameters required to achieve a
given level of test accuracy when training LeNet300 on MNIST.
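The abstract does not specify the entropy measure itself, but the claimed properties (differentiable, scale-invariant, trivial memory overhead) are consistent with a Shannon-style entropy over normalized absolute weights. A minimal illustrative sketch assuming that form — the function name and normalization are assumptions, and this simple version does not capture the spatial-arrangement aspect the paper emphasizes:

```python
import numpy as np

def entropy_regularizer(W, eps=1e-12):
    """Illustrative entropy penalty: normalize absolute weights into a
    probability distribution and return its Shannon entropy. Minimizing
    this entropy concentrates mass on few weights, encouraging sparsity.
    """
    p = np.abs(W) / (np.abs(W).sum() + eps)
    return float(-(p * np.log(p + eps)).sum())

dense = np.ones((4, 4))                        # near-uniform weights: high entropy
sparse = np.zeros((4, 4)); sparse[0, 0] = 1.0  # one active weight: low entropy
assert entropy_regularizer(sparse) < entropy_regularizer(dense)

# Scale invariance: rescaling W leaves the normalized distribution unchanged.
assert abs(entropy_regularizer(3.0 * dense) - entropy_regularizer(dense)) < 1e-6
```

As an additive loss term it would be used as, e.g., `loss = task_loss + beta * entropy_regularizer(W)`, with `beta` a tuning coefficient.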
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Decoupled Weight Decay for Any $p$ Norm [1.1510009152620668]
We consider a simple yet effective approach to sparsification, based on Bridge, or $L_p$, regularization during training.
We introduce a novel weight decay scheme, which generalizes the standard $L_2$ weight decay to any $p$ norm.
We empirically demonstrate that it leads to highly sparse networks, while maintaining performance comparable to standard $L_2$ regularization.
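A decoupled $L_p$ decay step can be sketched as follows; this is an illustrative AdamW-style update assuming the penalty gradient $p\,\mathrm{sign}(w)|w|^{p-1}$ is applied separately from the loss gradient (the paper's exact scheme may differ):

```python
import numpy as np

def decoupled_lp_decay(w, lr=0.01, wd=0.1, p=2.0, eps=1e-12):
    """One decoupled weight-decay step for a general p-norm penalty.

    The gradient of |w|^p is p * sign(w) * |w|^(p-1); applying it
    separately from the loss gradient follows the decoupled (AdamW-style)
    scheme. p=2 recovers standard weight decay; p<2 shrinks small weights
    toward exact zero, encouraging sparsity.
    """
    return w - lr * wd * p * np.sign(w) * (np.abs(w) + eps) ** (p - 1)

w = np.array([1.0, -0.5, 0.01])
w2 = decoupled_lp_decay(w, p=2.0)   # multiplicative shrinkage: w * (1 - 2*lr*wd)
w1 = decoupled_lp_decay(w, p=1.0)   # constant-magnitude shrink toward zero
```

For `p=2` the update multiplies every weight by the same factor, while `p=1` subtracts a fixed amount per step, so small weights reach zero in finitely many steps.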
arXiv Detail & Related papers (2024-04-16T18:02:15Z) - A stochastic optimization approach to train non-linear neural networks
with a higher-order variation regularization [3.0277213703725767]
This study considers a $(k,q)$th-order variation regularization ($(k,q)$-VR),
defined as the $q$th-powered integral of the absolute $k$th-order derivative of the parametric models to be trained.
Our numerical experiments demonstrate that neural networks trained with the $(k,q)$-VR terms are more "resilient" than those with conventional parameter regularization.
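The definition above can be approximated numerically; a hypothetical sketch using $k$th-order finite differences on a 1-D grid (the function and parameter names are illustrative, not from the paper):

```python
import numpy as np

def kq_variation(f, k=2, q=1, a=-1.0, b=1.0, n=200):
    """Finite-difference approximation of the (k,q)th-order variation:
    the integral over [a, b] of |f^(k)(x)|^q.
    """
    x = np.linspace(a, b, n)
    h = x[1] - x[0]
    y = f(x)
    d = np.diff(y, n=k) / h**k   # k-th finite difference ~ k-th derivative
    return float(np.sum(np.abs(d) ** q) * h)

# For f(x) = x**2 the second derivative is 2 everywhere, so the
# (2,1)-variation over [-1, 1] is approximately 2 * 2 = 4.
val = kq_variation(lambda x: x**2, k=2, q=1)
```

Higher `q` penalizes large derivative spikes more sharply, while higher `k` targets curvature and beyond rather than slope.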
arXiv Detail & Related papers (2023-08-04T12:57:13Z) - Conditional Matrix Flows for Gaussian Graphical Models [1.6435014180036467]
We propose a general framework for variational inference in Gaussian graphical models in which the benefits of frequentist regularization and Bayesian inference are combined.
By training a single conditional matrix flow for any $\lambda$ and any $l_q$ (pseudo-)norm, we recover the frequentist sparse solution (as in model selection) in the limit.
arXiv Detail & Related papers (2023-06-12T17:25:12Z) - KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal [70.15267479220691]
We consider and analyze the sample complexity of model-free reinforcement learning with a generative model.
Our analysis shows that it is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small.
arXiv Detail & Related papers (2022-05-27T19:39:24Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - An efficient projection neural network for $\ell_1$-regularized logistic
regression [10.517079029721257]
This paper presents a simple projection neural network for $\ell_1$-regularized logistic regression.
The proposed neural network does not require any extra auxiliary variable nor any smooth approximation.
We also investigate the convergence of the proposed neural network by using the Lyapunov theory and show that it converges to a solution of the problem with any arbitrary initial value.
arXiv Detail & Related papers (2021-05-12T06:13:44Z) - Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints [29.227720674726413]
We propose a fast minimum-norm (FMN) attack that works with different $\ell_p$-norm perturbation models.
Experiments show that FMN significantly outperforms existing attacks in terms of convergence speed and time.
arXiv Detail & Related papers (2021-02-25T12:56:26Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Sparse Identification of Nonlinear Dynamical Systems via Reweighted
$\ell_1$-regularized Least Squares [62.997667081978825]
This work proposes an iterative sparse-regularized regression method to recover governing equations of nonlinear systems from noisy state measurements.
The aim of this work is to improve the accuracy and robustness of the method in the presence of state measurement noise.
arXiv Detail & Related papers (2020-05-27T08:30:15Z)
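The reweighted $\ell_1$ scheme described in the last entry is, in spirit, the classic iteratively reweighted $\ell_1$ approach; a rough sketch under that assumption (the solver choice, ISTA, and all parameter values are illustrative, not the paper's exact method):

```python
import numpy as np

def reweighted_l1_lstsq(A, b, lam=0.1, outer=5, inner=200, eps=1e-3):
    """Reweighted l1-regularized least squares: repeatedly solve a
    weighted lasso via ISTA, then reweight each coordinate by
    1/(|x_i| + eps) so small coefficients are penalized more strongly.
    """
    n = A.shape[1]
    x = np.zeros(n)
    w = np.ones(n)
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    for _ in range(outer):
        for _ in range(inner):               # ISTA steps for the weighted lasso
            g = A.T @ (A @ x - b)
            z = x - g / L
            t = lam * w / L
            x = np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
        w = 1.0 / (np.abs(x) + eps)          # reweight toward the active set
    return x

# Recover a sparse vector from noiseless random measurements (illustrative).
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 80))
x_true = np.zeros(80); x_true[[3, 17, 42]] = [1.0, -2.0, 1.5]
b = A @ x_true
x_hat = reweighted_l1_lstsq(A, b, lam=0.05)
```

The reweighting step is what distinguishes this from a single lasso solve: coefficients that shrink toward zero receive ever-larger penalties, sharpening the recovered support.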
This list is automatically generated from the titles and abstracts of the papers in this site.