A Novel Sparse Regularizer
- URL: http://arxiv.org/abs/2301.07285v5
- Date: Thu, 20 Apr 2023 19:37:05 GMT
- Title: A Novel Sparse Regularizer
- Authors: Hovig Tigran Bayandorian
- Abstract summary: This paper introduces a regularizer based on minimizing a novel measure of entropy applied to the model during optimization.
It is differentiable, simple and fast to compute, scale-invariant, requires a trivial amount of additional memory, and can easily be parallelized.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: $L_p$-norm regularization schemes such as $L_0$, $L_1$, and $L_2$-norm
regularization and $L_p$-norm-based regularization techniques such as weight
decay, LASSO, and elastic net compute a quantity which depends on model weights
considered in isolation from one another. This paper introduces a regularizer
based on minimizing a novel measure of entropy applied to the model during
optimization. In contrast with $L_p$-norm-based regularization, this
regularizer is concerned with the spatial arrangement of weights within a
weight matrix. This novel regularizer is an additive term for the loss function
and is differentiable, simple and fast to compute, scale-invariant, requires a
trivial amount of additional memory, and can easily be parallelized.
Empirically, this method yields approximately an order-of-magnitude
improvement in the number of nonzero model parameters required to achieve a
given level of test accuracy when training LeNet300 on MNIST.
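The abstract does not specify the entropy measure itself, but the claimed properties (differentiable, scale-invariant, trivial memory overhead) are consistent with a Shannon-style entropy over normalized absolute weights. A minimal illustrative sketch assuming that form — the function name and normalization are assumptions, and this simple version does not capture the spatial-arrangement aspect the paper emphasizes:

```python
import numpy as np

def entropy_regularizer(W, eps=1e-12):
    """Illustrative entropy penalty: normalize absolute weights into a
    probability distribution and return its Shannon entropy. Minimizing
    this entropy concentrates mass on few weights, encouraging sparsity.
    """
    p = np.abs(W) / (np.abs(W).sum() + eps)
    return float(-(p * np.log(p + eps)).sum())

dense = np.ones((4, 4))                        # near-uniform weights: high entropy
sparse = np.zeros((4, 4)); sparse[0, 0] = 1.0  # one active weight: low entropy
assert entropy_regularizer(sparse) < entropy_regularizer(dense)

# Scale invariance: rescaling W leaves the normalized distribution unchanged.
assert abs(entropy_regularizer(3.0 * dense) - entropy_regularizer(dense)) < 1e-6
```

As an additive loss term it would be used as, e.g., `loss = task_loss + beta * entropy_regularizer(W)`, with `beta` a tuning coefficient.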
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Decoupled Weight Decay for Any $p$ Norm [1.1510009152620668]
We consider a simple yet effective approach to sparsification, based on Bridge, or $L_p$, regularization during training.
We introduce a novel weight decay scheme, which generalizes the standard $L_2$ weight decay to any $p$ norm.
We empirically demonstrate that it leads to highly sparse networks, while maintaining performance comparable to standard $L_2$ regularization.
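A decoupled $L_p$ decay step can be sketched as follows; this is an illustrative AdamW-style update assuming the penalty gradient $p\,\mathrm{sign}(w)|w|^{p-1}$ is applied separately from the loss gradient (the paper's exact scheme may differ):

```python
import numpy as np

def decoupled_lp_decay(w, lr=0.01, wd=0.1, p=2.0, eps=1e-12):
    """One decoupled weight-decay step for a general p-norm penalty.

    The gradient of |w|^p is p * sign(w) * |w|^(p-1); applying it
    separately from the loss gradient follows the decoupled (AdamW-style)
    scheme. p=2 recovers standard weight decay; p<2 shrinks small weights
    toward exact zero, encouraging sparsity.
    """
    return w - lr * wd * p * np.sign(w) * (np.abs(w) + eps) ** (p - 1)

w = np.array([1.0, -0.5, 0.01])
w2 = decoupled_lp_decay(w, p=2.0)   # multiplicative shrinkage: w * (1 - 2*lr*wd)
w1 = decoupled_lp_decay(w, p=1.0)   # constant-magnitude shrink toward zero
```

For `p=2` the update multiplies every weight by the same factor, while `p=1` subtracts a fixed amount per step, so small weights reach zero in finitely many steps.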
arXiv Detail & Related papers (2024-04-16T18:02:15Z) - A stochastic optimization approach to train non-linear neural networks
with a higher-order variation regularization [3.0277213703725767]
This study considers a $(k,q)$th-order variation regularization ($(k,q)$-VR),
defined as the $q$th-powered integral of the absolute $k$th-order derivative of the parametric models to be trained.
Our numerical experiments demonstrate that neural networks trained with the $(k,q)$-VR terms are more "resilient" than those with conventional parameter regularization.
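The definition above can be approximated numerically; a hypothetical sketch using $k$th-order finite differences on a 1-D grid (the function and parameter names are illustrative, not from the paper):

```python
import numpy as np

def kq_variation(f, k=2, q=1, a=-1.0, b=1.0, n=200):
    """Finite-difference approximation of the (k,q)th-order variation:
    the integral over [a, b] of |f^(k)(x)|^q.
    """
    x = np.linspace(a, b, n)
    h = x[1] - x[0]
    y = f(x)
    d = np.diff(y, n=k) / h**k   # k-th finite difference ~ k-th derivative
    return float(np.sum(np.abs(d) ** q) * h)

# For f(x) = x**2 the second derivative is 2 everywhere, so the
# (2,1)-variation over [-1, 1] is approximately 2 * 2 = 4.
val = kq_variation(lambda x: x**2, k=2, q=1)
```

Higher `q` penalizes large derivative spikes more sharply, while higher `k` targets curvature and beyond rather than slope.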
arXiv Detail & Related papers (2023-08-04T12:57:13Z) - Conditional Matrix Flows for Gaussian Graphical Models [1.6435014180036467]
We propose a general framework for variational inference in Gaussian graphical models in which the benefits of frequentist regularization and Bayesian inference are combined.
By training a single conditional matrix flow for any $\lambda$ and any $l_q$ (pseudo-)norm, we recover the frequentist sparse solution (as in model selection) in the limit.
arXiv Detail & Related papers (2023-06-12T17:25:12Z) - KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal [70.15267479220691]
We consider and analyze the sample complexity of model-free reinforcement learning with a generative model.
Our analysis shows that it is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small.
arXiv Detail & Related papers (2022-05-27T19:39:24Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - An efficient projection neural network for $\ell_1$-regularized logistic
regression [10.517079029721257]
This paper presents a simple projection neural network for $\ell_1$-regularized logistic regression.
The proposed neural network does not require any extra auxiliary variable nor any smooth approximation.
We also investigate the convergence of the proposed neural network by using the Lyapunov theory and show that it converges to a solution of the problem with any arbitrary initial value.
arXiv Detail & Related papers (2021-05-12T06:13:44Z) - Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints [29.227720674726413]
We propose a fast minimum-norm (FMN) attack that works with different $\ell_p$-norm perturbation models.
Experiments show that FMN significantly outperforms existing attacks in terms of convergence speed and time.
arXiv Detail & Related papers (2021-02-25T12:56:26Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Sparse Identification of Nonlinear Dynamical Systems via Reweighted
$\ell_1$-regularized Least Squares [62.997667081978825]
This work proposes an iterative sparse-regularized regression method to recover governing equations of nonlinear systems from noisy state measurements.
The aim of this work is to improve the accuracy and robustness of the method in the presence of state measurement noise.
arXiv Detail & Related papers (2020-05-27T08:30:15Z)
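The reweighted $\ell_1$ scheme described in the last entry is, in spirit, the classic iteratively reweighted $\ell_1$ approach; a rough sketch under that assumption (the solver choice, ISTA, and all parameter values are illustrative, not the paper's exact method):

```python
import numpy as np

def reweighted_l1_lstsq(A, b, lam=0.1, outer=5, inner=200, eps=1e-3):
    """Reweighted l1-regularized least squares: repeatedly solve a
    weighted lasso via ISTA, then reweight each coordinate by
    1/(|x_i| + eps) so small coefficients are penalized more strongly.
    """
    n = A.shape[1]
    x = np.zeros(n)
    w = np.ones(n)
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    for _ in range(outer):
        for _ in range(inner):               # ISTA steps for the weighted lasso
            g = A.T @ (A @ x - b)
            z = x - g / L
            t = lam * w / L
            x = np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
        w = 1.0 / (np.abs(x) + eps)          # reweight toward the active set
    return x

# Recover a sparse vector from noiseless random measurements (illustrative).
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 80))
x_true = np.zeros(80); x_true[[3, 17, 42]] = [1.0, -2.0, 1.5]
b = A @ x_true
x_hat = reweighted_l1_lstsq(A, b, lam=0.05)
```

The reweighting step is what distinguishes this from a single lasso solve: coefficients that shrink toward zero receive ever-larger penalties, sharpening the recovered support.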
This list is automatically generated from the titles and abstracts of the papers in this site.