WEEP: A Differentiable Nonconvex Sparse Regularizer via Weakly-Convex Envelope
- URL: http://arxiv.org/abs/2507.20447v1
- Date: Mon, 28 Jul 2025 00:40:48 GMT
- Title: WEEP: A Differentiable Nonconvex Sparse Regularizer via Weakly-Convex Envelope
- Authors: Takanobu Furuhashi, Hidekata Hontani, Tatsuya Yokota
- Abstract summary: Sparse regularization is fundamental in signal processing for efficient signal recovery and feature extraction. This paper presents a novel sparsity-inducing penalty: WEEP, the Weakly-convex Envelope of Piecewise Penalty. WEEP shows superior signal and image denoising performance compared to established sparse regularizers while retaining computational tractability.
- Score: 6.2573333363884425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse regularization is fundamental in signal processing for efficient signal recovery and feature extraction. However, it faces a fundamental dilemma: the most powerful sparsity-inducing penalties are often non-differentiable, conflicting with gradient-based optimizers that dominate the field. We introduce WEEP (Weakly-convex Envelope of Piecewise Penalty), a novel, fully differentiable sparse regularizer derived from the weakly-convex envelope framework. WEEP provides strong, unbiased sparsity while maintaining full differentiability and L-smoothness, making it natively compatible with any gradient-based optimizer. This resolves the conflict between statistical performance and computational tractability. We demonstrate superior performance compared to the L1-norm and other established non-convex sparse regularizers on challenging signal and image denoising tasks.
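The listing does not reproduce WEEP's closed form, so the sketch below is not the paper's penalty; it is a minimal illustration of the design principle the abstract describes: replacing a nonsmooth sparsity penalty with a differentiable, L-smooth envelope so that plain gradient descent applies. A known instance of this construction is the Moreau envelope of the absolute value, which is the Huber function with a (1/mu)-Lipschitz gradient. The signal, the smoothing parameter mu, and the weight lam are illustrative choices; unlike WEEP, this Huber surrogate is biased (it shrinks large coefficients by a constant), so it demonstrates only the differentiability and L-smoothness side of the construction, not the unbiasedness.

```python
import numpy as np

def huber(x, mu):
    # Moreau envelope of |.| with parameter mu: x^2/(2*mu) on [-mu, mu], |x| - mu/2 outside.
    ax = np.abs(x)
    return np.where(ax <= mu, x * x / (2 * mu), ax - mu / 2)

def huber_grad(x, mu):
    # Envelope gradient: x/mu inside [-mu, mu], sign(x) outside; (1/mu)-Lipschitz overall.
    return np.clip(x / mu, -1.0, 1.0)

def denoise(y, lam=0.5, mu=0.05, iters=500):
    # Gradient descent on F(x) = 0.5*||x - y||^2 + lam * sum_i huber(x_i, mu).
    step = 1.0 / (1.0 + lam / mu)  # 1/L for the (1 + lam/mu)-smooth objective
    x = y.copy()
    for _ in range(iters):
        x = x - step * ((x - y) + lam * huber_grad(x, mu))
    return x

rng = np.random.default_rng(0)
truth = np.zeros(200)
truth[rng.choice(200, size=10, replace=False)] = rng.normal(0.0, 3.0, size=10)  # sparse spikes
noisy = truth + 0.3 * rng.standard_normal(200)
print("noisy error:   ", np.linalg.norm(noisy - truth))
print("denoised error:", np.linalg.norm(denoise(noisy) - truth))
```

Because the smoothed objective here is both strongly convex and L-smooth, the fixed step 1/L guarantees convergence; the same gradient-compatibility argument is what the abstract claims for WEEP, with the added benefit of unbiased shrinkage.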
Related papers
- Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-sum-based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound expressed through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - Generalizing GNNs with Tokenized Mixture of Experts [75.8310720413187]
We show that improving stability requires reducing reliance on shift-sensitive features, leaving an irreducible worst-case generalization floor. We propose STEM-GNN, a pretrain-then-finetune framework with a mixture-of-experts encoder for diverse computation paths. Across nine node, link, and graph benchmarks, STEM-GNN achieves a stronger three-way balance, improving robustness to degree/homophily shifts and to feature/edge corruptions while remaining competitive on clean graphs.
arXiv Detail & Related papers (2026-02-09T22:48:30Z) - Provably Convergent Decentralized Optimization over Directed Graphs under Generalized Smoothness [1.5892054128426507]
The Hessian norm is allowed to grow linearly with the gradient norm, thereby accommodating rapidly varying gradients beyond Lipschitz smoothness. We integrate gradient-tracking techniques with gradient clipping to ensure accurate convergence over directed communication graphs. Our results remain valid even when the gradient dissimilarity is unbounded, making the proposed framework more applicable to realistic heterogeneous data environments.
arXiv Detail & Related papers (2026-01-07T04:25:33Z) - Parallel Diffusion Solver via Residual Dirichlet Policy Optimization [88.7827307535107]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often face significant image-quality degradation under a low sampling budget. We propose the Ensemble Parallel Direction solver (dubbed EPD-EPr), a novel ODE solver that mitigates these errors by incorporating multiple parallel gradient evaluations in each step.
arXiv Detail & Related papers (2025-12-28T05:48:55Z) - Efficient Differentiable Approximation of Generalized Low-rank Regularization [64.73416824444328]
Low-rank regularization (LRR) has been widely applied in various machine learning tasks. In this paper, we propose an efficient differentiable approximation of LRR.
arXiv Detail & Related papers (2025-05-21T11:49:17Z) - The Performance Of The Unadjusted Langevin Algorithm Without Smoothness Assumptions [0.0]
We propose a Langevin-based algorithm that does not rely on popular but computationally challenging techniques. We show that the performance of samplers like ULA does not necessarily degenerate arbitrarily with low regularity (a minimal sketch of the standard ULA iteration appears after this list).
arXiv Detail & Related papers (2025-02-05T18:55:54Z) - MARINA-P: Superior Performance in Non-smooth Federated Optimization with Adaptive Stepsizes [57.24311218570012]
We extend the theory of EF21-P (Anonymous 2024) and MARINA-P (arXiv:2402.06412) to the non-smooth convex setting. We provide theoretical guarantees under constant, decreasing, and adaptive stepsizes.
arXiv Detail & Related papers (2024-12-22T16:18:34Z) - Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise [60.92029979853314]
We investigate the roles of gradient normalization and clipping in ensuring the convergence of stochastic gradient descent (SGD) under heavy-tailed noise.
Our work provides the first theoretical evidence demonstrating the benefits of gradient normalization in SGD under heavy-tailed noise.
We introduce an accelerated SGD variant incorporating gradient normalization and clipping, further enhancing convergence rates under heavy-tailed noise (a minimal sketch of normalized, clipped SGD appears after this list).
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - Taming Nonconvex Stochastic Mirror Descent with General Bregman
Divergence [25.717501580080846]
This paper revisits the convergence of stochastic mirror descent (SMD) in the contemporary nonconvex optimization setting.
We also develop provably convergent algorithms for the problem of training linear neural networks.
arXiv Detail & Related papers (2024-02-27T17:56:49Z) - Towards More Robust Interpretation via Local Gradient Alignment [37.464250451280336]
We show that for every non-negative homogeneous neural network, a naive $\ell_2$-robust criterion for gradients is not normalization invariant.
We propose to combine both $\ell_2$- and cosine-distance-based criteria as regularization terms to leverage the advantages of both in aligning the local gradient.
We experimentally show that models trained with our method produce much more robust interpretations on CIFAR-10 and ImageNet-100.
arXiv Detail & Related papers (2022-11-29T03:38:28Z) - Learning with Noisy Labels via Sparse Regularization [76.31104997491695]
Learning with noisy labels is an important task for training accurate deep neural networks.
Some commonly-used loss functions, such as Cross Entropy (CE), suffer from severe overfitting to noisy labels.
We introduce the sparse regularization strategy to approximate the one-hot constraint.
arXiv Detail & Related papers (2021-07-31T09:40:23Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Semi-Sparsity for Smoothing Filters [1.1404527665142667]
We propose a new semi-sparsity smoothing algorithm based on a novel sparsity-inducing framework.
We demonstrate its benefits in a range of signal/image processing and computer vision applications.
arXiv Detail & Related papers (2021-07-01T17:31:42Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of stochastic gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z) - The Strength of Nesterov's Extrapolation in the Individual Convergence
of Nonsmooth Optimization [0.0]
We prove that Nesterov's extrapolation has the strength to make the individual convergence of gradient descent methods optimal for nonsmooth problems.
We give an extension of the derived algorithms to solve regularized learning tasks with nonsmooth losses in stochastic settings.
Our method is applicable as an efficient tool for solving large-scale $\ell_1$-regularized hinge-loss learning problems.
arXiv Detail & Related papers (2020-06-08T03:35:41Z) - Distributionally Robust Bayesian Optimization [121.71766171427433]
We present a novel distributionally robust Bayesian optimization algorithm (DRBO) for zeroth-order, noisy optimization.
Our algorithm provably obtains sub-linear robust regret in various settings.
We demonstrate the robust performance of our method on both synthetic and real-world benchmarks.
arXiv Detail & Related papers (2020-02-20T22:04:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.