Related papers: A Unified Probabilistic Framework for Dictionary Learning with Parsimonious Activation

URL: http://arxiv.org/abs/2509.25690v1
Date: Tue, 30 Sep 2025 02:46:11 GMT
Title: A Unified Probabilistic Framework for Dictionary Learning with Parsimonious Activation
Authors: Zihui Zhao, Yuanbo Tang, Jieyu Ren, Xiaoping Zhang, Yang Li
Abstract summary: We introduce a parsimony-promoting regularizer based on the row-wise $L_\infty$ norm of the coefficient matrix. This additional penalty encourages entire rows of the coefficient matrix to vanish, thereby reducing the number of dictionary atoms activated across the dataset.
Score: 10.775460285501739
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dictionary learning is traditionally formulated as an $L_1$-regularized signal reconstruction problem. While recent developments have incorporated discriminative, hierarchical, or generative structures, most approaches encourage representation sparsity over individual samples and overlook how atoms are shared across samples, resulting in redundant and sub-optimal dictionaries. We introduce a parsimony-promoting regularizer based on the row-wise $L_\infty$ norm of the coefficient matrix. This additional penalty encourages entire rows of the coefficient matrix to vanish, thereby reducing the number of dictionary atoms activated across the dataset. We derive the formulation from a probabilistic model with Beta-Bernoulli priors, which provides a Bayesian interpretation linking the regularization parameters to prior distributions. We further derive a theoretical calculation for optimal hyperparameter selection and connect our formulation to Minimum Description Length, Bayesian model selection, and pathlet learning. Extensive experiments on benchmark datasets demonstrate that our method achieves substantially improved reconstruction quality (with a 20% reduction in RMSE) and enhanced representation sparsity, utilizing fewer than one-tenth of the available dictionary atoms, while empirically validating our theoretical analysis.
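
Illustration: the abstract describes adding a row-wise $L_\infty$ penalty to the usual $L_1$-regularized reconstruction problem. The sketch below shows one way such an objective could be evaluated with NumPy. The exact objective, the 0.5 scaling of the reconstruction term, and the names parsimonious_objective, active_atoms, lam_l1, and lam_inf are assumptions made for illustration; none of them is taken from the paper.

```python
import numpy as np


def parsimonious_objective(X, D, A, lam_l1, lam_inf):
    """Assumed objective: 0.5*||X - D A||_F^2 + lam_l1*||A||_1 + lam_inf*sum_i ||A[i, :]||_inf.

    X: (d, n) data, D: (d, k) dictionary, A: (k, n) coefficient matrix.
    Row i of A holds the activations of atom i across all n samples.
    """
    residual = X - D @ A
    recon = 0.5 * np.sum(residual ** 2)
    l1 = np.sum(np.abs(A))                        # per-entry sparsity
    row_inf = np.sum(np.max(np.abs(A), axis=1))   # row-wise L_inf norms, summed over atoms
    return recon + lam_l1 * l1 + lam_inf * row_inf


def active_atoms(A, tol=1e-8):
    """Indices of atoms whose coefficient row is not (numerically) all zero."""
    return np.flatnonzero(np.max(np.abs(A), axis=1) > tol)


# Toy data (hypothetical): 3 atoms, 2 samples, atom 1 never used.
D = np.eye(3)
A = np.array([[1.0, -2.0],
              [0.0, 0.0],
              [0.5, 0.0]])
X = D @ A  # perfect reconstruction, so only the penalties contribute

value = parsimonious_objective(X, D, A, lam_l1=0.1, lam_inf=1.0)
used = active_atoms(A)
```

A row contributes to the row-wise term only through its largest absolute entry, so the term drops to zero for an atom exactly when that atom is unused by every sample. This is the sense in which the penalty counts activated atoms across the dataset rather than non-zeros per sample.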

Related papers

From STLS to Projection-based Dictionary Selection in Sparse Regression for System Identification [1.7341202786497238]
We revisit dictionary-based sparse regression, in particular Sequential Threshold Least Squares (STLS). We propose a score-guided library selection to provide practical guidance for data-driven modeling, with emphasis on SINDy-type algorithms.
arXiv Detail & Related papers (2025-12-16T13:42:10Z)
Neural Optimal Transport Meets Multivariate Conformal Prediction [58.43397908730771]
We propose a framework for conditional vector quantile regression (CVQR). CVQR combines neural optimal transport with quantized optimization and applies it to predictions.
arXiv Detail & Related papers (2025-09-29T19:50:19Z)
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
Hierarchical mixtures of Unigram models for short text clustering: The role of Beta-Liouville priors [1.03590082373586]
This paper presents a variant of the Multinomial mixture model tailored to the unsupervised classification of short text data. We examine the theoretical properties of the Beta-Liouville distribution, with particular focus on its conjugacy with the Multinomial likelihood.
arXiv Detail & Related papers (2024-10-29T08:56:29Z)
Obtaining Explainable Classification Models using Distributionally Robust Optimization [12.511155426574563]
We study generalized linear models constructed using sets of feature value rules. An inherent trade-off exists between rule set sparsity and its prediction accuracy. We propose a new formulation to learn an ensemble of rule sets that simultaneously addresses these competing factors.
arXiv Detail & Related papers (2023-11-03T15:45:34Z)
Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions. A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems. It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations. Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence [17.665255113864795]
We present a novel divergence-like metric which corresponds to the upper bound of the Kullback-Leibler divergence (KLD) of a relaxed categorical distribution. We also propose a relaxed categorical analytic bound variational autoencoder (ReCAB-VAE) that successfully models both continuous and relaxed latent representations.
arXiv Detail & Related papers (2022-05-09T08:11:46Z)
A Sparsity-promoting Dictionary Model for Variational Autoencoders [16.61511959679188]
Structuring the latent space in deep generative models is important to yield more expressive models and interpretable representations. We propose a simple yet effective methodology to structure the latent space via a sparsity-promoting dictionary model.
arXiv Detail & Related papers (2022-03-29T17:13:11Z)
Information-Theoretic Generalization Bounds for Iterative Semi-Supervised Learning [81.1071978288003]
We seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles. Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
arXiv Detail & Related papers (2021-10-03T05:38:49Z)
Statistical limits of dictionary learning: random matrix theory and the spectral replica method [28.54289139061295]
We consider increasingly complex models of matrix denoising and dictionary learning in the Bayes-optimal setting. We introduce a novel combination of the replica method from statistical mechanics together with random matrix theory, coined spectral replica method.
arXiv Detail & Related papers (2021-09-14T12:02:32Z)
Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model. We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
