Simple Alternating Minimization Provably Solves Complete Dictionary
Learning
- URL: http://arxiv.org/abs/2210.12816v1
- Date: Sun, 23 Oct 2022 18:30:45 GMT
- Title: Simple Alternating Minimization Provably Solves Complete Dictionary
Learning
- Authors: Geyu Liang, Gavin Zhang, Salar Fattahi, Richard Y. Zhang
- Abstract summary: This paper focuses on the complete dictionary learning problem, where the goal is to reparametrize a set of given signals as linear combinations of atoms from a learned dictionary.
There are two main challenges faced by theoretical and practical dictionary learning: the lack of theoretical guarantees for practically used algorithms, and poor scalability when dealing with huge-scale datasets.
- Score: 13.056764072568749
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on the complete dictionary learning problem, where
the goal is to reparametrize a set of given signals as linear combinations of
atoms from a learned dictionary. There are two main challenges faced by
theoretical and practical studies of dictionary learning: the lack of
theoretical guarantees for practically used heuristic algorithms, and their
poor scalability when dealing with huge-scale datasets. Towards addressing
these issues, we show that when the dictionary to be learned is orthogonal, an
alternating minimization method applied directly to the nonconvex and discrete
formulation of the problem exactly recovers the ground truth. For the
huge-scale, potentially online setting, we propose a minibatch version of our
algorithm, which can provably learn a complete dictionary from a huge-scale
dataset with minimal sample complexity, linear sparsity level, and linear
convergence rate, thereby obviating the need for any convex relaxation of the
problem. Our numerical experiments showcase the superiority of our method over
existing techniques on tasks with real data.
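The alternating scheme the abstract describes can be illustrated with a minimal sketch: alternate between a sparse coding step (here, hard thresholding of the current codes) and an orthogonal dictionary update (here, the closed-form orthogonal Procrustes solution via an SVD). The function names, threshold value, and iteration count below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def hard_threshold(z, tau):
    """Keep entries with magnitude above tau; zero out the rest."""
    return z * (np.abs(z) > tau)

def altmin_orth_dl(Y, tau=0.1, iters=50, seed=0):
    """Alternating minimization sketch for Y ~ A @ X with orthogonal A and sparse X.

    Y    : (n, m) data matrix, columns are signals
    tau  : hard-thresholding level for the sparse coding step
    iters: number of alternating rounds
    """
    n = Y.shape[0]
    rng = np.random.default_rng(seed)
    # Random orthogonal initialization via QR decomposition.
    A, _ = np.linalg.qr(rng.standard_normal((n, n)))
    for _ in range(iters):
        # Sparse coding step: project data onto the current dictionary, threshold.
        X = hard_threshold(A.T @ Y, tau)
        # Dictionary step: orthogonal Procrustes update, A = U V^T from SVD(Y X^T).
        U, _, Vt = np.linalg.svd(Y @ X.T)
        A = U @ Vt
    return A, X
```

A minibatch variant, as in the online setting the abstract mentions, would apply the same two updates to a random subset of the columns of `Y` at each round.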
Related papers
- MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL).
We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately.
Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
arXiv Detail & Related papers (2022-09-26T12:04:49Z) - New Tight Relaxations of Rank Minimization for Multi-Task Learning [161.23314844751556]
We propose two novel multi-task learning formulations based on two regularization terms.
We show that our methods can correctly recover the low-rank structure shared across tasks, and outperform related multi-task learning methods.
arXiv Detail & Related papers (2021-12-09T07:29:57Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Dictionary Learning Using Rank-One Atomic Decomposition (ROAD) [6.367823813868024]
Dictionary learning aims at seeking a dictionary under which the training data can be sparsely represented.
ROAD outperforms other benchmark algorithms on both synthetic data and real data.
arXiv Detail & Related papers (2021-10-25T10:29:52Z) - Label-Descriptive Patterns and their Application to Characterizing
Classification Errors [31.272875287136426]
State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless.
Characterizing these errors in easily interpretable terms gives insight into whether a model is prone to making systematic errors, but also gives a way to act and improve the model.
In this paper we propose a method that allows us to do so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data that is partitioned according to correctness of prediction.
arXiv Detail & Related papers (2021-10-18T19:42:21Z) - Information-Theoretic Generalization Bounds for Iterative
Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
arXiv Detail & Related papers (2021-10-03T05:38:49Z) - PUDLE: Implicit Acceleration of Dictionary Learning by Backpropagation [4.081440927534577]
This paper offers the first theoretical proof for empirical results through PUDLE, a Provable Unfolded Dictionary LEarning method.
We highlight the impact of the loss, unfolding, and backpropagation on convergence.
We complement our findings through synthetic and image denoising experiments.
arXiv Detail & Related papers (2021-05-31T18:49:58Z) - Online Orthogonal Dictionary Learning Based on Frank-Wolfe Method [3.198144010381572]
Dictionary learning is a widely used unsupervised learning method in signal processing and machine learning.
The proposed scheme includes a novel problem formulation and an efficient online algorithm design with convergence analysis.
Experiments with synthetic data and real-world sensor readings demonstrate the effectiveness and efficiency of the proposed scheme.
arXiv Detail & Related papers (2021-03-02T05:49:23Z) - Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all of the above) and is not responsible for any consequences.