Simple Alternating Minimization Provably Solves Complete Dictionary
Learning
- URL: http://arxiv.org/abs/2210.12816v1
- Date: Sun, 23 Oct 2022 18:30:45 GMT
- Title: Simple Alternating Minimization Provably Solves Complete Dictionary
Learning
- Authors: Geyu Liang, Gavin Zhang, Salar Fattahi, Richard Y. Zhang
- Abstract summary: This paper focuses on the complete dictionary learning problem, where the goal is to reparametrize a set of given signals as linear combinations of atoms from a learned dictionary.
There are two main challenges faced by theoretical and practical dictionary learning: the lack of theoretical guarantees for practically used algorithms, and poor scalability when dealing with huge-scale datasets.
- Score: 13.056764072568749
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on the complete dictionary learning problem, where
the goal is to reparametrize a set of given signals as linear combinations of
atoms from a learned dictionary. There are two main challenges faced by
theoretical and practical studies of dictionary learning: the lack of
theoretical guarantees for practically used heuristic algorithms, and their
poor scalability when dealing with huge-scale datasets. Towards addressing
these issues, we show that when the dictionary to be learned is orthogonal, an
alternating minimization method applied directly to the nonconvex and discrete
formulation of the problem exactly recovers the ground truth. For the
huge-scale, potentially online setting, we propose a minibatch version of our
algorithm, which can provably learn a complete dictionary from a huge-scale
dataset with minimal sample complexity, linear sparsity level, and linear
convergence rate, thereby obviating the need for any convex relaxation of the
problem. Our numerical experiments showcase the superiority of our method over
existing techniques on tasks with real data.
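The alternating scheme the abstract describes can be illustrated with a minimal sketch: alternate between a sparse coding step (here, hard thresholding of the current codes) and an orthogonal dictionary update (here, the closed-form orthogonal Procrustes solution via an SVD). The function names, threshold value, and iteration count below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def hard_threshold(z, tau):
    """Keep entries with magnitude above tau; zero out the rest."""
    return z * (np.abs(z) > tau)

def altmin_orth_dl(Y, tau=0.1, iters=50, seed=0):
    """Alternating minimization sketch for Y ~ A @ X with orthogonal A and sparse X.

    Y    : (n, m) data matrix, columns are signals
    tau  : hard-thresholding level for the sparse coding step
    iters: number of alternating rounds
    """
    n = Y.shape[0]
    rng = np.random.default_rng(seed)
    # Random orthogonal initialization via QR decomposition.
    A, _ = np.linalg.qr(rng.standard_normal((n, n)))
    for _ in range(iters):
        # Sparse coding step: project data onto the current dictionary, threshold.
        X = hard_threshold(A.T @ Y, tau)
        # Dictionary step: orthogonal Procrustes update, A = U V^T from SVD(Y X^T).
        U, _, Vt = np.linalg.svd(Y @ X.T)
        A = U @ Vt
    return A, X
```

A minibatch variant, as in the online setting the abstract mentions, would apply the same two updates to a random subset of the columns of `Y` at each round.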
Related papers
- MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL).
We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately.
Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
arXiv Detail & Related papers (2022-09-26T12:04:49Z) - New Tight Relaxations of Rank Minimization for Multi-Task Learning [161.23314844751556]
We propose two novel multi-task learning formulations based on two regularization terms.
We show that our methods can correctly recover the low-rank structure shared across tasks, and outperform related multi-task learning methods.
arXiv Detail & Related papers (2021-12-09T07:29:57Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Dictionary Learning Using Rank-One Atomic Decomposition (ROAD) [6.367823813868024]
Dictionary learning aims at seeking a dictionary under which the training data can be sparsely represented.
ROAD outperforms other benchmark algorithms on both synthetic data and real data.
arXiv Detail & Related papers (2021-10-25T10:29:52Z) - Label-Descriptive Patterns and their Application to Characterizing
Classification Errors [31.272875287136426]
State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless.
Characterizing these errors in easily interpretable terms gives insight into whether a model is prone to making systematic errors, but also gives a way to act and improve the model.
In this paper we propose a method that allows us to do so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data that is partitioned according to correctness of prediction.
arXiv Detail & Related papers (2021-10-18T19:42:21Z) - Information-Theoretic Generalization Bounds for Iterative
Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
arXiv Detail & Related papers (2021-10-03T05:38:49Z) - PUDLE: Implicit Acceleration of Dictionary Learning by Backpropagation [4.081440927534577]
This paper offers the first theoretical proof for empirical results through PUDLE, a Provable Unfolded Dictionary LEarning method.
We highlight the impact of the loss, unfolding, and backpropagation on convergence.
We complement our findings through synthetic and image denoising experiments.
arXiv Detail & Related papers (2021-05-31T18:49:58Z) - Online Orthogonal Dictionary Learning Based on Frank-Wolfe Method [3.198144010381572]
Dictionary learning is a widely used unsupervised learning method in signal processing and machine learning.
The proposed scheme includes a novel problem formulation and an efficient online algorithm design with convergence analysis.
Experiments with synthetic data and real-world sensor readings demonstrate the effectiveness and efficiency of the proposed scheme.
arXiv Detail & Related papers (2021-03-02T05:49:23Z) - Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all of the above) and is not responsible for any consequences.