Samplets: A new paradigm for data compression
- URL: http://arxiv.org/abs/2107.03337v1
- Date: Wed, 7 Jul 2021 16:25:12 GMT
- Title: Samplets: A new paradigm for data compression
- Authors: Helmut Harbrecht and Michael Multerer
- Abstract summary: We introduce the novel concept of samplets by transferring the construction of Tausch-White wavelets to the realm of data.
This way we obtain a multilevel representation of discrete data which enables data compression, detection of singularities and adaptivity.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article, we introduce the novel concept of samplets by transferring
the construction of Tausch-White wavelets to the realm of data. This way we
obtain a multilevel representation of discrete data which directly enables data
compression, detection of singularities and adaptivity. Applying samplets to
represent kernel matrices, as they arise in kernel based learning or Gaussian
process regression, we end up with quasi-sparse matrices. By thresholding small
entries, these matrices are compressible to O(N log N) relevant entries, where
N is the number of data points. This feature allows for the use of fill-in
reducing reorderings to obtain a sparse factorization of the compressed
matrices. Besides the comprehensive introduction to samplets and their
properties, we present extensive numerical studies to benchmark the approach.
Our results demonstrate that samplets mark a considerable step in the direction
of making large data sets accessible for analysis.
Related papers
- Low-rank Bayesian matrix completion via geodesic Hamiltonian Monte Carlo on Stiefel manifolds [0.18416014644193066]
We present a new sampling-based approach for enabling efficient computation of low-rank Bayesian matrix completion.
We show that our approach resolves the sampling difficulties encountered by standard Gibbs samplers for the common two-matrix factorization used in matrix completion.
Numerical examples demonstrate superior sampling performance, including better mixing and faster convergence to a stationary distribution.
arXiv Detail & Related papers (2024-10-27T03:12:53Z) - Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers [49.1574468325115]
We introduce a unified convergence analysis framework for deterministic samplers.
Our framework achieves iteration complexity of $tilde O(d2/epsilon)$.
We also provide a detailed analysis of Denoising Implicit Diffusion Models (DDIM)-type samplers.
arXiv Detail & Related papers (2024-10-18T07:37:36Z) - Entrywise Inference for Missing Panel Data: A Simple and Instance-Optimal Approach [27.301741710016223]
We consider inferential questions associated with the missing data version of panel data induced by staggered adoption.
We develop and analyze a data-driven procedure for constructing entrywise confidence intervals with pre-specified coverage.
We prove non-asymptotic and high-probability bounds on its error in estimating each missing entry.
arXiv Detail & Related papers (2024-01-24T18:58:18Z) - Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL)
We first prove that a gradient of synthetic samples with respect to a SSL objective in naive bilevel optimization is textitbiased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Samplet basis pursuit: Multiresolution scattered data approximation with sparsity constraints [0.0]
We consider scattered data approximation in samplet coordinates with $ell_1$-regularization.
By using the Riesz isometry, we embed samplets into reproducing kernel Hilbert spaces.
We argue that the class of signals that are sparse with respect to the embedded samplet basis is considerably larger than the class of signals that are sparse with respect to the basis of kernel translates.
arXiv Detail & Related papers (2023-06-16T21:20:49Z) - Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through gradient descent (SGD)
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
arXiv Detail & Related papers (2022-07-16T03:09:30Z) - Adaptive Cholesky Gaussian Processes [7.684183064816171]
We present a method to fit exact Gaussian process models to large datasets by considering only a subset of the data.
Our approach is novel in that the size of the subset is selected on the fly during exact inference with little computational overhead.
arXiv Detail & Related papers (2022-02-22T09:43:46Z) - Towards Sample-Optimal Compressive Phase Retrieval with Sparse and
Generative Priors [59.33977545294148]
We show that $O(k log L)$ samples suffice to guarantee that the signal is close to any vector that minimizes an amplitude-based empirical loss function.
We adapt this result to sparse phase retrieval, and show that $O(s log n)$ samples are sufficient for a similar guarantee when the underlying signal is $s$-sparse and $n$-dimensional.
arXiv Detail & Related papers (2021-06-29T12:49:54Z) - Out-of-distribution Detection and Generation using Soft Brownian Offset
Sampling and Autoencoders [1.313418334200599]
Deep neural networks often suffer from overconfidence which can be partly remedied by improved out-of-distribution detection.
We propose a novel approach that allows for the generation of out-of-distribution datasets based on a given in-distribution dataset.
This new dataset can then be used to improve out-of-distribution detection for the given dataset and machine learning task at hand.
arXiv Detail & Related papers (2021-05-04T06:59:24Z) - Kernel learning approaches for summarising and combining posterior
similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.