Low-Rank Approximation of Structural Redundancy for Self-Supervised Learning
- URL: http://arxiv.org/abs/2402.06884v2
- Date: Mon, 27 May 2024 22:11:00 GMT
- Title: Low-Rank Approximation of Structural Redundancy for Self-Supervised Learning
- Authors: Kang Du, Yu Xiang
- Abstract summary: We study the data-generating mechanism for reconstructive SSL to shed light on its effectiveness.
With an infinite amount of labeled samples, we provide a sufficient and necessary condition for perfect linear approximation.
Motivated by the condition, we propose to approximate the redundant component by a low-rank factorization.
- Score: 2.3072402651280517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the data-generating mechanism for reconstructive SSL to shed light on its effectiveness. With an infinite amount of labeled samples, we provide a sufficient and necessary condition for perfect linear approximation. The condition reveals a full-rank component that preserves the label classes of Y, along with a redundant component. Motivated by the condition, we propose to approximate the redundant component by a low-rank factorization and measure the approximation quality by introducing a new quantity $\epsilon_s$, parameterized by the rank of factorization s. We incorporate $\epsilon_s$ into the excess risk analysis under both linear regression and ridge regression settings, where the latter regularization approach is to handle scenarios when the dimension of the learned features is much larger than the number of labeled samples n for downstream tasks. We design three stylized experiments to compare SSL with supervised learning under different settings to support our theoretical findings.
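The idea of approximating a redundant component by a rank-s factorization can be illustrated with a truncated SVD, which gives the best rank-s approximation in Frobenius norm (Eckart-Young). The sketch below is only an illustration under stated assumptions: the matrix `M`, its shape, and the noise level are hypothetical, and the relative Frobenius error is a stand-in for approximation quality, not the paper's definition of $\epsilon_s$, which the abstract does not spell out.

```python
import numpy as np

def low_rank_approx(M, s):
    """Best rank-s approximation of M in Frobenius norm, via truncated SVD."""
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    # Keep only the s largest singular values and their singular vectors.
    return (U[:, :s] * sigma[:s]) @ Vt[:s, :]

def relative_error(M, s):
    """Relative Frobenius error of the rank-s approximation (illustrative proxy only)."""
    return np.linalg.norm(M - low_rank_approx(M, s)) / np.linalg.norm(M)

rng = np.random.default_rng(0)

# Hypothetical "redundant component": a rank-3 signal plus small noise.
A = rng.normal(size=(50, 3))
B = rng.normal(size=(3, 40))
M = A @ B + 0.01 * rng.normal(size=(50, 40))

# Approximation error as a function of the factorization rank s = 1..6.
errors = [relative_error(M, s) for s in range(1, 7)]
for s, err in enumerate(errors, start=1):
    print(f"s={s}: relative error {err:.4f}")
```

In this toy setup the error should drop sharply once s reaches the true rank of the signal (3 here) and flatten afterwards, which is the kind of trade-off a rank-dependent quantity is meant to capture.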
Related papers
- Shuffled Linear Regression via Spectral Matching [6.24954299842136]
Shuffled linear regression seeks to estimate latent features through a linear transformation.
This problem extends traditional least-squares (LS) and Least Absolute Shrinkage and Selection Operator (LASSO) approaches.
We propose a spectral matching method that efficiently resolves permutations.
arXiv Detail & Related papers (2024-09-30T16:26:40Z) - Zero-Shot Class Unlearning in CLIP with Synthetic Samples [0.0]
We focus on unlearning within CLIP, a dual vision-language model trained on a massive dataset of image-text pairs.
We apply Lipschitz regularization to the multimodal context of CLIP.
Our forgetting procedure is iterative, where we track accuracy on a synthetic forget set and stop when accuracy falls below a chosen threshold.
arXiv Detail & Related papers (2024-07-10T09:16:14Z) - Minimum-Risk Recalibration of Classifiers [9.31067660373791]
We introduce the concept of minimum-risk recalibration within the framework of mean-squared-error decomposition.
We show that transferring a calibrated classifier requires significantly fewer target samples compared to recalibrating from scratch.
arXiv Detail & Related papers (2023-05-18T11:27:02Z) - On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples [32.707730631343416]
Offline reinforcement learning (offline RL) considers problems where learning is performed using only previously collected samples.
In model-based offline RL, the learner performs estimation (or optimization) using a model constructed according to the empirical transition.
We analyze the sample complexity of vanilla model-based offline RL with dependent samples in the infinite-horizon discounted-reward setting.
arXiv Detail & Related papers (2023-03-07T22:39:23Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Vector-Valued Least-Squares Regression under Output Regularity Assumptions [73.99064151691597]
We propose and analyse a reduced-rank method for solving least-squares regression problems with infinite-dimensional output.
We derive learning bounds for our method and study the settings in which its statistical performance improves on that of the full-rank method.
arXiv Detail & Related papers (2022-11-16T15:07:00Z) - GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
A promising solution is to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z) - Adaptive neighborhood Metric learning [184.95321334661898]
We propose a novel distance metric learning algorithm, named adaptive neighborhood metric learning (ANML).
ANML can be used to learn both linear and deep embeddings.
The log-exp mean function proposed in our method gives a new perspective from which to review deep metric learning methods.
arXiv Detail & Related papers (2022-01-20T17:26:37Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z) - Semi-Supervised Empirical Risk Minimization: Using unlabeled data to improve prediction [4.860671253873579]
We present a general methodology for using unlabeled data to design semi-supervised learning (SSL) variants of the Empirical Risk Minimization (ERM) learning process.
We analyze the effectiveness of our SSL approach in improving prediction performance.
arXiv Detail & Related papers (2020-09-01T17:55:51Z) - Finite-time Identification of Stable Linear Systems: Optimality of the Least-Squares Estimator [79.3239137440876]
We present a new finite-time analysis of the estimation error of the Ordinary Least Squares (OLS) estimator for stable linear time-invariant systems.
We characterize the number of observed samples sufficient for the OLS estimator to be $(\varepsilon,\delta)$-PAC, i.e., to yield an estimation error less than $\varepsilon$ with probability at least $1-\delta$.
arXiv Detail & Related papers (2020-03-17T20:59:17Z)