On the Provable Advantage of Unsupervised Pretraining
- URL: http://arxiv.org/abs/2303.01566v1
- Date: Thu, 2 Mar 2023 20:42:05 GMT
- Title: On the Provable Advantage of Unsupervised Pretraining
- Authors: Jiawei Ge, Shange Tang, Jianqing Fan, Chi Jin
- Abstract summary: Unsupervised pretraining is a critical component of modern large-scale machine learning systems.
This paper studies a generic framework, where the unsupervised representation learning task is specified by an abstract class of latent variable models.
Under a mild "informative" condition, our algorithm achieves an excess risk of $\tilde{\mathcal{O}}(\sqrt{\mathcal{C}_\Phi/m} + \sqrt{\mathcal{C}_\Psi/n})$ for downstream tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised pretraining, which learns a useful representation using a large
amount of unlabeled data to facilitate the learning of downstream tasks, is a
critical component of modern large-scale machine learning systems. Despite its
tremendous empirical success, the rigorous theoretical understanding of why
unsupervised pretraining generally helps remains rather limited -- most
existing results are restricted to particular methods or approaches for
unsupervised pretraining with specialized structural assumptions. This paper
studies a generic framework, where the unsupervised representation learning
task is specified by an abstract class of latent variable models $\Phi$ and the
downstream task is specified by a class of prediction functions $\Psi$. We
consider a natural approach of using Maximum Likelihood Estimation (MLE) for
unsupervised pretraining and Empirical Risk Minimization (ERM) for learning
downstream tasks. We prove that, under a mild "informative" condition, our
algorithm achieves an excess risk of
$\tilde{\mathcal{O}}(\sqrt{\mathcal{C}_\Phi/m} + \sqrt{\mathcal{C}_\Psi/n})$
for downstream tasks, where $\mathcal{C}_\Phi, \mathcal{C}_\Psi$ are complexity
measures of function classes $\Phi, \Psi$, and $m, n$ are the number of
unlabeled and labeled data respectively. Comparing to the baseline of
$\tilde{\mathcal{O}}(\sqrt{\mathcal{C}_{\Phi \circ \Psi}/n})$ achieved by
performing supervised learning using only the labeled data, our result
rigorously shows the benefit of unsupervised pretraining when $m \gg n$ and
$\mathcal{C}_{\Phi\circ \Psi} > \mathcal{C}_\Psi$. This paper further shows
that our generic framework covers a wide range of approaches for unsupervised
pretraining, including factor models, Gaussian mixture models, and contrastive
learning.
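To make the two-phase pipeline concrete, here is a minimal sketch, assuming a Gaussian mixture instantiation of the latent variable class $\Phi$ (one of the examples the paper covers). The synthetic data, the hyperparameters, and the use of posterior responsibilities $P(z \mid x)$ as the learned representation are illustrative assumptions, not the paper's prescribed implementation.

```python
# Minimal sketch of the MLE-then-ERM pipeline, instantiated with a
# Gaussian mixture latent-variable model. All data and hyperparameters
# below are synthetic, illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
m, n, d, k = 10_000, 200, 5, 3              # m unlabeled >> n labeled

# Synthetic generative process: a latent cluster z determines both the
# observation x and the downstream label y.
means = rng.normal(scale=3.0, size=(k, d))

def sample(size):
    z = rng.integers(k, size=size)
    x = means[z] + rng.normal(size=(size, d))
    return x, (z == 0).astype(int)          # downstream task depends on z

X_unlab, _ = sample(m)                      # labels discarded: unlabeled pool
X_lab, y_lab = sample(n)
X_test, y_test = sample(2_000)

# Phase 1 (pretraining): MLE over the latent-variable class Phi, here
# computed by EM on the m unlabeled samples.
phi = GaussianMixture(n_components=k, random_state=0).fit(X_unlab)
represent = phi.predict_proba               # representation: P(z | x)

# Phase 2 (downstream): ERM over the prediction class Psi on the n
# labeled samples, composed with the frozen representation.
psi = LogisticRegression().fit(represent(X_lab), y_lab)

# Baseline: supervised learning over the composite class on raw features,
# using the labeled data only.
baseline = LogisticRegression().fit(X_lab, y_lab)

print("pretrained + ERM:", psi.score(represent(X_test), y_test))
print("supervised only: ", baseline.score(X_test, y_test))
```

The $m \gg n$ split mirrors the regime in which the bound above beats the purely supervised baseline: the complexity of fitting $\Phi$ is paid for with cheap unlabeled data, leaving only the simpler class $\Psi$ to be learned from labels.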
Related papers
- On Characterizing and Mitigating Imbalances in Multi-Instance Partial Label Learning [57.18649648182171]
We make contributions towards addressing a problem that has not previously been studied in the context of MI-PLL.
We derive class-specific risk bounds for MI-PLL, while making minimal assumptions.
Our theory reveals a unique phenomenon: $\sigma$ can greatly impact learning imbalances.
arXiv Detail & Related papers (2024-07-13T20:56:34Z) - Aligning Model Properties via Conformal Risk Control [4.710921988115686]
Post-training alignment via human feedback shows promise, but is often limited to generative AI settings.
In traditional non-generative settings with numerical or categorical outputs, detecting misalignment through single-sample outputs remains challenging.
We propose interpreting model alignment through property testing, defining an aligned model $f$ as one belonging to a subset $\mathcal{P}$ of functions.
arXiv Detail & Related papers (2024-06-26T22:24:46Z) - Uncertainty-Aware Reward-Free Exploration with General Function Approximation [69.27868448449755]
In this paper, we propose a reward-free reinforcement learning algorithm called GFA-RFE.
The key idea behind our algorithm is an uncertainty-aware intrinsic reward for exploring the environment.
Experiment results show that GFA-RFE outperforms or is comparable to the performance of state-of-the-art unsupervised RL algorithms.
arXiv Detail & Related papers (2024-06-24T01:37:18Z) - Blessing of Class Diversity in Pre-training [54.335530406959435]
We prove that when the classes of the pre-training task are sufficiently diverse, pre-training can significantly improve the sample efficiency of downstream tasks.
Our proof relies on a vector-form Rademacher complexity chain rule for composite function classes and a modified self-concordance condition.
arXiv Detail & Related papers (2022-09-07T20:10:12Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Learning to extrapolate using continued fractions: Predicting the
critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y = f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
arXiv Detail & Related papers (2020-11-27T04:57:40Z) - For self-supervised learning, Rationality implies generalization,
provably [13.526562756159809]
We prove a new upper bound on the generalization gap of classifiers obtained by first using self-supervision.
We show that our bound is non-vacuous for many popular representation-learning based classifiers on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2020-10-16T17:07:52Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator)
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z) - Learning Near Optimal Policies with Low Inherent Bellman Error [115.16037976819331]
We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning.
We show that exploration is possible using only batch assumptions, with an algorithm that achieves the optimal statistical rate for the setting we consider.
arXiv Detail & Related papers (2020-02-29T02:02:40Z) - Weighted Empirical Risk Minimization: Sample Selection Bias Correction
based on Importance Sampling [2.599882743586164]
We consider statistical learning problems in which the distribution $P'$ of the training observations $Z'_i$ differs from the distribution $P$ involved in the risk one seeks to minimize.
We show that, in various situations frequently encountered in practice, the importance weight function $\Phi(z)$ takes a simple form and can be directly estimated from the training observations.
We then prove that the generalization capacity of the approach is preserved when the resulting estimates of the $\Phi(Z'_i)$'s are plugged into the weighted empirical risk (see the sketch below).
arXiv Detail & Related papers (2020-02-12T18:42:47Z)
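The last entry above centers on plugging estimated importance weights into the empirical risk. Below is a minimal sketch of weighted ERM in which, purely for illustration, the densities of $P$ and $P'$ are taken as known Gaussians so that the weight function $\Phi(z) = dP/dP'(z)$ has a closed form; estimating $\Phi$ from data is the subject of the cited paper and is not reproduced here.

```python
# Minimal sketch of importance-weighted ERM under a known distribution
# shift. The Gaussian choices and the squared loss are illustrative
# assumptions, not the cited paper's estimators.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Training observations Z'_i ~ P' = N(0, 1); the risk to minimize is
# taken under P = N(1, 1).
Z = rng.normal(0.0, 1.0, size=50_000)

# Importance weights Phi(z) = dP/dP'(z), here known in closed form.
w = norm.pdf(Z, loc=1.0) / norm.pdf(Z, loc=0.0)

# Weighted empirical risk for the squared loss (z - theta)^2, minimized
# over a grid of candidate predictors theta.
thetas = np.linspace(-1.0, 3.0, 401)
weighted = [(w * (Z - t) ** 2).mean() for t in thetas]
unweighted = [((Z - t) ** 2).mean() for t in thetas]
print("weighted-ERM argmin:", thetas[int(np.argmin(weighted))])    # ~1.0
print("unweighted argmin:  ", thetas[int(np.argmin(unweighted))])  # ~0.0
```

Reweighting recovers the minimizer of the risk under $P$ (about 1.0) even though every sample was drawn from $P'$; dropping the weights returns the biased minimizer near 0.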