Masked prediction tasks: a parameter identifiability view
- URL: http://arxiv.org/abs/2202.09305v1
- Date: Fri, 18 Feb 2022 17:09:32 GMT
- Title: Masked prediction tasks: a parameter identifiability view
- Authors: Bingbin Liu, Daniel Hsu, Pradeep Ravikumar, Andrej Risteski
- Abstract summary: We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not.
- Score: 49.533046139235466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The vast majority of work in self-supervised learning, both theoretical and
empirical (though mostly the latter), has largely focused on recovering good
features for downstream tasks, with the definition of "good" often being
intricately tied to the downstream task itself. This lens is undoubtedly very
interesting, but suffers from the problem that there isn't a "canonical" set of
downstream tasks to focus on -- in practice, this problem is usually resolved
by competing on the benchmark dataset du jour.
In this paper, we present an alternative lens: one of parameter
identifiability. More precisely, we consider data coming from a parametric
probabilistic model, and train a self-supervised learning predictor with a
suitably chosen parametric form. Then, we ask whether we can read off the
ground truth parameters of the probabilistic model from the optimal predictor.
We focus on the widely used self-supervised learning method of predicting
masked tokens, which is popular for both natural languages and visual data.
While incarnations of this approach have already been successfully used for
simpler probabilistic models (e.g. learning fully-observed undirected graphical
models), we focus instead on latent-variable models capturing sequential
structures -- namely Hidden Markov Models with both discrete and conditionally
Gaussian observations. We show that there is a rich landscape of possibilities,
out of which some prediction tasks yield identifiability, while others do not.
Our results, borne of a theoretical grounding of self-supervised learning,
could thus usefully inform practice. Moreover, we uncover close
connections with uniqueness of tensor rank decompositions -- a widely used tool
in studying identifiability through the lens of the method of moments.
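As a concrete illustration of the setup above, the short sketch below (a hypothetical example with arbitrarily chosen dimensions, not code from the paper) samples a small discrete HMM and computes the conditional table p(x_2 | x_1) that the optimal "predict one masked token from its neighbor" predictor represents; the identifiability question is whether the transition and emission matrices can be read back from such optimal predictors, and the paper shows the answer depends on which masked prediction task is used.

```python
# Hypothetical sketch (not from the paper): a small discrete HMM and the
# conditional p(x_2 | x_1) that an optimal single-token masked predictor
# converges to. Identifiability asks whether (pi, A, O) can be recovered
# from such optimal predictors.
import numpy as np

rng = np.random.default_rng(0)

k, d = 3, 5                              # hidden states, observed symbols (arbitrary)
pi = np.full(k, 1.0 / k)                 # initial hidden-state distribution
A = rng.dirichlet(np.ones(k), size=k)    # transition matrix, rows sum to 1
O = rng.dirichlet(np.ones(d), size=k)    # emission matrix, rows sum to 1

# Joint law of two consecutive observations:
#   P(x_1 = i, x_2 = j) = sum_{h1,h2} pi[h1] * O[h1,i] * A[h1,h2] * O[h2,j]
joint = O.T @ np.diag(pi) @ A @ O        # shape (d, d)

# A sufficiently expressive predictor trained to fill in a masked x_2 from x_1
# converges to this conditional table.
cond = joint / joint.sum(axis=1, keepdims=True)
print(np.round(cond, 3))
```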
Related papers
- Estimating Causal Effects from Learned Causal Networks [56.14597641617531]
We propose an alternative paradigm for answering causal-effect queries over discrete observable variables.
We learn the causal Bayesian network and its confounding latent variables directly from the observational data.
We show that this model completion learning approach can be more effective than estimand approaches.
arXiv Detail & Related papers (2024-08-26T08:39:09Z)
- The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide analysis using a theoretical data model and show that, while more diverse pre-training data results in more diverse features for different tasks, it puts less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z)
- Certifying Fairness of Probabilistic Circuits [33.1089249944851]
We propose an algorithm to search for discrimination patterns in a general class of probabilistic models, namely probabilistic circuits.
We also introduce new classes of patterns such as minimal, maximal, and optimal patterns that can effectively summarize exponentially many discrimination patterns.
arXiv Detail & Related papers (2022-12-05T18:36:45Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Visual Recognition with Deep Learning from Biased Image Datasets [6.10183951877597]
We show how biasing models can be applied to remedy problems in the context of visual recognition.
Based on the (approximate) knowledge of the biasing mechanisms at work, our approach consists in reweighting the observations.
We propose to use a low dimensional image representation, shared across the image databases.
arXiv Detail & Related papers (2021-09-06T10:56:58Z)
- A Low Rank Promoting Prior for Unsupervised Contrastive Learning [108.91406719395417]
We construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning.
Our hypothesis explicitly requires that all samples belonging to the same instance class lie in the same low-dimensional subspace.
Empirical evidence shows that the proposed algorithm clearly surpasses state-of-the-art approaches on multiple benchmarks.
arXiv Detail & Related papers (2021-08-05T15:58:25Z)
- Robust Out-of-Distribution Detection on Deep Probabilistic Generative Models [0.06372261626436676]
Out-of-distribution (OOD) detection is an important task in machine learning systems.
Deep probabilistic generative models facilitate OOD detection by estimating the likelihood of a data sample.
We propose a new detection metric that operates without outlier exposure.
arXiv Detail & Related papers (2021-06-15T06:36:10Z)
- Discriminative, Generative and Self-Supervised Approaches for Target-Agnostic Learning [8.666667951130892]
Generative and self-supervised learning models are shown to perform well at the task.
A theorem we derive from pseudo-likelihood theory also shows that these approaches are related when inferring a joint distribution model.
arXiv Detail & Related papers (2020-11-12T15:03:40Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.