Masking schemes for universal marginalisers
- URL: http://arxiv.org/abs/2001.05895v1
- Date: Thu, 16 Jan 2020 15:35:06 GMT
- Title: Masking schemes for universal marginalisers
- Authors: Divya Gautam, Maria Lomeli, Kostis Gourgoulias, Daniel H. Thompson,
Saurabh Johri
- Abstract summary: We consider the effect of structure-agnostic and structure-dependent masking schemes when training a universal marginaliser.
We compare networks trained with different masking schemes in terms of their predictive performance and generalisation properties.
- Score: 1.0412114420493723
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the effect of structure-agnostic and structure-dependent masking
schemes when training a universal marginaliser (arXiv:1711.00695) in order to
learn conditional distributions of the form $P(x_i |\mathbf x_{\mathbf b})$,
where $x_i$ is a given random variable and $\mathbf x_{\mathbf b}$ is some
arbitrary subset of all random variables of the generative model of interest.
In other words, we mimic the self-supervised training of a denoising
autoencoder, where a dataset of unlabelled data is used as partially observed
input and the neural approximator is optimised to minimise reconstruction loss.
We focus on the underlying observation process behind the partially observed
data: how well does the neural approximator learn all conditional
distributions when the observation process at prediction time differs from the
masking process used during training? We compare networks trained with different
masking schemes in terms of their predictive performance and generalisation
properties.
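
As a rough, non-authoritative illustration of the training setup described in the abstract, the sketch below trains a small feed-forward network on masked samples from a toy chain-structured generative model, with one structure-agnostic and one structure-dependent masking scheme to swap between. The chain model, mask probabilities, network architecture, and all names (`sample_chain`, `mask_agnostic`, `mask_structured`) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of universal-marginaliser training with two masking
# schemes (illustrative assumptions throughout; not the authors' code).
import torch
import torch.nn as nn

D = 5  # number of binary random variables x_1, ..., x_D (toy choice)

def sample_chain(n):
    """Toy generative model: a Markov chain of D binary variables."""
    x = torch.zeros(n, D)
    x[:, 0] = torch.bernoulli(torch.full((n,), 0.5))
    for i in range(1, D):
        p = 0.8 * x[:, i - 1] + 0.1  # P(x_i = 1 | x_{i-1}) is 0.1 or 0.9
        x[:, i] = torch.bernoulli(p)
    return x

def mask_agnostic(x):
    """Structure-agnostic scheme: observe each variable independently w.p. 0.5."""
    return torch.bernoulli(torch.full_like(x, 0.5))  # 1 = observed

def mask_structured(x):
    """Structure-dependent scheme (illustrative): observe a random prefix
    of the chain, so the observed set respects the graph ordering."""
    cut = torch.randint(0, D + 1, (x.shape[0], 1))
    return (torch.arange(D) < cut).float()  # 1 = observed

net = nn.Sequential(nn.Linear(2 * D, 64), nn.ReLU(),
                    nn.Linear(64, D), nn.Sigmoid())
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(2000):
    x = sample_chain(256)
    b = mask_agnostic(x)                # swap in mask_structured to compare
    inp = torch.cat([x * b, b], dim=1)  # masked values plus observed-flags, so an
                                        # observed 0 differs from an unobserved one
    pred = net(inp)                     # i-th output approximates P(x_i = 1 | x_b)
    loss = loss_fn(pred, x)             # reconstruction loss over all variables
    opt.zero_grad()
    loss.backward()
    opt.step()

# Query P(x_3 = 1 | x_1 = 1): observe only the first variable.
b = torch.zeros(1, D); b[0, 0] = 1.0
xo = torch.zeros(1, D); xo[0, 0] = 1.0
with torch.no_grad():
    print(net(torch.cat([xo * b, b], dim=1))[0, 2].item())
```

Training with one scheme and then querying with observation patterns drawn from the other mirrors the paper's question of how performance degrades when the prediction-time observation process differs from the training-time masking process.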
Related papers
- Mitigating covariate shift in non-colocated data with learned parameter priors [0.0]
We present *Fragmentation-induced covariate shift remediation* ($FIcsR$), which minimizes an $f$-divergence between a fragment's covariate distribution and that of the standard cross-validation baseline.
We run extensive classification experiments on multiple data classes, over $40$ datasets, and with data batched over multiple sequence lengths.
The results are promising under all these conditions, with accuracy improvements over the batch and fold state-of-the-art of more than $5\%$ and $10\%$, respectively.
arXiv Detail & Related papers (2024-11-10T15:48:29Z) - Amortizing intractable inference in diffusion models for vision, language, and control [89.65631572949702]
This paper studies amortized sampling of the posterior over data, $\mathbf{x} \sim p^{\mathrm{post}}(\mathbf{x}) \propto p(\mathbf{x})\,r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or function $r(\mathbf{x})$.
We prove the correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior.
arXiv Detail & Related papers (2024-05-31T16:18:46Z) - Prediction with Incomplete Data under Agnostic Mask Distribution Shift [35.86200694774949]
We consider prediction with incomplete data in the presence of distribution shift.
We leverage the observation that for each mask, there is an invariant optimal predictor.
We propose a novel prediction method called StableMiss.
arXiv Detail & Related papers (2023-05-18T14:06:06Z) - DenseHybrid: Hybrid Anomaly Detection for Dense Open-set Recognition [1.278093617645299]
Anomaly detection can be conceived either through generative modelling of regular training data or by discriminating with respect to negative training data.
This paper presents a novel hybrid anomaly score which allows dense open-set recognition on large natural images.
Experiments evaluate our contributions on standard dense anomaly detection benchmarks as well as in terms of open-mIoU - a novel metric for dense open-set performance.
arXiv Detail & Related papers (2022-07-06T11:48:50Z) - CARD: Classification and Regression Diffusion Models [51.0421331214229]
We introduce classification and regression diffusion (CARD) models, which combine a conditional generative model and a pre-trained conditional mean estimator.
We demonstrate the outstanding ability of CARD in conditional distribution prediction with both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-06-15T03:30:38Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Learning from Incomplete Features by Simultaneous Training of Neural
Networks and Sparse Coding [24.3769047873156]
This paper addresses the problem of training a classifier on a dataset with incomplete features.
We assume that different subsets of features (random or structured) are available at each data instance.
A new supervised learning method is developed to train a general classifier, using only a subset of features per sample.
arXiv Detail & Related papers (2020-11-28T02:20:39Z) - Network Classifiers Based on Social Learning [71.86764107527812]
We propose a new way of combining independently trained classifiers over space and time.
The proposed architecture is able to improve prediction performance over time with unlabeled data.
We show that this strategy results in consistent learning with high probability and yields a structure that is robust to poorly trained classifiers.
arXiv Detail & Related papers (2020-10-23T11:18:20Z) - Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression.
Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice.
A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z) - On the Preservation of Spatio-temporal Information in Machine Learning
Applications [0.0]
In machine learning applications, each data attribute is typically assumed to be independent of the others.
Shift vectors in $k$-means are proposed in a novel framework with the help of sparse representations.
Experiments suggest that feature extraction as a simulation of shallow neural networks provides slightly better performance than Gabor dictionary learning.
arXiv Detail & Related papers (2020-06-15T12:22:36Z) - Neural Bayes: A Generic Parameterization Method for Unsupervised
Representation Learning [175.34232468746245]
We introduce a parameterization method called Neural Bayes.
It allows computing statistical quantities that are in general difficult to compute.
We show two independent use cases for this parameterization.
arXiv Detail & Related papers (2020-02-20T22:28:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.