TraDE: Transformers for Density Estimation
- URL: http://arxiv.org/abs/2004.02441v2
- Date: Wed, 14 Oct 2020 20:22:00 GMT
- Title: TraDE: Transformers for Density Estimation
- Authors: Rasool Fakoor, Pratik Chaudhari, Jonas Mueller, Alexander J. Smola
- Abstract summary: TraDE is a self-attention-based architecture for auto-regressive density estimation.
We present a suite of tasks such as regression using generated samples, out-of-distribution detection, and robustness to noise in the training data.
- Score: 101.20137732920718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present TraDE, a self-attention-based architecture for auto-regressive
density estimation with continuous and discrete valued data. Our model is
trained using a penalized maximum likelihood objective, which ensures that
samples from the density estimate resemble the training data distribution. The
use of self-attention means that the model need not retain conditional
sufficient statistics during the auto-regressive process beyond what is needed
for each covariate. On standard tabular and image data benchmarks, TraDE
produces significantly better density estimates than existing approaches such
as normalizing flow estimators and recurrent auto-regressive models. However,
log-likelihood on held-out data only partially reflects how useful these
estimates are in real-world applications. In order to systematically evaluate
density estimators, we present a suite of tasks such as regression using
generated samples, out-of-distribution detection, and robustness to noise in
the training data and demonstrate that TraDE works well in these scenarios.
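To make the setup concrete, below is a minimal sketch of a causally masked transformer that parameterizes each conditional p(x_i | x_{<i}) with a mixture of Gaussians and is trained by maximum likelihood. This is an illustration only, not the TraDE reference implementation: it handles continuous data only, it omits the paper's penalty term, and the layer sizes, output head, and names are all assumptions.

    # Minimal sketch of a transformer-based autoregressive density estimator
    # (illustrative only; not the TraDE reference implementation).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ARTransformerDensity(nn.Module):
        def __init__(self, n_dims, d_model=64, n_heads=4, n_layers=2, n_mix=5):
            super().__init__()
            self.n_dims = n_dims
            self.embed = nn.Linear(1, d_model)                     # embed each scalar covariate
            self.pos = nn.Parameter(torch.zeros(n_dims, d_model))  # learned positional encoding
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, 3 * n_mix)              # mixture logits, means, log-scales

        def log_prob(self, x):                                     # x: (batch, n_dims)
            b = x.shape[0]
            # Shift inputs right so position i only conditions on x_{<i}.
            inp = torch.cat([x.new_zeros(b, 1), x[:, :-1]], dim=1)
            h = self.embed(inp.unsqueeze(-1)) + self.pos
            # Additive causal mask: position i may not attend to positions j > i.
            mask = torch.triu(x.new_full((self.n_dims, self.n_dims), float("-inf")), diagonal=1)
            h = self.encoder(h, mask=mask)
            logits, mu, log_sigma = self.head(h).chunk(3, dim=-1)
            comp = torch.distributions.Normal(mu, log_sigma.exp())
            # Mixture-of-Gaussians log-density per covariate, summed over dimensions.
            log_mix = F.log_softmax(logits, dim=-1) + comp.log_prob(x.unsqueeze(-1))
            return torch.logsumexp(log_mix, dim=-1).sum(dim=1)

    # Training would minimize -model.log_prob(x).mean(); the paper's objective
    # additionally adds a penalty term to this maximum likelihood loss.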
Related papers
- Latent Space Score-based Diffusion Model for Probabilistic Multivariate Time Series Imputation [6.9295879301090535]
We propose the Latent Space Score-Based Diffusion Model (LSSDM) for probabilistic time series imputation.
LSSDM achieves superior imputation performance while also providing a better explanation and uncertainty analysis of the imputation mechanism.
arXiv Detail & Related papers (2024-09-13T15:32:26Z)
- Double Debiased Covariate Shift Adaptation Robust to Density-Ratio Estimation [7.8856737627153874]
We propose a doubly robust estimator for covariate shift adaptation via importance weighting.
Our estimator reduces the bias arising from the density ratio estimation errors.
Notably, our estimator remains consistent if either the density ratio estimator or the regression function is consistent.
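For context, one standard doubly robust construction under covariate shift (a generic AIPW-style form, not necessarily the exact estimator proposed in this paper) combines an outcome model with an importance-weighted residual correction:

    \hat{\mu}_{\mathrm{DR}}
      = \frac{1}{n_T}\sum_{i=1}^{n_T} \hat{f}(x_i^{T})
      + \frac{1}{n_S}\sum_{j=1}^{n_S} \hat{r}(x_j^{S})\,\bigl(y_j^{S} - \hat{f}(x_j^{S})\bigr)

Here x_i^T are unlabeled target samples, (x_j^S, y_j^S) are labeled source samples, \hat{r}(x) \approx p_T(x)/p_S(x) is the estimated density ratio, and \hat{f} is the estimated regression function; the estimate stays consistent if either \hat{r} or \hat{f} is consistent, which is the double-robustness property stated above.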
arXiv Detail & Related papers (2023-10-25T13:38:29Z)
- MissDiff: Training Diffusion Models on Tabular Data with Missing Values [29.894691645801597]
This work presents a unified and principled diffusion-based framework for learning from data with missing values.
We first observe that the widely adopted "impute-then-generate" pipeline may lead to a biased learning objective.
We prove the proposed method is consistent in learning the score of data distributions, and the proposed training objective serves as an upper bound for the negative likelihood in certain cases.
arXiv Detail & Related papers (2023-07-02T03:49:47Z)
- Learning to be a Statistician: Learned Estimator for Number of Distinct Values [54.629042119819744]
Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems.
In this work, we focus on how to derive accurate NDV estimations from random (online/offline) samples.
We propose to formulate the NDV estimation task in a supervised learning framework, and aim to learn a model as the estimator.
arXiv Detail & Related papers (2022-02-06T15:42:04Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
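A rough sketch of this thresholded-confidence recipe follows; it is illustrative only, and the use of max-softmax probabilities as the confidence score and the quantile-based threshold search are assumptions rather than details taken from the paper.

    # Sketch of an ATC-style accuracy predictor from model confidences.
    import numpy as np

    def learn_threshold(source_conf, source_correct):
        """Pick t so that the fraction of labeled source examples with
        confidence above t matches the source accuracy."""
        acc = source_correct.mean()
        return np.quantile(source_conf, 1.0 - acc)

    def predict_target_accuracy(target_conf, threshold):
        """Predicted accuracy = fraction of unlabeled target examples
        whose confidence exceeds the learned threshold."""
        return (target_conf > threshold).mean()

    # Usage with max-softmax probabilities as the confidence score:
    # t = learn_threshold(source_probs.max(axis=1), source_preds == source_labels)
    # est_acc = predict_target_accuracy(target_probs.max(axis=1), t)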
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood-based model selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Hierarchical VAEs Know What They Don't Know [6.649455007186671]
We develop a fast, scalable and fully unsupervised likelihood-ratio score for OOD detection.
We achieve state-of-the-art results on out-of-distribution detection.
arXiv Detail & Related papers (2021-02-16T16:08:04Z)
- Improving Nonparametric Density Estimation with Tensor Decompositions [14.917420021212912]
Nonparametric density estimators often perform well on low dimensional data, but suffer when applied to higher dimensional data.
This paper investigates whether the improvements obtained by assuming a simplified dependence structure can be extended to other simplified dependence assumptions.
We prove that restricting estimation to low-rank nonnegative PARAFAC or Tucker decompositions removes the dimensionality exponent on bin width rates for multidimensional histograms.
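As a toy two-dimensional illustration of the idea (where nonnegative PARAFAC reduces to nonnegative matrix factorization), a low-rank histogram estimate might look as follows; the data, bin count, and rank are arbitrary choices, and this is not code from the paper.

    # Low-rank nonnegative factorization of a 2-D histogram as a density estimate.
    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=5000)

    bins = 30
    hist, xe, ye = np.histogram2d(x[:, 0], x[:, 1], bins=bins, density=True)

    # Rank-3 nonnegative factorization of the (nonnegative) histogram.
    nmf = NMF(n_components=3, init="nndsvda", max_iter=500)
    low_rank = nmf.fit_transform(hist) @ nmf.components_

    # Renormalize so the low-rank estimate integrates to one over the binned domain.
    cell_area = (xe[1] - xe[0]) * (ye[1] - ye[0])
    low_rank /= low_rank.sum() * cell_area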
arXiv Detail & Related papers (2020-10-06T01:39:09Z)
- Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation [51.091890311312085]
We propose a new training scheme for auto-regressive sequence generative models that is effective and stable when operating in the large sample spaces encountered in text generation.
Our method stably outperforms Maximum Likelihood Estimation and other state-of-the-art sequence generative models in terms of both quality and diversity.
arXiv Detail & Related papers (2020-07-12T15:31:24Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.