Learning Probabilistic Sentence Representations from Paraphrases
- URL: http://arxiv.org/abs/2005.08105v1
- Date: Sat, 16 May 2020 21:10:28 GMT
- Title: Learning Probabilistic Sentence Representations from Paraphrases
- Authors: Mingda Chen, Kevin Gimpel
- Abstract summary: We define probabilistic models that produce distributions for sentences.
We train our models on paraphrases and demonstrate that they naturally capture sentence specificity.
Our model captures sentential entailment and provides ways to analyze the specificity and preciseness of individual words.
- Score: 47.528336088976744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Probabilistic word embeddings have shown effectiveness in capturing notions
of generality and entailment, but there is very little work on doing the
analogous type of investigation for sentences. In this paper we define
probabilistic models that produce distributions for sentences. Our
best-performing model treats each word as a linear transformation operator
applied to a multivariate Gaussian distribution. We train our models on
paraphrases and demonstrate that they naturally capture sentence specificity.
While our proposed model achieves the best performance overall, we also show
that specificity is represented by simpler architectures via the norm of the
sentence vectors. Qualitative analysis shows that our probabilistic model
captures sentential entailment and provides ways to analyze the specificity and
preciseness of individual words.
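The abstract's best-performing model treats each word as a linear transformation applied to a multivariate Gaussian. The sketch below illustrates only the underlying algebra of that idea: if x ~ N(mu, Sigma), then Wx ~ N(W mu, W Sigma W^T), so composing one matrix per word keeps the sentence representation Gaussian. The word matrices, prior, dimensionality, and the norm-based specificity proxy are illustrative stand-ins, not the paper's learned parameters or exact architecture.
```python
# Minimal sketch (not the paper's exact parameterization): represent a sentence
# as a Gaussian by applying one matrix per word to a Gaussian prior.
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimensionality (illustrative)

# Hypothetical vocabulary of word operators: one d x d matrix per word,
# standing in for learned parameters.
vocab = ["a", "dog", "barked", "loudly"]
word_ops = {w: rng.normal(scale=1.0 / np.sqrt(d), size=(d, d)) for w in vocab}

def sentence_gaussian(words, mu0, sigma0):
    """Compose word operators left to right: N(mu, Sigma) -> N(W mu, W Sigma W^T)."""
    mu, sigma = mu0, sigma0
    for w in words:
        W = word_ops[w]
        mu, sigma = W @ mu, W @ sigma @ W.T
    return mu, sigma

mu0, sigma0 = rng.normal(size=d), np.eye(d)  # illustrative prior
mu, sigma = sentence_gaussian(["a", "dog", "barked", "loudly"], mu0, sigma0)
print("sentence mean:", mu)
print("covariance log-det (one rough notion of spread):", np.linalg.slogdet(sigma)[1])

# Simpler baseline noted in the abstract: with vector-valued sentence encoders,
# the norm of the sentence vector can serve as a specificity signal.
word_vecs = {w: rng.normal(size=d) for w in vocab}  # stand-in embeddings
sent_vec = np.mean([word_vecs[w] for w in vocab], axis=0)
print("sentence-vector norm (specificity proxy):", np.linalg.norm(sent_vec))
```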
Related papers
- The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining [27.144616560712493]
We investigate whether the better sample efficiency and generalization of models pretrained with masked language modeling can be attributed to the semantic similarity encoded in the pretraining data's distributional properties.
Our results illustrate our limited understanding of model pretraining and provide future research directions.
arXiv Detail & Related papers (2023-10-25T00:31:29Z)
- A Heavy-Tailed Algebra for Probabilistic Programming [53.32246823168763]
We propose a systematic approach for analyzing the tails of random variables.
We show how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler.
Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
arXiv Detail & Related papers (2023-06-15T16:37:36Z)
- Learning and Predicting Multimodal Vehicle Action Distributions in a Unified Probabilistic Model Without Labels [26.303522885475406]
We present a unified probabilistic model that learns a representative set of discrete vehicle actions and predicts the probability of each action given a particular scenario.
Our model also enables us to estimate the distribution over continuous trajectories conditioned on a scenario, representing what each discrete action would look like if executed in that scenario.
arXiv Detail & Related papers (2022-12-14T04:01:19Z)
- Explainability as statistical inference [29.74336283497203]
We propose a general deep probabilistic model designed to produce interpretable predictions.
The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture.
We show experimentally that using multiple imputation provides more reasonable interpretations.
arXiv Detail & Related papers (2022-12-06T16:55:10Z)
- Distributional Gradient Boosting Machines [77.34726150561087]
Our framework is based on XGBoost and LightGBM.
We show that our framework achieves state-of-the-art forecast accuracy.
arXiv Detail & Related papers (2022-04-02T06:32:19Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- PSD Representations for Effective Probability Models [117.35298398434628]
We show that a recently proposed class of positive semi-definite (PSD) models for non-negative functions is particularly suited to this end.
We characterize both approximation and generalization capabilities of PSD models, showing that they enjoy strong theoretical guarantees.
Our results open the way to applications of PSD models to density estimation, decision theory and inference.
arXiv Detail & Related papers (2021-06-30T15:13:39Z)
- Contextuality scenarios arising from networks of stochastic processes [68.8204255655161]
An empirical model is said to be contextual if its distributions cannot be obtained by marginalizing a joint distribution over X (a minimal illustration of this definition appears after this list).
We present a different and classical source of contextual empirical models: the interaction among many processes.
The statistical behavior of the network in the long run makes the empirical model generically contextual and even strongly contextual.
arXiv Detail & Related papers (2020-06-22T16:57:52Z)
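To make the contextuality definition quoted above concrete, here is a minimal, standard illustration (not drawn from that paper): three binary measurements whose pairwise outcomes are required to always disagree admit no global joint distribution, so the corresponding empirical model is contextual.
```python
# Standard textbook-style illustration of a contextual empirical model (not
# taken from the paper above): three binary variables with pairwise
# distributions saying each pair disagrees with probability 1. Any joint
# distribution reproducing these marginals could only put mass on assignments
# where all three pairs differ, and no such binary assignment exists.
from itertools import product

pairs = [(0, 1), (1, 2), (0, 2)]
consistent = [a for a in product([0, 1], repeat=3)
              if all(a[i] != a[j] for i, j in pairs)]
print("assignments consistent with all pairwise constraints:", consistent)  # []
# Empty support means no joint distribution can reproduce the pairwise
# marginals: the empirical model is contextual (indeed strongly contextual).
```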
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.