Generating Diverse Translation from Model Distribution with Dropout
- URL: http://arxiv.org/abs/2010.08178v1
- Date: Fri, 16 Oct 2020 05:50:00 GMT
- Title: Generating Diverse Translation from Model Distribution with Dropout
- Authors: Xuanfu Wu, Yang Feng, Chenze Shao
- Abstract summary: We propose to generate diverse translations by deriving a large number of possible models with Bayesian modelling and sampling models from them for inference.
We conducted experiments on Chinese-English and English-German translation tasks and the results show that our method makes a better trade-off between diversity and accuracy.
- Score: 13.223158914896727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the improvement of translation quality, neural machine translation
(NMT) often suffers from the lack of diversity in its generation. In this
paper, we propose to generate diverse translations by deriving a large number
of possible models with Bayesian modelling and sampling models from them for
inference. The possible models are obtained by applying concrete dropout to the
NMT model and each of them has specific confidence for its prediction, which
corresponds to a posterior model distribution under specific training data in
the principle of Bayesian modeling. With variational inference, the posterior
model distribution can be approximated with a variational distribution, from
which the final models for inference are sampled. We conducted experiments on
Chinese-English and English-German translation tasks and the results show that
our method makes a better trade-off between diversity and accuracy.
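The core idea above can be sketched in a few lines: sample multiple "models" by drawing independent dropout masks at inference time, then decode with each sampled model to obtain diverse outputs. The toy linear decoder, vocabulary, and fixed dropout rate below are illustrative assumptions, not the paper's NMT architecture or its concrete-dropout parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["the", "a", "cat", "dog", "sat", "ran"]   # toy vocabulary (assumption)
W = rng.normal(size=(8, len(VOCAB)))               # toy decoder weights
h = rng.normal(size=8)                             # toy hidden state

def sample_model_and_decode(p_drop=0.3):
    """Draw one dropout mask (one model sampled from the approximate
    posterior) and greedily pick the most likely token under that model."""
    mask = rng.random(8) >= p_drop                 # Bernoulli keep-mask
    logits = (h * mask / (1.0 - p_drop)) @ W       # inverted-dropout scaling
    return VOCAB[int(np.argmax(logits))]

# Each sampled mask corresponds to a different model, so decoding the same
# input repeatedly can yield different (diverse) outputs.
outputs = {sample_model_and_decode() for _ in range(20)}
print(outputs)
```

Because the dropout mask is resampled per decode rather than disabled at test time, diversity comes from the model distribution itself rather than from noise injected into the search procedure.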
Related papers
- Generative Modeling with Bayesian Sample Inference [50.07758840675341]
We derive a novel generative model from the simple act of Gaussian posterior inference.
Treating the generated sample as an unknown variable to infer lets us formulate the sampling process in the language of Bayesian probability.
Our model uses a sequence of prediction and posterior update steps to narrow down the unknown sample from a broad initial belief.
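As a hedged illustration of the narrowing process described above: treat the unknown sample as a latent variable with a broad Gaussian prior and repeatedly apply conjugate Gaussian posterior updates from noisy predictions, shrinking the belief's variance. The scalar setting and observation-noise model here are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
true_x = 2.0                 # the "unknown sample" to be inferred
mu, var = 0.0, 100.0         # broad initial belief
obs_var = 1.0                # assumed prediction-noise variance

for _ in range(10):
    y = true_x + rng.normal(scale=obs_var ** 0.5)   # noisy prediction of x
    # Conjugate Gaussian posterior update: precision-weighted average
    var_new = 1.0 / (1.0 / var + 1.0 / obs_var)
    mu = var_new * (mu / var + y / obs_var)
    var = var_new

print(mu, var)   # the belief has concentrated near true_x
```

After ten updates the posterior variance is roughly 1/(1/100 + 10), i.e. the broad prior contributes almost nothing and the belief is dominated by the accumulated observations.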
arXiv Detail & Related papers (2025-02-11T14:27:10Z)
- Local Bayesian Dirichlet mixing of imperfect models [0.0]
We study the ability of Bayesian model averaging and mixing techniques to mine nuclear masses.
We show that the global and local mixtures of models reach excellent performance on both prediction accuracy and uncertainty quantification.
arXiv Detail & Related papers (2023-11-02T21:02:40Z)
- Diffusion models for probabilistic programming [56.47577824219207]
Diffusion Model Variational Inference (DMVI) is a novel method for automated approximate inference in probabilistic programming languages (PPLs)
DMVI is easy to implement, allows hassle-free inference in PPLs without the drawbacks of, e.g., variational inference using normalizing flows, and does not make any constraints on the underlying neural network model.
arXiv Detail & Related papers (2023-11-01T12:17:05Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply it to language generation.
We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
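For background on the quantity the TaiLr objective estimates: the total variation distance (TVD) between two categorical distributions is half the L1 distance between their probability vectors. The sketch below computes that standard quantity only; it is not a reproduction of the TaiLr objective itself.

```python
def total_variation(p, q):
    """TVD(p, q) = 0.5 * sum_i |p_i - q_i| for categorical distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Example: two distributions over three outcomes
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
print(total_variation(p, q))   # 0.2 up to floating-point error
```

Unlike the KL divergence used implicitly by MLE, TVD is bounded and symmetric, which is why estimating it well requires the bounds and tradeoffs the summary mentions.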
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials [0.0]
We propose a Bayesian tensor regression model to accommodate the effect of multiple factors on phenotype prediction.
We adopt a set of prior distributions that resolve identifiability issues that may arise between the parameters in the model.
We explore the applicability of our model by analysing real-world data related to wheat production across Ireland from 2010 to 2019.
arXiv Detail & Related papers (2023-01-09T19:54:50Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.