Learning the joint distribution of two sequences using little or no
paired data
- URL: http://arxiv.org/abs/2212.03232v1
- Date: Tue, 6 Dec 2022 18:56:15 GMT
- Title: Learning the joint distribution of two sequences using little or no
paired data
- Authors: Soroosh Mariooryad, Matt Shannon, Siyuan Ma, Tom Bagby, David Kao,
Daisy Stanton, Eric Battenberg, RJ Skerry-Ryan
- Abstract summary: We present a noisy channel generative model of two sequences, for example text and speech.
We show that even a tiny amount of paired data is sufficient to learn to relate the two modalities when a massive amount of unpaired data is available.
- Score: 16.189575655434844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a noisy channel generative model of two sequences, for example
text and speech, which enables uncovering the association between the two
modalities when limited paired data is available. To address the intractability
of the exact model under a realistic data setup, we propose a variational
inference approximation. To train this variational model with categorical data,
we propose a KL encoder loss approach which has connections to the wake-sleep
algorithm. Identifying the joint or conditional distributions by observing
only unpaired samples from the marginals is possible only under certain
conditions on the data distribution, and we discuss the types of conditional
independence assumptions under which this is achievable, which in turn guides
the architecture design. Experimental results show that even a tiny amount of
paired data (5 minutes) is sufficient to learn to relate the two modalities
(graphemes and phonemes here) when a massive amount of unpaired data is
available, paving the way to adopting this principled approach for all
seq2seq models in low-resource data regimes.
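For concreteness, here is a minimal PyTorch sketch of the ingredients the abstract names: a source model p(x), a channel model p(y|x), a variational encoder q(x|y), and a wake-sleep-style KL encoder loss, plus a term for the small paired set. Everything here is an assumption-laden illustration (fixed-length token sequences, non-autoregressive per-position categorical models, illustrative names like `channel` and `kl_encoder_loss`), not the paper's implementation.

```python
import torch

# Minimal sketch (NOT the paper's code): fixed-length sequences and
# non-autoregressive per-position categorical models, chosen for brevity.
V, T, H, B = 32, 8, 64, 4  # vocab size, seq length, hidden size, batch size

class CondSeqModel(torch.nn.Module):
    """Per-position categorical model p(target | context)."""
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(V, H)
        self.out = torch.nn.Linear(H, T * V)

    def dist(self, context):                     # context: (B, T) int64 tokens
        h = self.embed(context).mean(dim=1)      # (B, H) pooled context
        logits = self.out(h).view(-1, T, V)      # (B, T, V)
        return torch.distributions.Categorical(logits=logits)

    def log_prob(self, target, context):         # -> (B,)
        return self.dist(context).log_prob(target).sum(-1)

    def sample(self, context):                   # -> (B, T)
        return self.dist(context).sample()

source_logits = torch.nn.Parameter(torch.zeros(T, V))  # unconditional p(x)
channel = CondSeqModel()                                # channel model p(y|x)
encoder = CondSeqModel()                                # variational q(x|y)

def source_log_prob(x):
    return torch.distributions.Categorical(logits=source_logits).log_prob(x).sum(-1)

def unpaired_y_loss(y):
    """Variational bound for unpaired y: E_q[log p(x) + log p(y|x) - log q(x|y)].
    x is categorical, so a sampled x is plugged in; the encoder is trained by
    the separate KL encoder loss below rather than through this bound."""
    x = encoder.sample(y)
    elbo = source_log_prob(x) + channel.log_prob(y, x) - encoder.log_prob(x, y).detach()
    return -elbo.mean()

def kl_encoder_loss():
    """Wake-sleep-style 'sleep' step: dream (x, y) pairs from the generative
    model, then fit q(x|y) to them by maximum likelihood."""
    with torch.no_grad():
        x = torch.distributions.Categorical(logits=source_logits).sample((B,))
        y = channel.sample(x)
    return -encoder.log_prob(x, y).mean()

def paired_loss(x, y):
    """The few paired examples supervise channel and encoder directly."""
    return -(channel.log_prob(y, x) + encoder.log_prob(x, y)).mean()
```

A full training step would combine these terms (plus a symmetric bound for the unpaired text side) and optimize the generative and encoder parameters jointly.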
Related papers
- Sub-graph Based Diffusion Model for Link Prediction [43.15741675617231]
Denoising Diffusion Probabilistic Models (DDPMs) represent a contemporary class of generative models with exceptional qualities.
We build a novel generative model for link prediction using a dedicated design to decompose the likelihood estimation process via the Bayesian formula.
Our proposed method presents numerous advantages: (1) transferability across datasets without retraining, (2) promising generalization on limited training data, and (3) robustness against graph adversarial attacks.
arXiv Detail & Related papers (2024-09-13T02:23:55Z)
- Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off among objective and constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z)
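The "dual training" named in the entry above refers to primal-dual optimization of a constrained objective. Below is a generic sketch of that pattern (descent on model parameters, ascent on a Lagrange multiplier); the objective `f` and constraint `g` are toy placeholders, not the paper's diffusion losses.

```python
import torch

# Generic primal-dual ("dual training") loop for a constrained objective:
#   min_theta f(theta)  subject to  g(theta) <= 0
# via the Lagrangian L(theta, lam) = f(theta) + lam * g(theta).
theta = torch.nn.Parameter(torch.randn(10))
lam = torch.zeros(())            # dual variable, projected to stay >= 0
opt = torch.optim.SGD([theta], lr=1e-2)
DUAL_LR = 1e-2

def f(p):
    # Placeholder primal objective (stands in for a diffusion training loss).
    return (p ** 2).sum()

def g(p):
    # Placeholder constraint function; the constraint is g(p) <= 0.
    return p.sum() - 1.0

for step in range(1000):
    opt.zero_grad()
    (f(theta) + lam * g(theta)).backward()   # primal descent step on theta
    opt.step()
    with torch.no_grad():                    # dual ascent step on lambda
        lam = (lam + DUAL_LR * g(theta)).clamp_min(0.0)
```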
- Dataset Condensation with Latent Quantile Matching [5.466962214217334]
Current distribution matching (DM) based dataset condensation methods learn a synthesized dataset by matching the mean of the latent embeddings between the synthetic and the real datasets.
We propose Latent Quantile Matching (LQM), which matches the quantiles of the latent embeddings to minimize the goodness-of-fit test statistic between the two distributions.
arXiv Detail & Related papers (2024-06-14T09:20:44Z)
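A sketch of the quantile-matching idea in the entry above: instead of matching only the means of two embedding batches, sort each batch per dimension and penalize gaps between the resulting order statistics (a Cramer-von-Mises-style statistic). This illustrates the general idea, not the paper's exact loss.

```python
import torch

def quantile_matching_loss(real_emb: torch.Tensor, syn_emb: torch.Tensor) -> torch.Tensor:
    """Quantile matching between two (N, D) embedding batches.

    Sorting along the sample axis yields each batch's empirical quantiles per
    dimension; penalizing their squared gaps matches whole distributions, not
    just means. Assumes equal batch sizes for simplicity; illustrative only.
    """
    real_q, _ = torch.sort(real_emb, dim=0)
    syn_q, _ = torch.sort(syn_emb, dim=0)
    return ((real_q - syn_q) ** 2).mean()

def mean_matching_loss(real_emb: torch.Tensor, syn_emb: torch.Tensor) -> torch.Tensor:
    """The baseline distribution-matching objective the entry contrasts with."""
    return ((real_emb.mean(0) - syn_emb.mean(0)) ** 2).mean()

# Example: two batches of 128 latent embeddings of dimension 16.
loss = quantile_matching_loss(torch.randn(128, 16), torch.randn(128, 16))
```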
- InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion [53.90516061351706]
We present InterHandGen, a novel framework that learns the generative prior of two-hand interaction.
For sampling, we combine anti-penetration and classifier-free guidance to enable plausible generation.
Our method significantly outperforms baseline generative models in terms of plausibility and diversity.
arXiv Detail & Related papers (2024-03-26T06:35:55Z)
- DAGnosis: Localized Identification of Data Inconsistencies using Structures [73.39285449012255]
Identification and appropriate handling of inconsistencies in data at deployment time are crucial to reliably use machine learning models.
We use directed acyclic graphs (DAGs) to encode the training set's feature probability distribution and independencies as a structure.
Our method, called DAGnosis, leverages these structural interactions to bring valuable and insightful data-centric conclusions.
arXiv Detail & Related papers (2024-02-26T11:29:16Z)
- VertiBayes: Learning Bayesian network parameters from vertically partitioned data with missing values [2.9707233220536313]
Federated learning makes it possible to train a machine learning model on decentralized data.
We propose a novel method called VertiBayes to train Bayesian networks on vertically partitioned data.
We experimentally show our approach produces models comparable to those learnt using traditional algorithms.
arXiv Detail & Related papers (2022-10-31T11:13:35Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data alone by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- One-Way Matching of Datasets with Low Rank Signals [4.582330307986793]
We show that linear assignment with projected data achieves fast rates of convergence and sometimes even minimax rate optimality for this task.
We illustrate practical use of the matching procedure on two single-cell data examples.
arXiv Detail & Related papers (2022-04-29T03:12:23Z)
- MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice.
One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio.
We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
arXiv Detail & Related papers (2021-06-03T12:59:16Z)
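The likelihood-to-evidence ratio estimator in the entry above is commonly amortized as a classifier that separates jointly simulated (theta, x) pairs from shuffled (marginal) pairs; the classifier logit then estimates the log ratio, and its expectation under the joint is the mutual information between parameters and data. Below is a generic sketch of that construction with a toy simulator, not the paper's code.

```python
import torch
import torch.nn.functional as F

# Amortized likelihood-to-evidence ratio estimation, sketched generically:
# the classifier logit approximates log p(x | theta) / p(x).
D_THETA, D_X, B = 2, 3, 64  # toy dimensions (assumptions)

clf = torch.nn.Sequential(
    torch.nn.Linear(D_THETA + D_X, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)

def simulate(theta):
    """Toy simulator standing in for the real model: x ~ N(pad(theta), I)."""
    pad = torch.zeros(theta.size(0), D_X - D_THETA)
    return torch.cat([theta, pad], dim=-1) + torch.randn(theta.size(0), D_X)

for step in range(500):
    theta = torch.randn(B, D_THETA)            # parameters from the prior
    x = simulate(theta)                        # joint pairs (theta, x)
    x_marg = x[torch.randperm(B)]              # shuffled -> marginal pairs
    logits_joint = clf(torch.cat([theta, x], dim=-1))
    logits_marg = clf(torch.cat([theta, x_marg], dim=-1))
    # Binary cross-entropy; at the optimum the logit equals the log
    # likelihood-to-evidence ratio, whose expectation under the joint is
    # the mutual information I(theta; x) the entry refers to.
    loss = (F.binary_cross_entropy_with_logits(logits_joint, torch.ones_like(logits_joint))
            + F.binary_cross_entropy_with_logits(logits_marg, torch.zeros_like(logits_marg)))
    opt.zero_grad(); loss.backward(); opt.step()
```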
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the resulting divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
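A 1-D illustration of the score-matching building block that AR-CSM applies per conditional: fit a model score s(x) to approximate d/dx log p(x) by minimizing the Hyvarinen objective E[0.5 * s(x)^2 + s'(x)], which needs neither sampling from the model nor adversarial training; AR-CSM does this for each univariate conditional d/dx_i log p(x_i | x_<i). The toy data and network below are assumptions, not the paper's setup.

```python
import torch

# Hyvarinen score matching in 1-D: fit s(x) to the score of toy data
# x ~ N(0, 1) by minimizing E[0.5 * s(x)^2 + s'(x)].
score_net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
)
opt = torch.optim.Adam(score_net.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.randn(256, 1, requires_grad=True)   # toy data (assumption)
    s = score_net(x)                              # model score s(x)
    # s'(x) via autograd; create_graph keeps the loss differentiable.
    ds_dx, = torch.autograd.grad(s.sum(), x, create_graph=True)
    loss = (0.5 * s ** 2 + ds_dx).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# For N(0, 1) the true score is -x; spot-check the fit.
with torch.no_grad():
    probe = torch.tensor([[-1.0], [0.0], [1.0]])
    print(score_net(probe).squeeze(-1))  # should be close to (1, 0, -1)
```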