Sparse Communication via Mixed Distributions
- URL: http://arxiv.org/abs/2108.02658v1
- Date: Thu, 5 Aug 2021 14:49:03 GMT
- Title: Sparse Communication via Mixed Distributions
- Authors: António Farinhas, Wilker Aziz, Vlad Niculae, and André F. T. Martins
- Abstract summary: We build theoretical foundations for "mixed random variables."
Our framework suggests two strategies for representing and sampling mixed random variables.
We experiment with both approaches on an emergent communication benchmark.
- Score: 29.170302047339174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks and other machine learning models compute continuous
representations, while humans communicate mostly through discrete symbols.
Reconciling these two forms of communication is desirable for generating
human-readable interpretations or learning discrete latent variable models,
while maintaining end-to-end differentiability. Some existing approaches (such
as the Gumbel-Softmax transformation) build continuous relaxations that are
discrete approximations in the zero-temperature limit, while others (such as
sparsemax transformations and the Hard Concrete distribution) produce
discrete/continuous hybrids. In this paper, we build rigorous theoretical
foundations for these hybrids, which we call "mixed random variables." Our
starting point is a new "direct sum" base measure defined on the face lattice
of the probability simplex. From this measure, we introduce new entropy and
Kullback-Leibler divergence functions that subsume the discrete and
differential cases and have interpretations in terms of code optimality. Our
framework suggests two strategies for representing and sampling mixed random
variables, an extrinsic ("sample-and-project") and an intrinsic one (based on
face stratification). We experiment with both approaches on an emergent
communication benchmark and on modeling MNIST and Fashion-MNIST data with
variational auto-encoders with mixed latent variables.
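The extrinsic "sample-and-project" strategy mentioned in the abstract can be pictured with a small sketch: perturb the logits with Gumbel noise, then project the result onto the probability simplex with sparsemax, which can land on a lower-dimensional face and so yields points that mix zeroed-out (discrete) and continuous coordinates. This is only an illustrative sketch under our own assumptions; the function names and the Gumbel-plus-sparsemax combination are our choices for exposition, not the authors' released code.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (may land on a face)."""
    z_sorted = np.sort(z)[::-1]              # sort descending
    cssv = np.cumsum(z_sorted)               # cumulative sums of sorted values
    k = np.arange(1, len(z) + 1)
    support = 1.0 + k * z_sorted > cssv      # coordinates kept in the support
    k_star = k[support][-1]                  # size of the support
    tau = (cssv[k_star - 1] - 1.0) / k_star  # threshold subtracted from z
    return np.maximum(z - tau, 0.0)

def sample_and_project(logits, temperature=1.0, rng=None):
    """Gumbel-perturb the logits, then project onto the simplex with sparsemax."""
    rng = np.random.default_rng() if rng is None else rng
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    return sparsemax((logits + gumbel) / temperature)

print(sample_and_project(np.array([2.0, 1.0, 0.1])))  # e.g. [0.87 0.13 0.  ]
```

Lowering the temperature pushes the projected samples toward vertices (one-hot symbols), while moderate temperatures keep a few nonzero coordinates, i.e. a sparse discrete/continuous hybrid.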
Related papers
- Discrete Flow Matching [74.04153927689313]
We present a novel discrete flow paradigm designed specifically for generating discrete data.
Our approach is capable of generating high-quality discrete data in a non-autoregressive fashion.
arXiv Detail & Related papers (2024-07-22T12:33:27Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Bayesian Flow Networks [4.585102332532472]
This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference.
Starting from a simple prior and iteratively updating the two distributions yields a generative procedure similar to the reverse process of diffusion models.
BFNs achieve competitive log-likelihoods for image modelling on dynamically binarized MNIST and CIFAR-10, and outperform all known discrete diffusion models on the text8 character-level language modelling task.
arXiv Detail & Related papers (2023-08-14T09:56:35Z)
- Robust scalable initialization for Bayesian variational inference with multi-modal Laplace approximations [0.0]
Variational mixtures with full-covariance structures suffer from a quadratic growth in the number of variational parameters with the number of model parameters.
We propose a method for constructing an initial Gaussian model approximation that can be used to warm-start variational inference.
arXiv Detail & Related papers (2023-07-12T19:30:04Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- A Geometric Perspective on Diffusion Models [57.27857591493788]
We inspect the ODE-based sampling of a popular variance-exploding SDE.
We establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm.
arXiv Detail & Related papers (2023-05-31T15:33:16Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Can Push-forward Generative Models Fit Multimodal Distributions? [3.8615905456206256]
We show that the Lipschitz constant of generative networks has to be large in order to fit multimodal distributions.
We validate our findings on one-dimensional and image datasets and empirically show that generative models consisting of stacked networks with input at each step do not suffer from such limitations.
arXiv Detail & Related papers (2022-06-29T09:03:30Z)
- Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z)
- Reconciling the Discrete-Continuous Divide: Towards a Mathematical Theory of Sparse Communication [0.0]
We build rigorous theoretical foundations for discrete/continuous hybrids.
We introduce "mixed languages" as strings of hybrid symbols and a new mixed weighted finite state automaton.
arXiv Detail & Related papers (2021-04-01T20:31:13Z)
- Variational Mixture of Normalizing Flows [0.0]
Deep generative models, such as generative adversarial networks, variational autoencoders, and their variants, have seen wide adoption for the task of modelling complex data distributions.
Normalizing flows have overcome this limitation by leveraging the change-of-variables formula for probability density functions.
The present work overcomes this by using normalizing flows as components in a mixture model and devising an end-to-end training procedure for such a model (a minimal illustrative sketch appears after this list).
arXiv Detail & Related papers (2020-09-01T17:20:08Z)
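As referenced in the last entry above, here is a minimal illustrative sketch of a mixture whose components are normalizing flows, using toy one-dimensional affine flows x = mu + s * z with z ~ N(0, 1) so that the change-of-variables correction is simply 1/|s|. The component form and all names are our own illustrative assumptions, not the paper's model or training procedure.

```python
import numpy as np
from scipy.stats import norm

def mixture_of_flows_density(x, weights, mus, scales):
    """Density of a mixture whose components are 1-D affine flows x = mu + s * z."""
    dens = 0.0
    for w, mu, s in zip(weights, mus, scales):
        z = (x - mu) / s                   # inverse flow: map x back to the base space
        dens += w * norm.pdf(z) / abs(s)   # change-of-variables Jacobian |dz/dx| = 1/|s|
    return dens

# Example: two components with mixture weights 0.6 and 0.4
print(mixture_of_flows_density(0.5, [0.6, 0.4], [0.0, 2.0], [1.0, 0.5]))
```

In the actual model, each component would be a deep, invertible flow rather than an affine map, and the mixture weights and flow parameters would be trained jointly end to end.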
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.