Reconciling the Discrete-Continuous Divide: Towards a Mathematical
Theory of Sparse Communication
- URL: http://arxiv.org/abs/2104.00755v1
- Date: Thu, 1 Apr 2021 20:31:13 GMT
- Title: Reconciling the Discrete-Continuous Divide: Towards a Mathematical
Theory of Sparse Communication
- Authors: André F. T. Martins
- Abstract summary: We build rigorous theoretical foundations for discrete/continuous hybrids.
We introduce "mixed languages" as strings of hybrid symbols and a new mixed weighted finite state automaton.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks and other machine learning models compute continuous
representations, while humans communicate with discrete symbols. Reconciling
these two forms of communication is desirable to generate human-readable
interpretations or to learn discrete latent variable models, while maintaining
end-to-end differentiability. Some existing approaches (such as the
Gumbel-softmax transformation) build continuous relaxations that are discrete
approximations in the zero-temperature limit, while others (such as sparsemax
transformations and the hard concrete distribution) produce discrete/continuous
hybrids. In this paper, we build rigorous theoretical foundations for these
hybrids. Our starting point is a new "direct sum" base measure defined on the
face lattice of the probability simplex. From this measure, we introduce a new
entropy function that includes the discrete and differential entropies as
particular cases, and has an interpretation in terms of code optimality, as
well as two other information-theoretic counterparts that generalize the mutual
information and the Kullback-Leibler divergence. Finally, we introduce "mixed
languages" as strings of hybrid symbols and a new mixed weighted finite state
automaton that recognizes a class of regular mixed languages, generalizing
closure properties of regular languages.
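As a rough illustration of the distinction the abstract draws, the sketch below (not from the paper; NumPy-based, with illustrative scores and helper names) contrasts softmax and a Gumbel-softmax sample, which always return dense distributions, with sparsemax, whose output can contain exact zeros and therefore lands on lower-dimensional faces of the probability simplex, the kind of discrete/continuous hybrid the paper formalizes.

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Dense distribution: every coordinate stays strictly positive."""
    z = np.asarray(z, dtype=float) / temperature
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def gumbel_softmax_sample(logits, temperature=1.0, rng=None):
    """Continuous relaxation: approaches a one-hot sample only as temperature -> 0."""
    rng = np.random.default_rng(0) if rng is None else rng
    u = rng.uniform(size=len(logits))
    gumbel_noise = -np.log(-np.log(u))
    return softmax(np.asarray(logits, dtype=float) + gumbel_noise, temperature)

def sparsemax(z):
    """Euclidean projection onto the probability simplex (Martins & Astudillo, 2016).
    Unlike softmax, it can assign exact zeros, i.e. hit faces of the simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum  # coordinates kept in the support
    k_z = k[support][-1]
    tau = (cumsum[support][-1] - 1.0) / k_z
    return np.maximum(z - tau, 0.0)

if __name__ == "__main__":
    scores = [1.6, 1.2, -0.4, -1.0]      # arbitrary example scores
    print("softmax       :", np.round(softmax(scores), 3))                 # all entries > 0
    print("gumbel-softmax:", np.round(gumbel_softmax_sample(scores, 0.5), 3))
    print("sparsemax     :", np.round(sparsemax(scores), 3))               # exact zeros appear
```

With these scores, sparsemax returns [0.7, 0.3, 0.0, 0.0]: the two zero coordinates are genuinely discrete decisions, while the nonzero ones remain continuous, which is the hybrid behaviour the paper's measure-theoretic framework is built to describe.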
Related papers
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained
Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z) - On the Convergence of the ELBO to Entropy Sums [3.345575993695074]
We show that, at all stationary points of learning, the variational lower bound is equal to a sum of entropies.
This result holds for a very large class of generative models.
arXiv Detail & Related papers (2022-09-07T11:33:32Z) - Information Theory with Kernel Methods [0.0]
We show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy.
They come together with efficient estimation algorithms from various oracles on the probability distributions.
arXiv Detail & Related papers (2022-02-17T09:42:29Z) - Generation of data on discontinuous manifolds via continuous stochastic
non-invertible networks [6.201770337181472]
We show how to generate discontinuous distributions using continuous networks.
We derive a link between the cost functions and the information-theoretic formulation.
We apply our approach to synthetic 2D distributions to demonstrate both reconstruction and generation of discontinuous distributions.
arXiv Detail & Related papers (2021-12-17T17:39:59Z) - Sparse Communication via Mixed Distributions [29.170302047339174]
We build theoretical foundations for "mixed random variables".
Our framework suggests two strategies for representing and sampling mixed random variables.
We experiment with both approaches on an emergent communication benchmark.
arXiv Detail & Related papers (2021-08-05T14:49:03Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Rényi divergence inequalities via interpolation, with applications to
generalised entropic uncertainty relations [91.3755431537592]
We investigate quantum Rényi entropic quantities, specifically those derived from the 'sandwiched' divergence (see the definition sketched after this list).
We present Rényi mutual information decomposition rules, a new approach to the Rényi conditional entropy tripartite chain rules, and a more general bipartite comparison.
arXiv Detail & Related papers (2021-06-19T04:06:23Z) - The Connection between Discrete- and Continuous-Time Descriptions of
Gaussian Continuous Processes [60.35125735474386]
We show that discretizations yielding consistent estimators have the property of invariance under coarse-graining.
This result explains why combining differencing schemes for derivative reconstruction with local-in-time inference approaches does not work for time-series analysis of second- or higher-order differential equations.
arXiv Detail & Related papers (2021-01-16T17:11:02Z) - Mixture Representation Learning with Coupled Autoencoders [1.589915930948668]
We propose an unsupervised variational framework using multiple interacting networks called cpl-mixVAE.
In this framework, the mixture representation of each network is regularized by imposing a consensus constraint on the discrete factor.
We use the proposed method to jointly uncover discrete and continuous factors of variability describing gene expression in a single-cell transcriptomic dataset.
arXiv Detail & Related papers (2020-07-20T04:12:04Z) - Generalized Entropy Regularization or: There's Nothing Special about
Label Smoothing [83.78668073898001]
We introduce a family of entropy regularizers, which includes label smoothing as a special case.
We find that variance in model performance can be explained largely by the resulting entropy of the model.
We advise the use of other entropy regularization methods in its place.
arXiv Detail & Related papers (2020-05-02T12:46:28Z) - Discrete Variational Attention Models for Language Generation [51.88612022940496]
We propose a discrete variational attention model that places a categorical distribution over the attention mechanism, reflecting the discrete nature of language.
Thanks to this discreteness, training of the proposed approach does not suffer from posterior collapse.
arXiv Detail & Related papers (2020-04-21T05:49:04Z)
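For reference, the 'sandwiched' Rényi divergence mentioned in the "Rényi divergence inequalities" entry above is commonly defined as follows in the quantum information literature (a standard definition, not quoted from that paper, which may use a different normalization or parameter range):

```latex
\widetilde{D}_\alpha(\rho \,\|\, \sigma)
  = \frac{1}{\alpha - 1}
    \log \mathrm{Tr}\!\left[\left(\sigma^{\frac{1-\alpha}{2\alpha}}\,
      \rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right],
  \qquad \alpha \in (0,1) \cup (1,\infty),
```

which recovers the quantum relative entropy \( \mathrm{Tr}[\rho(\log\rho - \log\sigma)] \) in the limit \( \alpha \to 1 \).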