Kernel Deformed Exponential Families for Sparse Continuous Attention
- URL: http://arxiv.org/abs/2111.01222v1
- Date: Mon, 1 Nov 2021 19:21:22 GMT
- Title: Kernel Deformed Exponential Families for Sparse Continuous Attention
- Authors: Alexander Moreno, Supriya Nagesh, Zhenke Wu, Walter Dempsey, James M. Rehg
- Abstract summary: We show existence results for kernel exponential and deformed exponential families.
Experiments show that kernel deformed exponential families can attend to multiple compact regions of the data domain.
- Score: 76.61129971916702
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Attention mechanisms take an expectation of a data representation with
respect to probability weights. This creates summary statistics that focus on
important features. Recently, Martins et al. (2020, 2021) proposed continuous
attention mechanisms, focusing on unimodal attention densities from the
exponential and deformed exponential families; the latter has sparse support.
Farinhas et al. (2021) extended this to Gaussian mixture attention densities,
a flexible class with dense support. In this paper, we extend this to two
general flexible classes: kernel exponential families and our new sparse
counterpart, kernel deformed exponential families. Theoretically,
we show new existence results for both kernel exponential and deformed
exponential families, and that the deformed case has similar approximation
capabilities to kernel exponential families. Experiments show that kernel
deformed exponential families can attend to multiple compact regions of the
data domain.
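To make this concrete, the following is a minimal numerical sketch, not the authors' implementation: it assumes a 1-D domain, an RBF kernel expansion $f(t) = \sum_i \beta_i k(t, t_i)$ for the RKHS score function, and the $\alpha = 2$ deformed exponential, under which the normalized density takes the continuous-sparsemax form $[f(t) - \tau]_+$ with the threshold $\tau$ found by bisection. All kernel centers, coefficients, and the value map are illustrative.

```python
import numpy as np

def rbf(t, centers, h=0.1):
    """RBF kernel features k(t, t_i) evaluated on a 1-D grid."""
    return np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * h ** 2))

def deformed_density(f, grid):
    """Normalize the alpha=2 deformed exponential of a score function f.

    The density is p(t) = max(f(t) - tau, 0); its total mass is continuous
    and decreasing in tau, so bisection finds the tau giving unit mass.
    """
    dt = grid[1] - grid[0]
    length = dt * len(grid)
    lo, hi = f.min() - 1.0 / length, f.max()   # mass(lo) >= 1 >= mass(hi)
    for _ in range(80):
        tau = 0.5 * (lo + hi)
        mass = np.maximum(f - tau, 0.0).sum() * dt
        lo, hi = (tau, hi) if mass > 1.0 else (lo, tau)
    return np.maximum(f - tau, 0.0)

# Toy example: a two-bump RKHS score yields a density supported on two
# disjoint compact intervals, i.e., attention over multiple regions.
grid = np.linspace(0.0, 1.0, 2001)
centers = np.array([0.25, 0.75])           # hypothetical kernel centers
beta = np.array([3.0, 2.5])                # hypothetical RKHS coefficients
f = rbf(grid, centers) @ beta              # f(t) = sum_i beta_i k(t, t_i)
p = deformed_density(f, grid)              # zero between the two bumps
v = np.stack([grid, grid ** 2], axis=1)    # toy value map v(t) = (t, t^2)
context = (grid[1] - grid[0]) * p @ v      # attention output E_p[v(t)]
```

Because the score has two well-separated bumps, the fitted threshold exceeds the score between them, so the density vanishes there: attention concentrates on two disjoint compact regions, as the abstract describes.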
Related papers
- Inferring Kernel $ε$-Machines: Discovering Structure in Complex Systems [49.1574468325115]
We introduce causal diffusion components that encode the kernel causal-state estimates as a set of coordinates in a reduced dimension space.
We show how each component extracts predictive features from data and demonstrate their application on four examples.
arXiv Detail & Related papers (2024-10-01T21:14:06Z)
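As a rough, hedged analogue of the causal diffusion components in the entry above (the paper builds its kernel over causal-state estimates, which is not reproduced here), this is a standard diffusion-map computation turning a kernel matrix into reduced-dimension coordinates:

```python
import numpy as np

def diffusion_coordinates(X, h=1.0, k=2):
    """Generic diffusion-map coordinates from a Gaussian kernel.

    Offered only as a loose analogue of the paper's construction: build a
    kernel matrix, row-normalize it into a Markov matrix, and use leading
    nontrivial eigenvectors (scaled by eigenvalues) as coordinates.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * h ** 2))           # kernel matrix
    P = K / K.sum(axis=1, keepdims=True)     # row-stochastic Markov matrix
    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)          # eigenvalue 1 is the trivial mode
    lam = evals.real[order][1:k + 1]
    phi = evecs.real[:, order][:, 1:k + 1]
    return phi * lam                         # coordinates in reduced dimension

coords = diffusion_coordinates(np.random.default_rng(0).normal(size=(50, 3)))
```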
- Clustering above Exponential Families with Tempered Exponential Measures [28.532545355403123]
The link with exponential families has allowed $k$-means clustering to be generalized to a wide variety of data-generating distributions.
Getting the framework to work above exponential families is important to lift roadblocks such as the lack of robustness of some population minimizers built into their axiomatization.
arXiv Detail & Related papers (2022-11-04T21:58:40Z)
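The link mentioned in the entry above is the classical correspondence between exponential families and Bregman divergences (Banerjee et al., 2005), under which the optimal cluster representative remains the arithmetic mean for any Bregman divergence. A minimal sketch of that baseline, not the paper's tempered extension:

```python
import numpy as np

def bregman_kmeans(X, k, phi, grad_phi, iters=50, seed=0):
    """k-means-style clustering with a Bregman divergence.

    d_phi(x, mu) = phi(x) - phi(mu) - <grad_phi(mu), x - mu>; for every
    such divergence the optimal cluster representative is the mean.
    """
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Divergence from every point to every centroid, shape (n, k).
        d = (phi(X)[:, None] - phi(mu)[None, :]
             - ((X[:, None, :] - mu[None, :, :]) * grad_phi(mu)[None]).sum(-1))
        z = d.argmin(axis=1)
        mu = np.stack([X[z == j].mean(axis=0) if np.any(z == j) else mu[j]
                       for j in range(k)])
    return z, mu

# phi(x) = ||x||^2 / 2 yields squared Euclidean distance, i.e. ordinary
# k-means; other convex phi correspond to other exponential families.
phi = lambda x: 0.5 * (x ** 2).sum(-1)
grad_phi = lambda x: x
X = np.random.default_rng(1).normal(size=(200, 2))
labels, centroids = bregman_kmeans(X, 3, phi, grad_phi)
```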
- Hida-Matérn Kernel [8.594140167290098]
We present the class of Hida-Matérn kernels, which is the canonical family of covariance functions over the entire space of stationary Gauss-Markov processes.
We show how to represent such processes as state space models using only the kernel and its derivatives.
We also show how exploiting special properties of the state space representation enables improved numerical stability in addition to further reductions of computational complexity.
arXiv Detail & Related papers (2021-07-15T03:25:10Z)
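As a hedged illustration of the kernel-to-state-space idea in the entry above, here is the classical construction for the plain Matérn-3/2 kernel (Hartikainen & Särkkä, 2010), a special case rather than the paper's full Hida-Matérn family; all parameter values are illustrative:

```python
import numpy as np
from scipy.linalg import expm

def matern32_ssm(lengthscale, var):
    """State-space form of the Matern-3/2 kernel.

    The GP with k(r) = var * (1 + lam*r) * exp(-lam*r), lam = sqrt(3)/ell,
    is the first component of a 2-D Gauss-Markov state (f, f').
    """
    lam = np.sqrt(3.0) / lengthscale
    F = np.array([[0.0, 1.0], [-lam ** 2, -2.0 * lam]])   # drift matrix
    Pinf = np.diag([var, var * lam ** 2])                  # stationary covariance
    return F, Pinf

def sample_path(ts, lengthscale=0.5, var=1.0, seed=0):
    """Draw one GP sample path recursively via the state-space form."""
    rng = np.random.default_rng(seed)
    F, Pinf = matern32_ssm(lengthscale, var)
    x = rng.multivariate_normal(np.zeros(2), Pinf)
    out = [x[0]]
    for dt in np.diff(ts):
        A = expm(F * dt)                   # exact discretization over step dt
        Q = Pinf - A @ Pinf @ A.T          # matching process-noise covariance
        x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
        out.append(x[0])
    return np.array(out)

path = sample_path(np.linspace(0.0, 5.0, 200))
```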
- Memory kernel and divisibility of Gaussian Collisional Models [0.0]
Memory effects in the dynamics of open systems have been the subject of significant interest in recent decades.
We analyze two types of interactions: a beam-splitter implementing a partial SWAP, and a two-mode squeezing interaction that entangles the ancillas and feeds excitations into the system.
By analyzing the memory kernel and divisibility for these two representative scenarios, our results help to shed light on the intricate mechanisms behind memory effects in the quantum domain.
arXiv Detail & Related papers (2020-08-03T10:28:55Z)
- Sparse and Continuous Attention Mechanisms [14.941013982958209]
We introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $\alpha \in \{1, 2\}$.
Experiments on attention-based text classification, machine translation, and visual question answering illustrate the use of continuous attention in 1D and 2D.
arXiv Detail & Related papers (2020-06-12T14:16:48Z)
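For the dense $\alpha = 1$ case in the entry above, a quadratic score function makes the attention density an exact Gaussian, so the context vector is available in closed form without numerical integration. A minimal sketch with illustrative parameters and the value basis $v(t) = (t, t^2)$:

```python
import numpy as np

def continuous_softmax_gaussian(theta1, theta2):
    """alpha = 1 continuous attention with a quadratic score function.

    p(t) ~ exp(theta1*t + theta2*t^2) with theta2 < 0 is exactly Gaussian:
    completing the square gives the mean and variance below.
    """
    assert theta2 < 0, "theta2 < 0 is required for normalizability"
    var = -1.0 / (2.0 * theta2)
    mean = theta1 * var
    return mean, var

# Context vector E_p[v(t)] for v(t) = (t, t^2) is (mu, mu^2 + sigma^2).
mu, var = continuous_softmax_gaussian(theta1=2.0, theta2=-1.0)
context = np.array([mu, mu ** 2 + var])
```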
- Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be addressed with latent variable models.
High dimensionality and non-linearity are traditionally handled by kernel methods.
We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)
- Cumulant-free closed-form formulas for some common (dis)similarities between densities of an exponential family [38.13659821903422]
In this work, we report (dis)similarity formulas which bypass the explicit use of the cumulant function.
Our method requires only a partial canonical factorization of the densities of the considered exponential family.
arXiv Detail & Related papers (2020-03-05T07:46:22Z)
- Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees [106.91654068632882]
We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.
We show that the proposed objective is strongly convex in a neighborhood around the ground truth, so that a gradient-descent-based method achieves a linear convergence rate.
Our estimator is robust to any model misspecification within the exponential family, which is validated in extensive experiments.
arXiv Detail & Related papers (2020-03-02T16:40:36Z)
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
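At prediction time, the ensemble recipe in the entry above reduces to averaging the members' predictive distributions. A minimal sketch with toy stand-in classifiers; all shapes and names are illustrative:

```python
import numpy as np

def ensemble_predictive(prob_fns, x):
    """Approximate Bayesian marginalization with a deep ensemble.

    p(y | x) ~ (1/M) * sum_m p(y | x, theta_m): each ensemble member
    stands in for a sample from (a mode of) the posterior over weights.
    """
    probs = np.stack([f(x) for f in prob_fns])   # shape (M, num_classes)
    return probs.mean(axis=0)

# Toy stand-ins for M independently trained softmax classifiers.
rng = np.random.default_rng(0)
members = [lambda x, w=rng.normal(size=(3, 4)): np.exp(w @ x) / np.exp(w @ x).sum()
           for _ in range(5)]
p = ensemble_predictive(members, rng.normal(size=4))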
- Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is fitting non-trivial ERGs to large graphs.
We propose an approximation framework for such non-trivial ERGs that yields dyadic-independence (i.e., edge-independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
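Dyadic independence, as in the entry above, means the distribution factorizes over edges, so sampling is a matrix of independent Bernoulli draws. A minimal sketch with an illustrative two-block probability matrix (a truly large-sparse-graph sampler would draw only the nonzero edges):

```python
import numpy as np

def sample_dyadic_independent(P, seed=0):
    """Sample an undirected graph with independent edges.

    Under a dyadic-independence (edge-independent) distribution, each
    edge (i, j) is an independent Bernoulli(P[i, j]) draw, so sampling
    and likelihood evaluation factorize over dyads.
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    A = (rng.random((n, n)) < P).astype(int)
    A = np.triu(A, 1)            # keep one draw per dyad, drop self-loops
    return A + A.T               # symmetrize for an undirected graph

# Illustrative block structure: two communities, dense within each block.
P = np.full((6, 6), 0.05)
P[:3, :3] = P[3:, 3:] = 0.6
A = sample_dyadic_independent(P)
```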