A Complete Decomposition of KL Error using Refined Information and Mode Interaction Selection
- URL: http://arxiv.org/abs/2410.11964v1
- Date: Tue, 15 Oct 2024 18:08:32 GMT
- Title: A Complete Decomposition of KL Error using Refined Information and Mode Interaction Selection
- Authors: James Enouen, Mahito Sugiyama
- Abstract summary: We revisit the classical formulation of the log-linear model with a focus on higher-order mode interactions.
We find that our learned distributions are able to more efficiently use the finite amount of data which is available in practice.
- Score: 11.994525728378603
- Abstract: The log-linear model has received a significant amount of theoretical attention in previous decades and remains the fundamental tool used for learning probability distributions over discrete variables. Despite its large popularity in statistical mechanics and high-dimensional statistics, the vast majority of such energy-based modeling approaches only focus on the two-variable relationships, such as Boltzmann machines and Markov graphical models. Although these approaches have easier-to-solve structure learning problems and easier-to-optimize parametric distributions, they often ignore the rich structure which exists in the higher-order interactions between different variables. Using more recent tools from the field of information geometry, we revisit the classical formulation of the log-linear model with a focus on higher-order mode interactions, going beyond the 1-body modes of independent distributions and the 2-body modes of Boltzmann distributions. This perspective allows us to define a complete decomposition of the KL error. This then motivates the formulation of a sparse selection problem over the set of possible mode interactions. In the same way as sparse graph selection allows for better generalization, we find that our learned distributions are able to more efficiently use the finite amount of data which is available in practice. On both synthetic and real-world datasets, we demonstrate our algorithm's effectiveness in maximizing the log-likelihood for the generative task and also the ease of adaptability to the discriminative task of classification.
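As a rough illustration of what a decomposition of KL error across interaction orders can look like, the sketch below checks one classical information-geometric identity: the KL divergence from a joint distribution p to the uniform distribution u splits exactly into a 1-body part, KL(q1 || u), and a higher-order part, KL(p || q1), where q1 is the product of the single-variable marginals of p. This is not the paper's algorithm or its refined-information decomposition; the helper names `kl` and `independent_projection` and the random table over three binary variables are assumptions made only for this example.

```python
import numpy as np


def kl(p, q):
    """KL(p || q) in nats, skipping states where p is zero."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def independent_projection(p):
    """Product of the single-variable marginals of a joint table p."""
    n = p.ndim
    q = np.ones_like(p)
    for axis in range(n):
        # Marginalize out every variable except the one on `axis`.
        other_axes = tuple(a for a in range(n) if a != axis)
        marginal = p.sum(axis=other_axes)
        # Reshape so the marginal broadcasts along its own axis only.
        shape = [1] * n
        shape[axis] = p.shape[axis]
        q = q * marginal.reshape(shape)
    return q


# Hypothetical joint distribution over three binary variables.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()

u = np.full_like(p, 1.0 / p.size)   # uniform (0-body) reference
q1 = independent_projection(p)      # best 1-body (independent) model

total = kl(p, u)          # full divergence from uniform
first_order = kl(q1, u)   # explained by single-variable modes
higher_order = kl(p, q1)  # left over for 2-body and higher modes

print(f"total        = {total:.6f}")
print(f"first_order  = {first_order:.6f}")
print(f"higher_order = {higher_order:.6f}")
```

The two parts sum to the total up to floating-point error. The higher-order remainder is the quantity that pairwise and higher interaction modes would need to account for.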
Related papers
- Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z)
- Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z)
- Distribution learning via neural differential equations: a nonparametric statistical perspective [1.4436965372953483]
This work establishes the first general statistical convergence analysis for distribution learning via ODE models trained through likelihood transformations.
We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal{F}$.
We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal{F}$: $C^k$ functions and neural networks.
arXiv Detail & Related papers (2023-09-03T00:21:37Z)
- Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
- ER: Equivariance Regularizer for Knowledge Graph Completion [107.51609402963072]
We propose a new regularizer, namely, Equivariance Regularizer (ER).
ER can enhance the generalization ability of the model by employing the semantic equivariance between the head and tail entities.
The experimental results indicate a clear and substantial improvement over the state-of-the-art relation prediction methods.
arXiv Detail & Related papers (2022-06-24T08:18:05Z)
- Graph-LDA: Graph Structure Priors to Improve the Accuracy in Few-Shot Classification [6.037383467521294]
We introduce a generic model where observed class signals are supposed to be deteriorated with two sources of noise.
We derive an optimal methodology to classify such signals.
This methodology includes a single parameter, making it particularly suitable for cases where available data is scarce.
arXiv Detail & Related papers (2021-08-23T21:55:45Z)
- A Meta Learning Approach to Discerning Causal Graph Structure [1.52292571922932]
We explore the usage of meta-learning to derive the causal direction between variables by optimizing over a measure of distribution simplicity.
We incorporate a graph representation which includes latent variables and allows for more generalizability and graph structure expression.
Our model is able to learn causal direction indicators for complex graph structures despite effects of latent confounders.
arXiv Detail & Related papers (2021-06-06T22:44:44Z)
- Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies [88.0813215220342]
Some modalities can more easily contribute to the classification results than others.
We develop a method based on the log-Sobolev inequality, which bounds the functional entropy with the functional-Fisher-information.
On the two challenging multi-modal datasets VQA-CPv2 and SocialIQ, we obtain state-of-the-art results while more uniformly exploiting the modalities.
arXiv Detail & Related papers (2020-10-21T07:40:33Z)
- Variational Mixture of Normalizing Flows [0.0]
Deep generative models, such as generative adversarial networks, variational autoencoders, and their variants, have seen wide adoption for the task of modelling complex data distributions.
Normalizing flows have overcome this limitation by leveraging the change-of-variables formula for probability density functions.
The present work overcomes this by using normalizing flows as components in a mixture model and devising an end-to-end training procedure for such a model.
arXiv Detail & Related papers (2020-09-01T17:20:08Z)
- LowFER: Low-rank Bilinear Pooling for Link Prediction [4.110108749051657]
We propose a factorized bilinear pooling model, commonly used in multi-modal learning, for better fusion of entities and relations.
Our model naturally generalizes the Tucker decomposition based TuckER model, which has been shown to generalize other models.
We evaluate on real-world datasets, reaching on par or state-of-the-art performance.
arXiv Detail & Related papers (2020-08-25T07:33:52Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)