Interaction Asymmetry: A General Principle for Learning Composable Abstractions
- URL: http://arxiv.org/abs/2411.07784v1
- Date: Tue, 12 Nov 2024 13:33:26 GMT
- Title: Interaction Asymmetry: A General Principle for Learning Composable Abstractions
- Authors: Jack Brady, Julius von Kügelgen, Sébastien Lachapelle, Simon Buchholz, Thomas Kipf, Wieland Brendel
- Abstract summary: We show that interaction asymmetry enables both disentanglement and compositional generalization.
We propose an implementation of these criteria using a flexible Transformer-based VAE, with a novel regularizer on the attention weights of the decoder.
- Score: 27.749478197803256
- Abstract: Learning disentangled representations of concepts and re-composing them in unseen ways is crucial for generalizing to out-of-domain situations. However, the underlying properties of concepts that enable such disentanglement and compositional generalization remain poorly understood. In this work, we propose the principle of interaction asymmetry which states: "Parts of the same concept have more complex interactions than parts of different concepts". We formalize this via block diagonality conditions on the $(n+1)$th order derivatives of the generator mapping concepts to observed data, where different orders of "complexity" correspond to different $n$. Using this formalism, we prove that interaction asymmetry enables both disentanglement and compositional generalization. Our results unify recent theoretical results for learning concepts of objects, which we show are recovered as special cases with $n\!=\!0$ or $1$. We provide results for up to $n\!=\!2$, thus extending these prior works to more flexible generator functions, and conjecture that the same proof strategies generalize to larger $n$. Practically, our theory suggests that, to disentangle concepts, an autoencoder should penalize its latent capacity and the interactions between concepts during decoding. We propose an implementation of these criteria using a flexible Transformer-based VAE, with a novel regularizer on the attention weights of the decoder. On synthetic image datasets consisting of objects, we provide evidence that this model can achieve comparable object disentanglement to existing models that use more explicit object-centric priors.
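For intuition, the $n\!=\!1$ case can be read as follows: if all cross-concept second derivatives of the generator vanish on a suitable domain, i.e. $\partial^2 f / \partial z_i \partial z_j = 0$ whenever latents $i$ and $j$ belong to different concept blocks, then $f$ decomposes additively as $f(z) = \sum_k f_k(z_k)$, recovering additive (object-wise) decoders as a special case. The abstract does not specify the exact form of the attention regularizer; the snippet below is a minimal, hypothetical sketch of one way to penalize cross-concept mixing in a slot-conditioned Transformer decoder, via a per-query entropy penalty on the attention over slots.

```python
# Hedged sketch: an attention-concentration penalty for a slot-conditioned
# Transformer decoder. The per-query entropy penalty over slots below is an
# illustrative assumption, not the authors' published loss.
import torch

def attention_entropy_penalty(attn: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """attn: (batch, heads, num_queries, num_slots), rows sum to 1 over slots.

    Low entropy per query means each output location depends on few slots,
    discouraging cross-concept interactions during decoding.
    """
    entropy = -(attn * (attn + eps).log()).sum(dim=-1)  # (batch, heads, queries)
    return entropy.mean()

# Schematic usage inside a VAE training step (decoder assumed to return
# its attention maps alongside the reconstruction):
# recon, attn = decoder(slots)
# loss = recon_loss + beta * kl + lam * attention_entropy_penalty(attn)
```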
Related papers
- Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers [54.20763128054692]
We study how a two-attention-layer transformer is trained to perform ICL on $n$-gram Markov chain data.
We prove that the gradient flow with respect to a cross-entropy ICL loss converges to a limiting model.
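As a toy illustration of the data-generating process named here, where each in-context sequence is drawn from its own random $n$-gram Markov chain, the following sketch samples bigram sequences. The construction is generic; the paper's exact protocol may differ.

```python
# Hedged sketch: sampling bigram Markov chain sequences as ICL data.
import numpy as np

def sample_bigram_sequence(num_states: int, length: int, rng: np.random.Generator):
    # Draw a random row-stochastic transition matrix for this "task".
    P = rng.dirichlet(np.ones(num_states), size=num_states)  # shape (S, S)
    seq = [rng.integers(num_states)]
    for _ in range(length - 1):
        seq.append(rng.choice(num_states, p=P[seq[-1]]))
    return np.array(seq), P

rng = np.random.default_rng(0)
tokens, P = sample_bigram_sequence(num_states=5, length=64, rng=rng)
# A transformer trained on many such sequences (each with its own P) must
# infer P in context; induction heads implement this kind of lookup.
```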
arXiv Detail & Related papers (2024-09-09T18:10:26Z)
- Tempered Calculus for ML: Application to Hyperbolic Model Embedding [70.61101116794549]
Most mathematical distortions used in ML are fundamentally integral in nature.
In this paper, we unveil a grounded theory and tools which can help improve these distortions to better cope with ML requirements.
We show how to apply it to a problem that has recently gained traction in ML: hyperbolic embeddings with a "cheap" and accurate encoding.
arXiv Detail & Related papers (2024-02-06T17:21:06Z)
- Object-centric architectures enable efficient causal representation learning [51.6196391784561]
We show that when the observations are of multiple objects, the generative function is no longer injective and disentanglement fails in practice.
We develop an object-centric architecture that leverages weak supervision from sparse perturbations to disentangle each object's properties.
This approach is more data-efficient in the sense that it requires significantly fewer perturbations than a comparable approach that encodes to a Euclidean space.
arXiv Detail & Related papers (2023-10-29T16:01:03Z)
- A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z)
- Refining and relating fundamentals of functional theory [0.0]
For systems with time-reversal symmetry, we explain why there exist six equivalent universal functionals, prove concise relations among them and conclude that the important notion of $v$-representability is relative to the scope and choice of variable.
arXiv Detail & Related papers (2023-01-24T18:09:47Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
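The following sketch shows one step of instance-level diffusion in this spirit: a batch of instance states is smoothed using pairwise diffusion strengths derived from feature similarity. The similarity kernel and step size are illustrative assumptions, not the paper's closed-form optimal estimates.

```python
# Hedged sketch: one generic diffusion step over a batch of instances.
import torch

def diffusion_step(z: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z: (num_instances, dim) batch of instance states."""
    sim = z @ z.t() / z.shape[-1] ** 0.5  # pairwise similarities
    a = torch.softmax(sim, dim=-1)        # row-normalized diffusion strengths
    return z + tau * (a @ z - z)          # move each state toward its weighted mean
```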
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- On the Complexity of Bayesian Generalization [141.21610899086392]
We consider concept generalization at a large scale in the diverse and natural visual spectrum.
We study two modes when the problem space scales up and the complexity of concepts becomes diverse.
arXiv Detail & Related papers (2022-11-20T17:21:37Z)
- Foundation of one-particle reduced density matrix functional theory for excited states [0.0]
A reduced density matrix functional theory (RDMFT) has been proposed for calculating energies of selected eigenstates of interacting many-fermion systems.
Here, we develop a solid foundation for this so-called $\boldsymbol{w}$-RDMFT and present the details of various derivations.
arXiv Detail & Related papers (2021-06-07T19:03:32Z)
- Making Coherence Out of Nothing At All: Measuring the Evolution of Gradient Alignment [15.2292571922932]
We propose a new metric ($m$-coherence) to experimentally study the alignment of per-example gradients during training.
We show that $m$-coherence is more interpretable, cheaper to compute ($O(m)$ instead of $O(m^2)$), and mathematically cleaner.
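A standard identity explains why an alignment statistic over $m$ per-example gradients can cost $O(m)$ rather than $O(m^2)$: the sum of all pairwise inner products equals the squared norm of the summed gradient, $\sum_{i,j} \langle g_i, g_j \rangle = \lVert \sum_i g_i \rVert^2$, so no double loop over pairs is needed. The sketch below shows only this trick; the paper's exact $m$-coherence definition may differ.

```python
# Hedged sketch: mean pairwise gradient alignment in O(m*d), no m x m loop.
import numpy as np

def mean_pairwise_alignment(grads: np.ndarray) -> float:
    """grads: (m, d) per-example gradients."""
    m = grads.shape[0]
    total = grads.sum(axis=0)                 # sum_i g_i
    sq_norms = (grads ** 2).sum()             # sum_i ||g_i||^2
    cross = float(total @ total) - sq_norms   # sum_{i != j} <g_i, g_j>
    return cross / (m * (m - 1))              # mean over ordered pairs
```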
arXiv Detail & Related papers (2020-08-03T21:51:24Z)
- Beyond $\mathcal{H}$-Divergence: Domain Adaptation Theory With Jensen-Shannon Divergence [21.295136514836788]
We reveal the incoherence between the widely-adopted empirical domain adversarial training and its generally-assumed theoretical counterpart based on $\mathcal{H}$-divergence.
We establish a new theoretical framework by directly proving the upper and lower target risk bounds based on joint distributional Jensen-Shannon divergence.
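For reference, the Jensen-Shannon divergence these bounds are stated in terms of has a standard closed form for discrete distributions, sketched below. This is the textbook definition, not the paper's estimator for joint distributions.

```python
# Hedged sketch: Jensen-Shannon divergence between discrete distributions.
import numpy as np

def js_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """JSD(p, q) = 0.5*KL(p||m) + 0.5*KL(q||m), with m = (p+q)/2. In nats."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Example: divergence between hypothetical source/target label marginals.
print(js_divergence(np.array([0.7, 0.3]), np.array([0.4, 0.6])))
```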
arXiv Detail & Related papers (2020-07-30T16:19:59Z)