Properties from Mechanisms: An Equivariance Perspective on Identifiable
Representation Learning
- URL: http://arxiv.org/abs/2110.15796v1
- Date: Fri, 29 Oct 2021 14:04:08 GMT
- Title: Properties from Mechanisms: An Equivariance Perspective on Identifiable
Representation Learning
- Authors: Kartik Ahuja, Jason Hartford, Yoshua Bengio
- Abstract summary: A key goal of unsupervised representation learning is "inverting" a data generating process to recover its latent properties.
This paper asks, "Can we instead identify latent properties by leveraging knowledge of the mechanisms that govern their evolution?"
We provide a complete characterization of the sources of non-identifiability as we vary knowledge about a set of possible mechanisms.
- Score: 79.4957965474334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key goal of unsupervised representation learning is "inverting" a data
generating process to recover its latent properties. Existing work that
provably achieves this goal relies on strong assumptions on relationships
between the latent variables (e.g., independence conditional on auxiliary
information). In this paper, we take a very different perspective on the
problem and ask, "Can we instead identify latent properties by leveraging
knowledge of the mechanisms that govern their evolution?" We provide a complete
characterization of the sources of non-identifiability as we vary knowledge
about a set of possible mechanisms. In particular, we prove that if we know the
exact mechanisms under which the latent properties evolve, then identification
can be achieved up to any equivariances that are shared by the underlying
mechanisms. We generalize this characterization to settings where we only know
some hypothesis class over possible mechanisms, as well as settings where the
mechanisms are stochastic. We demonstrate the power of this mechanism-based
perspective by showing that we can leverage our results to generalize existing
identifiable representation learning results. These results suggest that by
exploiting inductive biases on mechanisms, it is possible to design a range of
new identifiable representation learning approaches.
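The paper's core claim, identification up to the equivariances shared by the mechanisms, can be illustrated numerically. The sketch below is not taken from the paper; the linear rotation mechanism `M`, the commuting transformation `A`, and all variable names are illustrative assumptions. It shows that any invertible `A` that commutes with a known mechanism `M` maps valid latent trajectories to other valid latent trajectories, so knowledge of `M` alone cannot distinguish `z_t` from `A z_t`.
```python
import numpy as np

# Illustrative setup (not from the paper): latents evolve by a known
# linear mechanism z_{t+1} = M @ z_t, observed through some unknown mixing.
rng = np.random.default_rng(0)

M = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # 90-degree rotation as the mechanism

# A is a scaled rotation, so it commutes with M: A is an equivariance of M.
theta = 0.7
A = 1.5 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
assert np.allclose(M @ A, A @ M)

# Two candidate "inversions" of the data generating process:
# the true latents z_t, and the transformed latents A @ z_t.
z = rng.normal(size=2)
for _ in range(5):
    z_next = M @ z
    # Both candidates satisfy the *same* known mechanism, so knowing M
    # identifies the latents only up to the equivariances of M.
    assert np.allclose(M @ (A @ z), A @ z_next)
    z = z_next
print("A z_t evolves under the same mechanism as z_t (non-identifiable up to A).")
```
In this reading, the obstruction to identification is exactly the symmetry that the mechanisms leave intact, which is the paper's characterization in the simplest case of a deterministic, exactly known mechanism.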
Related papers
- Identifiable Representation and Model Learning for Latent Dynamic Systems [0.0]
We study the problem of identifiable representation and model learning for latent dynamic systems.
We prove that, for linear or affine nonlinear latent dynamic systems, it is possible to identify the representations up to scaling and determine the models up to some simple transformations.
arXiv Detail & Related papers (2024-10-23T13:55:42Z)
- Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning [54.69189620971405]
We provide a unified framework, termed Identifiable Exchangeable Mechanisms (IEM), for representation and structure learning.
IEM provides new insights that let us relax the necessary conditions for causal structure identification in exchangeable non-i.i.d. data.
We also demonstrate the existence of a duality condition in identifiable representation learning, leading to new identifiability results.
arXiv Detail & Related papers (2024-06-20T13:30:25Z)
- Binding Dynamics in Rotating Features [72.80071820194273]
We propose an alternative "cosine binding" mechanism, which explicitly computes the alignment between features and adjusts weights accordingly.
This allows us to draw direct connections to self-attention and biological neural processes, and to shed light on the fundamental dynamics by which object-centric representations emerge in Rotating Features.
arXiv Detail & Related papers (2024-02-08T12:31:08Z)
- Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies [58.179981892921056]
This work introduces a novel principle for disentanglement that we call mechanism sparsity regularization.
We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and a causal graph over them.
We show that the latent factors can be recovered by regularizing the learned causal graph to be sparse (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-01-10T02:38:21Z)
- Attention: Marginal Probability is All You Need? [0.0]
We propose an alternative Bayesian foundation for attentional mechanisms.
We show how this unifies different attentional architectures in machine learning.
We hope this work will guide the development of more sophisticated intuitions about the key properties of attention architectures.
arXiv Detail & Related papers (2023-04-07T14:38:39Z)
- Counterfactual Inference of Second Opinions [13.93477033094828]
Automated decision support systems that are able to infer second opinions from experts can potentially facilitate a more efficient allocation of resources.
This paper looks at the design of this type of support system from the perspective of counterfactual inference.
Experiments on both synthetic and real data show that our model can be used to infer second opinions more accurately than its non-causal counterpart.
arXiv Detail & Related papers (2022-03-16T14:40:41Z)
- Discovering Latent Causal Variables via Mechanism Sparsity: A New Principle for Nonlinear ICA [81.4991350761909]
Independent component analysis (ICA) refers to an ensemble of methods that formalize the goal of recovering latent variables and provide estimation procedures for practical application.
We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse.
arXiv Detail & Related papers (2021-07-21T14:22:14Z)
- Transformers with Competitive Ensembles of Independent Mechanisms [97.93090139318294]
We propose a new Transformer layer which divides the hidden representation and parameters into multiple mechanisms, which only exchange information through attention.
We study this architecture, TIM, on a large-scale BERT model, on the Image Transformer, and on speech enhancement, and find evidence for semantically meaningful specialization as well as improved performance.
arXiv Detail & Related papers (2021-02-27T21:48:46Z)
- A theory of independent mechanisms for extrapolation in generative models [20.794692397859755]
Generative models can be trained to emulate complex empirical data, but are they useful to make predictions in the context of previously unobserved environments?
We develop a theoretical framework to address this challenging situation by defining a weaker form of identifiability, based on the principle of independence of mechanisms.
We demonstrate on toy examples that classical gradient descent can hinder the model's extrapolation capabilities, suggesting independence of mechanisms should be enforced explicitly during training.
arXiv Detail & Related papers (2020-04-01T01:01:43Z)
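Two entries above, "Nonparametric Partial Disentanglement via Mechanism Sparsity" and "Discovering Latent Causal Variables via Mechanism Sparsity", rest on the same principle: learn the latent factors jointly with a latent transition model and penalize the number of edges in the learned dependency graph between factors. The following is a minimal, hypothetical sketch of that idea using a soft adjacency matrix and a smooth relaxation of the edge count; the module name, masking scheme, and penalty weight are assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn

class SparseLatentTransition(nn.Module):
    """Hypothetical latent transition model with a learnable dependency graph.

    Each latent z_{t+1, i} is predicted from the latents selected by row i of
    a soft adjacency matrix; penalizing that matrix encourages each latent
    mechanism to depend on few parents (mechanism sparsity).
    """

    def __init__(self, dim: int):
        super().__init__()
        self.adj_logits = nn.Parameter(torch.zeros(dim, dim))  # soft causal graph
        self.dynamics = nn.Linear(dim, dim)                    # per-factor linear mechanisms

    def forward(self, z_t: torch.Tensor) -> torch.Tensor:
        adj = torch.sigmoid(self.adj_logits)           # entries in (0, 1)
        # Mask the inputs of each mechanism by its learned parents:
        # masked[b, i, j] = adj[i, j] * z_t[b, j]
        masked = torch.einsum("ij,bj->bij", adj, z_t)
        # Apply the i-th row of the dynamics to the i-th masked input vector.
        return torch.einsum("ij,bij->bi", self.dynamics.weight, masked) + self.dynamics.bias

    def sparsity_penalty(self) -> torch.Tensor:
        # Smooth relaxation of "number of edges in the learned causal graph".
        return torch.sigmoid(self.adj_logits).sum()

# Sketch of the training objective: next-step prediction error + sparsity regularizer.
model = SparseLatentTransition(dim=4)
z_t, z_next = torch.randn(8, 4), torch.randn(8, 4)
loss = ((model(z_t) - z_next) ** 2).mean() + 0.1 * model.sparsity_penalty()
loss.backward()
```
The papers themselves state conditions (e.g., sparse actions or interventions and sufficient variability) under which penalizing this graph recovers the latent factors up to permutation; the sketch only illustrates where such a regularizer enters the training objective.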
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.