Leveraging Task Structures for Improved Identifiability in Neural Network Representations
- URL: http://arxiv.org/abs/2306.14861v3
- Date: Fri, 23 Aug 2024 14:26:46 GMT
- Title: Leveraging Task Structures for Improved Identifiability in Neural Network Representations
- Authors: Wenlin Chen, Julien Horwood, Juyeon Heo, José Miguel Hernández-Lobato
- Abstract summary: We extend the theory of identifiability in supervised learning by considering the consequences of having access to a distribution of tasks.
We show that linear identifiability is achievable in the general multi-task regression setting.
- Score: 31.863998589693065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work extends the theory of identifiability in supervised learning by considering the consequences of having access to a distribution of tasks. In such cases, we show that linear identifiability is achievable in the general multi-task regression setting. Furthermore, we show that the existence of a task distribution which defines a conditional prior over latent factors reduces the equivalence class for identifiability to permutations and scaling of the true latent factors, a stronger and more useful result than linear identifiability. Crucially, when we further assume a causal structure over these tasks, our approach enables simple maximum marginal likelihood optimization, and suggests potential downstream applications to causal representation learning. Empirically, we find that this straightforward optimization procedure enables our model to outperform more general unsupervised models in recovering canonical representations for both synthetic data and real-world molecular data.
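As a rough sketch of the setting described above (not the authors' implementation), the snippet below trains a shared encoder with one linear head per sampled task; the theory predicts that the learned representation then matches the true latent factors up to a linear transformation (and up to permutation and scaling under a conditional task prior). All dimensions, the architecture, and the plain multi-task MSE objective standing in for maximum marginal likelihood are assumptions.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_tasks, d_x, d_z = 20, 10, 3

# Ground truth: shared nonlinear latents z = g(x), task-specific linear labels.
g_true = nn.Sequential(nn.Linear(d_x, 64), nn.Tanh(), nn.Linear(64, d_z))
true_heads = [torch.randn(d_z, 1) for _ in range(n_tasks)]

# Model: shared encoder plus one linear regression head per task.
encoder = nn.Sequential(nn.Linear(d_x, 64), nn.ReLU(), nn.Linear(64, d_z))
heads = nn.ModuleList([nn.Linear(d_z, 1) for _ in range(n_tasks)])
opt = torch.optim.Adam(list(encoder.parameters()) + list(heads.parameters()), lr=1e-3)

for step in range(2000):
    x = torch.randn(256, d_x)
    with torch.no_grad():
        z = g_true(x)                       # true latent factors
    z_hat = encoder(x)
    loss = sum(((heads[t](z_hat) - z @ true_heads[t]) ** 2).mean()
               for t in range(n_tasks))     # MSE across the task distribution
    opt.zero_grad(); loss.backward(); opt.step()
# After training, z_hat should relate to z by an (approximately) linear map.
```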
Related papers
- Features are fate: a theory of transfer learning in high-dimensional regression [23.840251319669907]
We show that when the target task is well represented by the feature space of the pre-trained model, transfer learning outperforms training from scratch.
For this model, we establish rigorously that when the feature space overlap between the source and target tasks is sufficiently strong, both linear transfer and fine-tuning improve performance.
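A toy numerical illustration of this claim, with made-up dimensions and a plain linear fit on raw inputs standing in for training from scratch:
```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 50, 20, 40                          # input dim, #features, train size
W = rng.normal(size=(k, d))                   # frozen "pretrained" feature map
feats = lambda X: np.tanh(X @ W.T)
beta = rng.normal(size=k)                     # target lies in the feature span
X, Xt = rng.normal(size=(n, d)), rng.normal(size=(1000, d))
y, yt = feats(X) @ beta, feats(Xt) @ beta

# Linear transfer: fit only a linear head on the frozen pretrained features.
head, *_ = np.linalg.lstsq(feats(X), y, rcond=None)
print("transfer MSE:", np.mean((feats(Xt) @ head - yt) ** 2))

# "Scratch" baseline without the feature space: a linear fit on raw inputs.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("scratch MSE:", np.mean((Xt @ w - yt) ** 2))
```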
arXiv Detail & Related papers (2024-10-10T17:58:26Z)
- Likelihood-based Differentiable Structure Learning [38.25218464260965]
Existing approaches to differentiable structure learning of directed acyclic graphs (DAGs) rely on strong identifiability assumptions.
We explain and remedy these issues by studying the behavior of differentiable acyclicity-constrained programs under general likelihoods.
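For context, a common differentiable acyclicity constraint in this line of work is the NOTEARS-style penalty h(W) = tr(e^{W∘W}) − d, which is zero exactly when the weighted adjacency matrix W encodes a DAG; a minimal sketch (the example matrices are hypothetical):
```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W: np.ndarray) -> float:
    # h(W) = tr(exp(W ∘ W)) - d: zero iff W encodes a DAG.
    d = W.shape[0]
    return np.trace(expm(W * W)) - d  # elementwise square keeps entries >= 0

W_dag = np.array([[0.0, 1.0], [0.0, 0.0]])   # edge 0 -> 1, acyclic
W_cyc = np.array([[0.0, 1.0], [1.0, 0.0]])   # 2-cycle
print(acyclicity(W_dag))  # ~0.0
print(acyclicity(W_cyc))  # > 0, penalized during optimization
```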
arXiv Detail & Related papers (2024-10-08T16:08:24Z)
- Identifiable Latent Neural Causal Models [82.14087963690561]
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data.
We determine the types of distribution shifts that do contribute to the identifiability of causal representations.
We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations.
arXiv Detail & Related papers (2024-03-23T04:13:55Z)
- Matrix Completion-Informed Deep Unfolded Equilibrium Models for Self-Supervised k-Space Interpolation in MRI [8.33626757808923]
Regularization model-driven deep learning (DL) has gained significant attention for its ability to leverage the powerful representational capacity of deep networks.
We propose a self-supervised DL approach for accelerated MRI that is theoretically guaranteed and does not rely on fully sampled labels.
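A hedged sketch of the generic shape of such a reconstruction, alternating a learned regularizer with data consistency on the sampled k-space locations; the placeholder denoiser, mask, and sizes below are assumptions, and the paper's equilibrium solver and self-supervised loss are not reproduced:
```python
import numpy as np

def data_consistency(k_est, k_meas, mask):
    # Keep the measured k-space samples; interpolate only the missing ones.
    return np.where(mask, k_meas, k_est)

def reconstruct(k_meas, mask, denoise, iters=10):
    k = k_meas.copy()
    for _ in range(iters):
        img = np.fft.ifft2(k)               # back to image domain
        img = denoise(img)                  # learned prior (placeholder here)
        k = data_consistency(np.fft.fft2(img), k_meas, mask)
    return np.fft.ifft2(k)

mask = np.random.default_rng(0).random((64, 64)) < 0.3    # ~30% sampled
k_meas = np.fft.fft2(np.random.default_rng(1).normal(size=(64, 64))) * mask
img = reconstruct(k_meas, mask, denoise=lambda x: x)      # identity stand-in
```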
arXiv Detail & Related papers (2023-09-24T07:25:06Z)
- Predictive Coding beyond Correlations [59.47245250412873]
We show how one such algorithm, predictive coding, can perform causal inference tasks.
First, we show that a simple change to the inference process of predictive coding makes it possible to compute interventions without mutilating or redefining the causal graph.
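One hedged toy reading of that change, on a linear Gaussian chain A → B → C with made-up weights: conditioning clamps a node's value but keeps its prediction error, whereas an intervention do(B = b) additionally zeroes B's own error term so that no evidence propagates back to its cause:
```python
w_ab, w_bc = 2.0, -1.0        # hypothetical edge weights for A -> B -> C

def infer(b=1.0, intervene=False, steps=20000, lr=0.01):
    a, c = 0.0, 0.0           # free nodes; A has a standard normal prior
    for _ in range(steps):    # gradient descent on E = a²/2 + e_b²/2 + e_c²/2
        e_b = 0.0 if intervene else b - w_ab * a   # B's prediction error
        e_c = c - w_bc * b                         # C's prediction error
        a -= lr * (a - w_ab * e_b)
        c -= lr * e_c
    return round(a, 3), round(c, 3)

print(infer(intervene=False))  # a ≈ 0.4: observing B = 1 revises belief in A
print(infer(intervene=True))   # a ≈ 0.0: do(B = 1) leaves cause A at its prior
```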
arXiv Detail & Related papers (2023-06-27T13:57:16Z)
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
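The workhorse of such quantile networks is the pinball (check) loss; a minimal sketch with a hypothetical network and quantile level (the paper's architecture and training scheme are not reproduced):
```python
import torch

def pinball_loss(y, q, tau):
    # L_tau(y, q) = max(tau * (y - q), (tau - 1) * (y - q));
    # minimizing it makes q estimate the tau-th conditional quantile of y.
    diff = y - q
    return torch.maximum(tau * diff, (tau - 1.0) * diff).mean()

net = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 1))
x, y = torch.randn(128, 4), torch.randn(128, 1)
loss = pinball_loss(y, net(x), tau=0.9)   # head for the 0.9 quantile
loss.backward()
```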
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Self-Supervised Learning via Maximum Entropy Coding [57.56570417545023]
We propose Maximum Entropy Coding (MEC) as a principled objective that explicitly optimizes the structure of the representation.
MEC learns a more generalizable representation than previous methods based on specific pretext tasks.
It achieves state-of-the-art performance consistently on various downstream tasks, including not only ImageNet linear probe, but also semi-supervised classification, object detection, instance segmentation, and object tracking.
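A rough sketch of a log-determinant coding-length objective of the kind MEC maximizes; the scaling constants lam and mu below are placeholders, not the paper's exact values:
```python
import torch

def mec_objective(z, lam=1.0, mu=1.0):
    # Score a batch of representations by a log-det coding-length surrogate;
    # maximizing it spreads the spectrum of the feature Gram matrix.
    z = torch.nn.functional.normalize(z, dim=1)   # unit-norm features
    d = z.shape[1]
    gram = z.T @ z                                # d x d Gram of the batch
    return mu * torch.logdet(torch.eye(d) + lam * gram)

z = torch.randn(256, 64)                          # hypothetical batch
print(mec_objective(z))
```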
arXiv Detail & Related papers (2022-10-20T17:58:30Z)
- Generalizable Information Theoretic Causal Representation [37.54158138447033]
We propose to learn causal representation from observational data by regularizing the learning procedure with mutual information measures according to our hypothetical causal graph.
The optimization involves a counterfactual loss, from which we derive a theoretical guarantee that causality-inspired learning achieves reduced sample complexity and better generalization.
arXiv Detail & Related papers (2022-02-17T00:38:35Z)
- Structural Causal Models Are (Solvable by) Credal Networks [70.45873402967297]
Causal inferences can be obtained by standard algorithms for updating credal nets.
This contribution should be regarded as a systematic approach to representing structural causal models by credal networks.
Experiments show that approximate algorithms for credal networks can immediately be used to do causal inference in real-size problems.
arXiv Detail & Related papers (2020-08-02T11:19:36Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
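A minimal sketch of the mechanism: compute an entropic optimal transport plan between the input set and a trainable reference via Sinkhorn iterations, then aggregate through the plan; eps and the iteration count are assumed hyperparameters, not the authors' settings:
```python
import torch

def sinkhorn_plan(X, R, eps=0.1, n_iter=50):
    # Entropic OT between set X (n x d) and reference R (p x d),
    # uniform marginals on both sides.
    C = torch.cdist(X, R) ** 2            # squared-Euclidean cost, n x p
    C = C / C.max()                       # rescale cost for numerical stability
    K = torch.exp(-C / eps)               # Gibbs kernel
    u = torch.full((X.shape[0],), 1.0 / X.shape[0])
    v = torch.full((R.shape[0],), 1.0 / R.shape[0])
    b = torch.ones_like(v)
    for _ in range(n_iter):               # Sinkhorn fixed-point iterations
        a = u / (K @ b)
        b = v / (K.T @ a)
    return a[:, None] * K * b[None, :]    # transport plan, n x p

X = torch.randn(7, 16)                      # variable-size input set
R = torch.nn.Parameter(torch.randn(4, 16))  # trainable reference (end-to-end)
P = sinkhorn_plan(X, R)
embedding = (P.T @ X).reshape(-1)           # fixed-size (4 * 16) embedding
```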
arXiv Detail & Related papers (2020-06-22T08:35:58Z)