Related papers: Disentangling Representations through Multi-task Learning

Disentangling Representations through Multi-task Learning

URL: http://arxiv.org/abs/2407.11249v2
Date: Tue, 15 Oct 2024 07:03:07 GMT
Title: Disentangling Representations through Multi-task Learning
Authors: Pantelis Vafidis, Aman Bhargava, Antonio Rangel,
Abstract summary: We provide experimental and theoretical results guaranteeing the emergence of disentangled representations in agents that optimally solve classification tasks. We experimentally validate these predictions in RNNs trained on multi-task classification. We find that transformers are particularly suited for disentangling representations, which might explain their unique world understanding abilities.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Intelligent perception and interaction with the world hinges on internal representations that capture its underlying structure ("disentangled" or "abstract" representations). Disentangled representations serve as world models, isolating latent factors of variation in the world along orthogonal directions, thus facilitating feature-based generalization. We provide experimental and theoretical results guaranteeing the emergence of disentangled representations in agents that optimally solve multi-task evidence aggregation classification tasks, canonical in the cognitive neuroscience literature. The key conceptual finding is that, by producing accurate multi-task classification estimates, a system implicitly represents a set of coordinates specifying a disentangled representation of the underlying latent state of the data it receives. The theory provides conditions for the emergence of these representations in terms of noise, number of tasks, and evidence aggregation time. We experimentally validate these predictions in RNNs trained on multi-task classification, which learn disentangled representations in the form of continuous attractors, leading to zero-shot out-of-distribution (OOD) generalization in predicting latent factors. We demonstrate the robustness of our framework across autoregressive architectures, decision boundary geometries and in tasks requiring classification confidence estimation. We find that transformers are particularly suited for disentangling representations, which might explain their unique world understanding abilities. Overall, our framework puts forth parallel processing as a general principle for the formation of cognitive maps that capture the structure of the world in both biological and artificial systems, and helps explain why ANNs often arrive at human-interpretable concepts, and how they both may acquire exceptional zero-shot generalization capabilities.

Related papers

Convergent World Representations and Divergent Tasks [7.378937711027778]
We develop a framework clearly separating the underlying world, the data generation process and the resulting model representations.<n>We find that different tasks give rise to qualitatively and quantitatively distinct world representation geometries.<n>To study adaptation, we pretrain models on all tasks, then test whether new entities (cities) can be consistently integrated into the representation space.
arXiv Detail & Related papers (2026-01-31T05:59:15Z)
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark [69.8473923357969]
Unified multimodal models aim to jointly enable visual understanding and generation, yet current benchmarks rarely examine their true integration.<n>We present Uni-MMMU, a comprehensive benchmark that unfolds the bidirectional synergy between generation and understanding across eight reasoning-centric domains.
arXiv Detail & Related papers (2025-10-15T17:10:35Z)
Task-Driven Discrete Representation Learning [1.604511025616605]
We propose a unified framework that explores the usefulness of discrete features in relation to downstream tasks.<n>We provide an additional theoretical analysis of the trade-off between representational capacity and sample complexity.
arXiv Detail & Related papers (2025-06-13T07:12:49Z)
Continuous Representation Methods, Theories, and Applications: An Overview and Perspectives [55.22101595974193]
Recently, continuous representation methods emerge as novel paradigms that characterize the intrinsic structures of real-world data.<n>This review focuses on three aspects: (i) Continuous representation method designs such as basis function representation, statistical modeling, tensor function decomposition, and implicit neural representation; (ii) Theoretical foundations of continuous representations such as approximation error analysis, convergence property, and implicit regularization; and (iii) Real-world applications of continuous representations derived from computer vision, graphics, bioinformatics, and remote sensing.
arXiv Detail & Related papers (2025-05-21T07:50:19Z)
Understanding Task Representations in Neural Networks via Bayesian Ablation [1.3980986259786223]
We introduce a novel probabilistic framework for interpreting latent task representations in neural networks.<n>Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance.
arXiv Detail & Related papers (2025-05-19T21:36:09Z)
Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance [41.6755826072905]
In zero-shot image recognition tasks, humans demonstrate remarkable flexibility in classifying unseen categories. Existing vision-language models often underperform in real-world applications because of sub-optimal prompt engineering. We propose a Concept-guided Human-like Bayesian Reasoning framework to address these issues.
arXiv Detail & Related papers (2025-03-20T06:20:13Z)
Uniting contrastive and generative learning for event sequences models [51.547576949425604]
This study investigates the integration of two self-supervised learning techniques - instance-wise contrastive learning and a generative approach based on restoring masked events in latent space. Experiments conducted on several public datasets, focusing on sequence classification and next-event type prediction, show that the integrated method achieves superior performance compared to individual approaches.
arXiv Detail & Related papers (2024-08-19T13:47:17Z)
Latent Communication in Artificial Neural Networks [2.5947832846531886]
This dissertation focuses on the universality and reusability of neural representations. A salient observation from our research is the emergence of similarities in latent representations.
arXiv Detail & Related papers (2024-06-16T17:13:58Z)
Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales [54.78115855552886]
We show how to construct over-complete invariants with a Convolutional Neural Networks (CNN)-like hierarchical architecture. With the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner. For robust and interpretable vision tasks at larger scales, hierarchical invariant representation can be considered as an effective alternative to traditional CNN and invariants.
arXiv Detail & Related papers (2024-02-23T16:50:07Z)
Understanding Distributed Representations of Concepts in Deep Neural Networks without Supervision [25.449397570387802]
We propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons. Our empirical findings demonstrate that instances with similar neuron activation states tend to share coherent concepts. It can be utilized to identify unlabeled subclasses within data and to detect the causes of misclassifications.
arXiv Detail & Related papers (2023-12-28T07:33:51Z)
Leveraging sparse and shared feature activations for disentangled representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation. We validate our approach on six real world distribution shift benchmarks, and different data modalities.
arXiv Detail & Related papers (2023-04-17T01:33:24Z)
Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations. Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
Generalized Representations Learning for Time Series Classification [28.230863650758447]
We argue that the temporal complexity attributes to the unknown latent distributions within time series classification. We present experiments on gesture recognition, speech commands recognition, wearable stress and affect detection, and sensor-based human activity recognition.
arXiv Detail & Related papers (2022-09-15T03:36:31Z)
On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores which we name Compositional Network generalization (CoRelNet) We find that simple architectural choices can outperform existing models in out-of-distribution generalizations.
arXiv Detail & Related papers (2022-06-09T16:24:01Z)
Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization [73.89239820192894]
We argue that automated counterfactual generation should regard several aspects of the produced adversarial instances. We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z)
Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks [4.153804257347222]
We present Agglomerator, a framework capable of providing a representation of part-whole hierarchies from visual cues. We evaluate our method on common datasets, such as SmallNORB, MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-03-07T10:56:13Z)
Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules. inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [131.93113552146195]
We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts. In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images. We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
arXiv Detail & Related papers (2021-03-02T01:32:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.