Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning
- URL: http://arxiv.org/abs/2403.00352v1
- Date: Fri, 1 Mar 2024 08:31:58 GMT
- Title: Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning
- Authors: Ruiqian Nai, Zixin Wen, Ji Li, Yuanzhi Li, Yang Gao
- Score: 43.29587373211267
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In representation learning, a disentangled representation is highly desirable
as it encodes generative factors of data in a separable and compact pattern.
Researchers have advocated leveraging disentangled representations to complete
downstream tasks with encouraging empirical evidence. This paper further
investigates the necessity of disentangled representation in downstream
applications. Specifically, we show that dimension-wise disentangled
representations are unnecessary on a fundamental downstream task, abstract
visual reasoning. We provide extensive empirical evidence against the necessity
of disentanglement, covering multiple datasets, representation learning
methods, and downstream network architectures. Furthermore, our findings
suggest that the informativeness of representations is a better indicator of
downstream performance than disentanglement. Finally, the positive correlation
between informativeness and disentanglement explains the claimed usefulness of
disentangled representations in previous works. The source code is available at
https://github.com/Richard-coder-Nai/disentanglement-lib-necessity.git.
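Calling one property a "better indicator of downstream performance" than another is a claim about how well each property ranks many trained models by their downstream accuracy. A minimal sketch of one way to quantify that, assuming each model has a scalar informativeness score, a scalar disentanglement score, and a downstream accuracy, is Spearman's rank correlation. The function names and the toy numbers below are hypothetical illustrations, not the authors' code or results. The released repository may use a different statistic or different metrics.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based rank
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-model numbers (NOT results from the paper): one entry per
# trained representation model, all evaluated on the same downstream task.
informativeness = [0.62, 0.71, 0.55, 0.80]
disentanglement = [0.20, 0.15, 0.10, 0.40]
downstream_acc = [0.58, 0.66, 0.51, 0.74]

for name, scores in [("informativeness", informativeness),
                     ("disentanglement", disentanglement)]:
    print(name, round(spearman(scores, downstream_acc), 3))
```

Ranks are used instead of raw values because only the ordering of models matters for "which property predicts performance better", and ranks are insensitive to the differing scales of the metrics. On Python 3.12 or later, `statistics.correlation(x, y, method="ranked")` computes the same statistic from the standard library.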
Related papers
- Disentangled Representation Learning via Flow Matching [48.12507436294143] (2026-02-05T02:14:36Z)
  Disentangled representation learning aims to capture the underlying explanatory factors of observed data. Existing diffusion-based methods encourage factor independence via inductive biases, yet frequently lack strong semantic alignment. We propose a flow matching-based framework for disentangled representation learning, which casts disentanglement as learning factor-conditioned flows in a compact latent space.
- Latent Diffusion U-Net Representations Contain Positional Embeddings and Anomalies [2.1261727383260043] (2025-04-09T16:26:26Z)
  We analyze popular Stable Diffusion models using representational similarity and norms.
  Our findings reveal three phenomena: (1) the presence of a learned positional embedding in intermediate representations, (2) high-similarity corner artifacts, and (3) anomalous high-norm artifacts.
- MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion [14.907473847787541] (2024-09-16T17:06:10Z)
  We propose Masked Conditional Diffusion (MacDiff) as a unified framework for human skeleton modeling.
  For the first time, we leverage diffusion models as effective skeleton representation learners.
  MacDiff achieves state-of-the-art performance on representation learning benchmarks while maintaining the competence for generative tasks.
- Abstraction requires breadth: a renormalisation group approach [0.0] (2024-07-01T14:13:11Z)
  We argue that the level of abstraction depends crucially on how broad the training set is.
  We take the unique fixed point of this transformation, the Hierarchical Feature Model, as a candidate for an abstract representation.
- Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement [58.9768112704998] (2024-02-15T05:07:54Z)
  Disentangled representation learning strives to extract the intrinsic factors within observed data.
  We introduce a new perspective and framework, demonstrating that diffusion models with cross-attention can serve as a powerful inductive bias.
  This is the first work to reveal the potent disentanglement capability of diffusion models with cross-attention, requiring no complex designs.
- Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418] (2023-11-03T03:18:40Z)
  We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
- What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary [68.77983831618685] (2022-12-20T16:03:25Z)
  We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space.
  We show that the resulting projections contain rich semantic information, and draw a connection between them and sparse retrieval.
- StreamHover: Livestream Transcript Summarization and Annotation [54.41877742041611] (2021-09-11T02:19:37Z)
  We present StreamHover, a framework for annotating and summarizing livestream transcripts.
  With a total of over 500 hours of videos annotated with both extractive and abstractive summaries, our benchmark dataset is significantly larger than currently existing annotated corpora.
  We show that our model generalizes better and improves performance over strong baselines.
- Disentangled Recurrent Wasserstein Autoencoder [17.769077848342334] (2021-01-19T07:43:25Z)
  The recurrent Wasserstein Autoencoder (R-WAE) is a new framework for generative modeling of sequential data.
  R-WAE disentangles the representation of an input sequence into static and dynamic factors.
  Our models outperform other baselines with the same settings in terms of disentanglement and unconditional video generation.
- Relation-Guided Representation Learning [53.60351496449232] (2020-07-11T10:57:45Z)
  We propose a new representation learning method that explicitly models and leverages sample relations.
  Our framework preserves the relations between samples well.
  By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problem.
This list is automatically generated from the titles and abstracts of the papers on this site.