Related papers: Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning

URL: http://arxiv.org/abs/2403.00352v1
Date: Fri, 1 Mar 2024 08:31:58 GMT
Title: Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning
Authors: Ruiqian Nai, Zixin Wen, Ji Li, Yuanzhi Li, Yang Gao
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In representation learning, a disentangled representation is highly desirable as it encodes generative factors of data in a separable and compact pattern. Researchers have advocated leveraging disentangled representations to complete downstream tasks with encouraging empirical evidence. This paper further investigates the necessity of disentangled representation in downstream applications. Specifically, we show that dimension-wise disentangled representations are unnecessary on a fundamental downstream task, abstract visual reasoning. We provide extensive empirical evidence against the necessity of disentanglement, covering multiple datasets, representation learning methods, and downstream network architectures. Furthermore, our findings suggest that the informativeness of representations is a better indicator of downstream performance than disentanglement. Finally, the positive correlation between informativeness and disentanglement explains the claimed usefulness of disentangled representations in previous works. The source code is available at https://github.com/Richard-coder-Nai/disentanglement-lib-necessity.git.
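
Saying that informativeness predicts downstream performance better than disentanglement means that one metric orders trained models by downstream accuracy more faithfully than the other. A common way to quantify this is Spearman's rank correlation. The following is a minimal, stdlib-only Python sketch that assumes one score per trained model for each metric. It is not taken from the linked repository and is not necessarily the exact protocol used in the paper. The score lists in the usage example are made-up placeholders, not results from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical placeholder scores, one entry per trained model (not from the paper).
    informativeness = [0.62, 0.71, 0.55, 0.80]
    disentanglement = [0.30, 0.45, 0.35, 0.40]
    downstream_accuracy = [0.58, 0.66, 0.50, 0.74]
    print(spearman(informativeness, downstream_accuracy))  # → 1.0
    print(spearman(disentanglement, downstream_accuracy))  # → 0.6
```

With these placeholder numbers, the informativeness ranking matches the accuracy ranking exactly (rho = 1.0), while the disentanglement ranking agrees only partly (rho = 0.6). Rank correlation depends only on orderings, so it is insensitive to the very different scales on which such metrics are reported.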

Related papers

Disentangled Representation Learning via Flow Matching
Disentangled representation learning aims to capture the underlying explanatory factors of observed data. Existing diffusion-based methods encourage factor independence via inductive biases, yet frequently lack strong semantic alignment. We propose a flow matching-based framework for disentangled representation learning, which casts disentanglement as learning factor-conditioned flows in a compact latent space.
Date: 2026-02-05T02:14:36Z
Latent Diffusion U-Net Representations Contain Positional Embeddings and Anomalies
We analyze popular Stable Diffusion models using representational similarity and norms. Our findings reveal three phenomena: (1) the presence of a learned positional embedding in intermediate representations, (2) high-similarity corner artifacts, and (3) anomalous high-norm artifacts.
Date: 2025-04-09T16:26:26Z
MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
We propose Masked Conditional Diffusion (MacDiff) as a unified framework for human skeleton modeling. For the first time, we leverage diffusion models as effective skeleton representation learners. MacDiff achieves state-of-the-art performance on representation learning benchmarks while maintaining its competence for generative tasks.
Date: 2024-09-16T17:06:10Z
Abstraction requires breadth: a renormalisation group approach
We argue that the level of abstraction depends crucially on how broad the training set is. We take the unique fixed point of the renormalisation group transformation, the Hierarchical Feature Model, as a candidate for an abstract representation.
Date: 2024-07-01T14:13:11Z
Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement
Disentangled representation learning strives to extract the intrinsic factors within observed data. We introduce a new perspective and framework, demonstrating that diffusion models with cross-attention can serve as a powerful inductive bias. This is the first work to reveal the potent disentanglement capability of diffusion models with cross-attention, requiring no complex designs.
Date: 2024-02-15T05:07:54Z
Disentangled Representation Learning with Transmitted Information Bottleneck
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
Date: 2023-11-03T03:18:40Z
What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space. We show that the resulting projections contain rich semantic information, and draw a connection between them and sparse retrieval.
Date: 2022-12-20T16:03:25Z
StreamHover: Livestream Transcript Summarization and Annotation
We present StreamHover, a framework for annotating and summarizing livestream transcripts. With a total of over 500 hours of videos annotated with both extractive and abstractive summaries, our benchmark dataset is significantly larger than currently existing annotated corpora. We show that our model generalizes better and improves performance over strong baselines.
Date: 2021-09-11T02:19:37Z
Disentangled Recurrent Wasserstein Autoencoder
Recurrent Wasserstein Autoencoder (R-WAE) is a new framework for generative modeling of sequential data. R-WAE disentangles the representation of an input sequence into static and dynamic factors. Our models outperform other baselines with the same settings in terms of disentanglement and unconditional video generation.
Date: 2021-01-19T07:43:25Z
Relation-Guided Representation Learning
We propose a new representation learning method that explicitly models and leverages sample relations. Our framework preserves the relations between samples well. By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
Date: 2020-07-11T10:57:45Z

This list is automatically generated from the titles and abstracts of the papers on this site.