Explicitly Disentangled Representations in Object-Centric Learning
- URL: http://arxiv.org/abs/2401.10148v1
- Date: Thu, 18 Jan 2024 17:22:11 GMT
- Title: Explicitly Disentangled Representations in Object-Centric Learning
- Authors: Riccardo Majellaro, Jonathan Collu, Aske Plaat, Thomas M. Moerland
- Abstract summary: We propose a novel architecture that biases object-centric models toward disentangling shape and texture components.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting structured representations from raw visual data is an important
and long-standing challenge in machine learning. Recently, techniques for
unsupervised learning of object-centric representations have raised growing
interest. In this context, enhancing the robustness of the latent features can
improve the efficiency and effectiveness of the training of downstream tasks. A
promising step in this direction is to disentangle the factors that cause
variation in the data. Previously, Invariant Slot Attention disentangled
position, scale, and orientation from the remaining features. Extending this
approach, we focus on separating the shape and texture components. In
particular, we propose a novel architecture that biases object-centric models
toward disentangling shape and texture components into two non-overlapping
subsets of the latent space dimensions. These subsets are fixed a priori, i.e.,
before training begins. Experiments on a range of object-centric
benchmarks reveal that our approach achieves the desired disentanglement while
also numerically improving baseline performance in most cases. In addition, we
show that our method can generate novel textures for a specific object or
transfer textures between objects with distinct shapes.
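The core idea in the abstract can be illustrated with a small sketch. The names, dimensions, and helper functions below are assumptions for illustration only, not the paper's actual architecture or API: each object slot is a latent vector whose dimensions are partitioned a priori into a shape subset and a texture subset, and texture transfer amounts to recombining the two subsets across slots.

```python
import numpy as np

# Hypothetical split, fixed before training: the first SHAPE_DIMS entries
# of a slot latent encode shape, the remaining TEXTURE_DIMS encode texture.
SHAPE_DIMS = 8
TEXTURE_DIMS = 8
SLOT_DIM = SHAPE_DIMS + TEXTURE_DIMS

def split_slot(slot):
    """Split one slot latent into its (shape, texture) components."""
    return slot[:SHAPE_DIMS], slot[SHAPE_DIMS:]

def transfer_texture(target_slot, source_slot):
    """Keep the target's shape dimensions, copy the source's texture
    dimensions -- a sketch of the texture-transfer experiment."""
    shape, _ = split_slot(target_slot)
    _, texture = split_slot(source_slot)
    return np.concatenate([shape, texture])

rng = np.random.default_rng(0)
cube_slot = rng.normal(size=SLOT_DIM)    # e.g. a cube with one texture
sphere_slot = rng.normal(size=SLOT_DIM)  # e.g. a sphere with another

# The hybrid slot keeps the cube's shape but the sphere's texture.
hybrid = transfer_texture(cube_slot, sphere_slot)
assert np.allclose(hybrid[:SHAPE_DIMS], cube_slot[:SHAPE_DIMS])
assert np.allclose(hybrid[SHAPE_DIMS:], sphere_slot[SHAPE_DIMS:])
```

Because the subsets are non-overlapping and known in advance, such recombination requires no post-hoc analysis of the latent space, which is the practical appeal of explicit disentanglement.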
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Learning Disentangled Representation in Object-Centric Models for Visual Dynamics Prediction via Transformers [11.155818952879146]
Recent work has shown that object-centric representations can greatly help improve the accuracy of learning dynamics.
Can learning disentangled representation further improve the accuracy of visual dynamics prediction in object-centric models?
We try to learn such disentangled representations for the case of static images, without making any specific assumptions about the kind of attributes that an object might have.
arXiv Detail & Related papers (2024-07-03T15:43:54Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Disentangling Visual Embeddings for Attributes and Objects [38.27308243429424]
We study the problem of compositional zero-shot learning for object-attribute recognition.
Prior works use visual features extracted with a backbone network, pre-trained for object classification.
We propose a novel architecture that can disentangle attribute and object features in the visual space.
arXiv Detail & Related papers (2022-05-17T17:59:36Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Generalization and Robustness Implications in Object-Centric Learning [23.021791024676986]
In this paper, we train state-of-the-art unsupervised models on five common multi-object datasets.
From our experimental study, we find object-centric representations to be generally useful for downstream tasks.
arXiv Detail & Related papers (2021-07-01T17:51:11Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
- Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation [53.49821324597837]
Weakly supervised semantic segmentation is a challenging problem that has been deeply studied in recent years.
We present a Context Decoupling Augmentation (CDA) method to change the inherent context in which the objects appear.
To validate the effectiveness of the proposed method, extensive experiments on PASCAL VOC 2012 dataset with several alternative network architectures demonstrate that CDA can boost various popular WSSS methods to the new state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-03-02T15:05:09Z)
- Combining Semantic Guidance and Deep Reinforcement Learning For Generating Human Level Paintings [22.889059874754242]
Generation of stroke-based non-photorealistic imagery is an important problem in the computer vision community.
Previous methods have been limited to datasets with little variation in position, scale and saliency of the foreground object.
We propose a Semantic Guidance pipeline with a bi-level painting procedure for learning the distinction between foreground and background brush strokes at training time.
arXiv Detail & Related papers (2020-11-25T09:00:04Z)
- Saliency-driven Class Impressions for Feature Visualization of Deep Neural Networks [55.11806035788036]
It is advantageous to visualize the features considered to be essential for classification.
Existing visualization methods develop high confidence images consisting of both background and foreground features.
In this work, we propose a saliency-driven approach to visualize discriminative features that are considered most important for a given task.
arXiv Detail & Related papers (2020-07-31T06:11:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.