ORGAN: Object-Centric Representation Learning using Cycle Consistent Generative Adversarial Networks
- URL: http://arxiv.org/abs/2603.02063v1
- Date: Mon, 02 Mar 2026 16:51:52 GMT
- Title: ORGAN: Object-Centric Representation Learning using Cycle Consistent Generative Adversarial Networks
- Authors: Joël Küchler, Ellen van Maren, Vaiva Vasiliauskaitė, Katarina Vulić, Reza Abbasi-Asl, Stephan J. Ihle
- Abstract summary: We present ORGAN, a novel approach for object-centric representation learning. It is based on cycle-consistent Generative Adversarial Networks rather than the autoencoder architectures that dominate the field. We show that it performs similarly to other state-of-the-art approaches on synthetic datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although data generation is often straightforward, extracting information from data is more difficult. Object-centric representation learning can extract information from images in an unsupervised manner. It does so by segmenting an image into its subcomponents: the objects. Each object is then represented in a low-dimensional latent space that can be used for downstream processing. Object-centric representation learning is dominated by autoencoder architectures (AEs). Here, we present ORGAN, a novel approach for object-centric representation learning, which is based on cycle-consistent Generative Adversarial Networks instead. We show that it performs similarly to other state-of-the-art approaches on synthetic datasets, while at the same time being the only approach tested here capable of handling more challenging real-world datasets with many objects and low visual contrast. Complementing these results, ORGAN creates expressive latent space representations that allow for object manipulation. Finally, we show that ORGAN scales well both with respect to the number of objects and the size of the images, giving it a unique edge over current state-of-the-art approaches.
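The cycle-consistency constraint the abstract refers to can be illustrated with a minimal sketch. The linear maps `G` and `F` below are hypothetical stand-ins for the paper's generators (image-to-latent and latent-to-image), not ORGAN's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two generators: G maps an "image" to a
# low-dimensional latent representation, F maps the latent back.
W_g = rng.normal(size=(4, 8))   # encoder-like map: 8-dim input -> 4-dim latent
W_f = np.linalg.pinv(W_g)       # decoder-like map (pseudo-inverse, for the sketch)

def G(x):
    return W_g @ x

def F(z):
    return W_f @ z

def cycle_consistency_loss(x):
    # ||F(G(x)) - x||_1: the reconstruction term a cycle-consistent GAN
    # adds on top of its adversarial losses.
    return np.abs(F(G(x)) - x).mean()

x = W_f @ rng.normal(size=4)    # a sample lying in the recoverable subspace
print(round(cycle_consistency_loss(x), 6))  # -> 0.0 (perfect cycle here)
```

In the full method this reconstruction term would be combined with adversarial losses; the sketch isolates only the cycle constraint that distinguishes this family of models from plain autoencoders.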
Related papers
- Finding NeMO: A Geometry-Aware Representation of Template Views for Few-Shot Perception [9.145558382187524]
We present a novel object-centric representation that can be used to detect, segment and estimate the 6DoF pose of objects unseen during training using RGB images. Our method consists of an encoder that requires only a few RGB template views depicting an object to generate a sparse object-like point cloud. Next, a decoder takes the object encoding together with a query image to generate a variety of dense predictions.
arXiv Detail & Related papers (2026-02-04T09:12:05Z)
- Object Pose Estimation Using Implicit Representation For Transparent Objects [0.0]
The render-and-compare method renders the object from multiple views and compares it against the given 2D image.
We show that if the object is represented as an implicit (neural) representation in the form of Neural Radiance Field (NeRF), it exhibits a more realistic rendering of the actual scene.
We evaluated our NeRF implementation of the render-and-compare method on transparent datasets and found that it surpassed the current state-of-the-art results.
arXiv Detail & Related papers (2024-10-17T11:51:12Z)
- Top-Down Guidance for Learning Object-Centric Representations [30.06924788022504]
Top-Down Guided Network (TDGNet) introduces a top-down pathway to improve object-centric representations. We show that TDGNet outperforms current object-centric models on multiple datasets of varying complexity.
arXiv Detail & Related papers (2024-05-17T07:48:27Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
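The PCA-based localization step described above can be sketched as follows. The feature map, the object patch, and the thresholding rule are invented for illustration and are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 8x8 feature map with a 16-dim feature per pixel: pixels in
# a 3x3 "object" patch share a strong direction that background lacks.
h = w = 8
feats = 0.1 * rng.normal(size=(h, w, 16))
obj_dir = rng.normal(size=16)
feats[2:5, 2:5] += obj_dir

# PCA localization: project every pixel feature onto the first principal
# component; large magnitudes indicate the object region.
flat = feats.reshape(-1, 16)
flat = flat - flat.mean(axis=0)
_, _, vt = np.linalg.svd(flat, full_matrices=False)
scores = np.abs(flat @ vt[0]).reshape(h, w)
mask = scores > scores.mean() + scores.std()

print(bool(mask[2:5, 2:5].all()))  # -> True: the object patch is localized
```

The mean-plus-one-standard-deviation threshold is an arbitrary choice for this sketch; any binarization of the principal-component scores serves the same purpose.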
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Object Scene Representation Transformer [56.40544849442227]
We introduce Object Scene Representation Transformer (OSRT), a 3D-centric model in which individual object representations naturally emerge through novel view synthesis.
OSRT scales to significantly more complex scenes with larger diversity of objects and backgrounds than existing methods.
It is multiple orders of magnitude faster at compositional rendering thanks to its light field parametrization and the novel Slot Mixer decoder.
arXiv Detail & Related papers (2022-06-14T15:40:47Z)
- Self-Supervised Learning of Object Parts for Semantic Segmentation [7.99536002595393]
We argue that self-supervised learning of object parts is a solution to this issue.
Our method surpasses the state-of-the-art on three semantic segmentation benchmarks by 17%-3%.
arXiv Detail & Related papers (2022-04-27T17:55:17Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation [67.88276573341734]
We propose a new method for unseen object instance segmentation by learning RGB-D feature embeddings from synthetic data.
A metric learning loss function is utilized to learn to produce pixel-wise feature embeddings.
We further improve the segmentation accuracy with a new two-stage clustering algorithm.
arXiv Detail & Related papers (2020-07-30T00:23:07Z)
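The pixel-embedding-plus-clustering idea in the last entry can be sketched in miniature. The embeddings, the grid-snapping first stage, and the merge threshold below are assumptions for illustration, not the paper's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical pixel-wise embeddings for two object instances plus
# background: a metric-learning loss would pull same-instance embeddings
# together and push different instances apart, yielding clusters like these.
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
labels_true = rng.integers(0, 3, size=200)
emb = centers[labels_true] + 0.1 * rng.normal(size=(200, 2))

# A simple two-stage grouping (a stand-in for the paper's two-stage
# clustering): stage 1 snaps embeddings to a coarse grid, stage 2 merges
# grid cells whose mean embeddings are close.
cells = np.round(emb).astype(int)                 # stage 1: coarse cells
uniq, inv = np.unique(cells, axis=0, return_inverse=True)
means = np.array([emb[inv == i].mean(axis=0) for i in range(len(uniq))])

merged = list(range(len(uniq)))                   # stage 2: merge nearby cells
for i in range(len(uniq)):
    for j in range(i):
        if np.linalg.norm(means[i] - means[j]) < 1.0:
            merged[i] = merged[j]
labels_pred = np.array([merged[k] for k in inv])

print(len(set(labels_pred)))  # -> 3 discovered groups
```

With well-separated embeddings the coarse grid already isolates the groups and the merge stage is a no-op; in a realistic setting the second stage would repair over-segmentation from the first.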
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.