Grouped Discrete Representation for Object-Centric Learning
- URL: http://arxiv.org/abs/2411.02299v1
- Date: Mon, 04 Nov 2024 17:25:10 GMT
- Title: Grouped Discrete Representation for Object-Centric Learning
- Authors: Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen
- Abstract summary: We propose Grouped Discrete Representation (GDR) for Object-Centric Learning.
GDR decomposes features into attributes via organized channel grouping, and composes these attributes into discrete representation via tuple indexes.
- Score: 18.44580501357929
- License:
- Abstract: Object-Centric Learning (OCL) can discover objects in images or videos by simply reconstructing the input. For better object discovery, representative OCL methods reconstruct the input as its Variational Autoencoder (VAE) intermediate representation, which suppresses pixel noises and promotes object separability by discretizing continuous super-pixels with template features. However, treating features as units overlooks their composing attributes, thus impeding model generalization; indexing features with scalar numbers loses attribute-level similarities and differences, thus hindering model convergence. We propose Grouped Discrete Representation (GDR) for OCL. We decompose features into combinatorial attributes via organized channel grouping, and compose these attributes into discrete representation via tuple indexes. Experiments show that our GDR improves both Transformer- and Diffusion-based OCL methods consistently on various datasets. Visualizations show that our GDR captures better object separability.
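The abstract contrasts indexing a whole feature with one scalar number against splitting it into attributes that are addressed by a tuple index. The sketch below illustrates that contrast with plain grouped nearest-template quantization in pure Python. It is a minimal illustration under stated assumptions, not the authors' implementation: the codebooks are fixed and hand-written, channels are grouped as naive contiguous slices (GDR's "organized" grouping and any training are omitted), and the names `grouped_quantize`, `shared_attributes`, and `EXAMPLE_CODEBOOKS` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical example: 2 channel groups ("attributes"), each with 2 templates of 2 channels.
EXAMPLE_CODEBOOKS: List[List[List[float]]] = [
    [[0.0, 0.0], [1.0, 1.0]],  # templates for group 0 (channels 0-1)
    [[0.0, 1.0], [1.0, 0.0]],  # templates for group 1 (channels 2-3)
]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def grouped_quantize(
    feature: Sequence[float], codebooks: List[List[List[float]]]
) -> Tuple[Tuple[int, ...], List[float]]:
    """Quantize each channel group to its nearest template.

    Assumption: channels are split into equal contiguous slices, one per codebook.
    Returns the tuple index (one template index per group) and the composed
    discrete feature (the selected templates concatenated back together).
    """
    num_groups = len(codebooks)
    if num_groups == 0 or len(feature) % num_groups != 0:
        raise ValueError("feature length must be divisible by the number of groups")
    group_dim = len(feature) // num_groups

    indexes: List[int] = []
    quantized: List[float] = []
    for g, codebook in enumerate(codebooks):
        chunk = feature[g * group_dim:(g + 1) * group_dim]
        # Nearest template within this group's own codebook.
        best = min(range(len(codebook)), key=lambda k: squared_distance(chunk, codebook[k]))
        indexes.append(best)
        quantized.extend(codebook[best])
    return tuple(indexes), quantized


def shared_attributes(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Count groups in which two tuple indexes select the same template."""
    return sum(1 for x, y in zip(a, b) if x == y)


if __name__ == "__main__":
    idx_a, q_a = grouped_quantize([0.9, 1.1, 0.1, 0.8], EXAMPLE_CODEBOOKS)
    idx_b, _ = grouped_quantize([1.0, 0.9, 0.9, 0.2], EXAMPLE_CODEBOOKS)
    print(idx_a, q_a)                        # -> (1, 0) [1.0, 1.0, 0.0, 1.0]
    print(idx_b)                             # -> (1, 1)
    print(shared_attributes(idx_a, idx_b))   # -> 1
```

With G groups of K templates each, a tuple index can address K**G combinations (for example, 4 groups of 64 templates give 64**4 = 16,777,216 combinations) while only G*K templates are stored. Two tuple indexes that agree in some positions share those attributes, a relation that scalar indexes such as 17 and 18 do not expose.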
Related papers
- Organized Grouped Discrete Representation for Object-Centric Learning [18.44580501357929]
Representative methods suppress pixel-level information redundancy and guide object-level feature aggregation.
The most recent advancement, Grouped Discrete Representation (GDR), further decomposes these template features into attributes.
We propose Organized GDR (OGDR) to organize channels belonging to the same attributes together for correct decomposition from features into attributes.
arXiv (2024-09-05T14:13:05Z)
- Grouped Discrete Representation Guides Object-Centric Learning [18.44580501357929]
Transformer-based Object-Centric Discrete Learning can abstract dense images or textures into sparse object-level features.
We propose Grouped Discrete Representation (GDR) to address these issues by grouping features into attributes and indexing them with numbers.
arXiv (2024-07-01T19:00:40Z)
- Triplet Contrastive Learning for Unsupervised Vehicle Re-identification [55.445358749042384]
Part feature learning is a critical technology for fine semantic understanding in vehicle re-identification.
We propose a novel Triplet Contrastive Learning framework (TCL) which leverages cluster features to bridge the part features and global features.
arXiv (2023-01-23T15:52:12Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves unsupervised object discovery performance competitive with a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails, all while being 7-70 times faster to train.
arXiv (2022-04-05T09:25:28Z)
- Self-Supervised Learning Disentangled Group Representation as Feature [82.07737719232972]
We show that existing Self-Supervised Learning (SSL) only disentangles simple augmentation features such as rotation and colorization.
We propose an iterative SSL algorithm: Iterative Partition-based Invariant Risk Minimization (IP-IRM).
We prove that IP-IRM converges to a fully disentangled representation and show its effectiveness on various benchmarks.
arXiv (2021-10-28T16:12:33Z)
- Enhancing Latent Space Clustering in Multi-filter Seq2Seq Model: A Reinforcement Learning Approach [0.0]
We design a latent-enhanced multi-filter seq2seq model (LMS2S) that analyzes the latent space representations using a clustering algorithm.
Our experiments on semantic parsing and machine translation demonstrate the positive correlation between the clustering quality and the model's performance.
arXiv (2021-09-25T16:36:31Z)
- Invariant Deep Compressible Covariance Pooling for Aerial Scene Categorization [80.55951673479237]
We propose a novel invariant deep compressible covariance pooling (IDCCP) to solve nuisance variations in aerial scene categorization.
We conduct extensive experiments on the publicly released aerial scene image data sets and demonstrate the superiority of this method compared with state-of-the-art methods.
arXiv (2020-11-11T11:13:07Z)
- Image Clustering using an Augmented Generative Adversarial Network and Information Maximization [9.614694312155798]
We propose a deep clustering framework consisting of a modified generative adversarial network (GAN) and an auxiliary classifier.
The proposed method significantly outperforms state-of-the-art clustering methods on CIFAR-10 and CIFAR-100, and is competitive on the STL10 and MNIST datasets.
arXiv (2020-11-08T22:20:33Z)
- Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv (2020-07-13T18:05:36Z)
- Invariant Feature Coding using Tensor Product Representation [75.62232699377877]
We prove that the group-invariant feature vector contains sufficient discriminative information when learning a linear classifier.
A novel feature model that explicitly considers group action is proposed for principal component analysis and k-means clustering.
arXiv (2019-06-05T07:15:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.