Grouped Discrete Representation for Object-Centric Learning
- URL: http://arxiv.org/abs/2411.02299v1
- Date: Mon, 04 Nov 2024 17:25:10 GMT
- Title: Grouped Discrete Representation for Object-Centric Learning
- Authors: Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen,
- Abstract summary: We propose textitGroup Discrete Representation (GDR) for Object-Centric Learning.
GDR decomposes features into attributes via organized channel grouping, and composes these attributes into discrete representation via indexes.
- Score: 18.44580501357929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Object-Centric Learning (OCL) can discover objects in images or videos by simply reconstructing the input. For better object discovery, representative OCL methods reconstruct the input as its Variational Autoencoder (VAE) intermediate representation, which suppresses pixel noises and promotes object separability by discretizing continuous super-pixels with template features. However, treating features as units overlooks their composing attributes, thus impeding model generalization; indexing features with scalar numbers loses attribute-level similarities and differences, thus hindering model convergence. We propose \textit{Grouped Discrete Representation} (GDR) for OCL. We decompose features into combinatorial attributes via organized channel grouping, and compose these attributes into discrete representation via tuple indexes. Experiments show that our GDR improves both Transformer- and Diffusion-based OCL methods consistently on various datasets. Visualizations show that our GDR captures better object separability.
Related papers
- Verbalized Representation Learning for Interpretable Few-Shot Generalization [162.2636511438425]
Verbalized Representation Learning (VRL) is a novel approach for automatically extracting human-interpretable features for object recognition.<n>Our method captures inter-class differences and intra-class commonalities in the form of natural language.<n>VRL achieves a 24% absolute improvement over prior state-of-the-art methods.
arXiv Detail & Related papers (2024-11-27T01:55:08Z) - Organized Grouped Discrete Representation for Object-Centric Learning [18.44580501357929]
Representative methods suppress pixel-level information redundancy and guide object-level feature aggregation.
The most recent advancement, Grouped Discrete Representation (GDR), further decomposes these template features into attributes.
We propose Organized GDR (OGDR) to organize channels belonging to the same attributes together for correct decomposition from features into attributes.
arXiv Detail & Related papers (2024-09-05T14:13:05Z) - Graph-guided Cross-composition Feature Disentanglement for Compositional Zero-shot Learning [54.08741382593959]
Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL)<n>It is challenging to learn disentangled primitive features that are general across different compositions.<n>We propose the solution of cross-composition feature disentanglement, which takes multiple primitive-sharing compositions as inputs.
arXiv Detail & Related papers (2024-08-19T08:23:09Z) - Grouped Discrete Representation Guides Object-Centric Learning [18.44580501357929]
Transformer-based Object-Centric Discrete Learning can abstract dense images or textures into sparse object-level features.
We propose textitGrouped Representation (GDR) to address these issues by grouping features into attributes and indexing them with numbers.
arXiv Detail & Related papers (2024-07-01T19:00:40Z) - Attributes Grouping and Mining Hashing for Fine-Grained Image Retrieval [24.8065557159198]
We propose an Attributes Grouping and Mining Hashing (AGMH) for fine-grained image retrieval.
AGMH groups and embeds the category-specific visual attributes in multiple descriptors to generate a comprehensive feature representation.
AGMH consistently yields the best performance against state-of-the-art methods on fine-grained benchmark datasets.
arXiv Detail & Related papers (2023-11-10T14:01:56Z) - Hierarchical Visual Primitive Experts for Compositional Zero-Shot
Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object)
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z) - Semantics-Aware Dynamic Localization and Refinement for Referring Image
Segmentation [102.25240608024063]
Referring image segments an image from a language expression.
We develop an algorithm that shifts from being localization-centric to segmentation-language.
Compared to its counterparts, our method is more versatile yet effective.
arXiv Detail & Related papers (2023-03-11T08:42:40Z) - Triplet Contrastive Learning for Unsupervised Vehicle Re-identification [55.445358749042384]
Part feature learning is a critical technology for fine semantic understanding in vehicle re-identification.
We propose a novel Triplet Contrastive Learning framework (TCL) which leverages cluster features to bridge the part features and global features.
arXiv Detail & Related papers (2023-01-23T15:52:12Z) - Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z) - Self-Supervised Learning Disentangled Group Representation as Feature [82.07737719232972]
We show that existing Self-Supervised Learning (SSL) only disentangles simple augmentation features such as rotation and colorization.
We propose an iterative SSL algorithm: Iterative Partition-based Invariant Risk Minimization (IP-IRM)
We prove that IP-IRM converges to a fully disentangled representation and show its effectiveness on various benchmarks.
arXiv Detail & Related papers (2021-10-28T16:12:33Z) - Enhancing Latent Space Clustering in Multi-filter Seq2Seq Model: A
Reinforcement Learning Approach [0.0]
We design a latent-enhanced multi-filter seq2seq model (LMS2S) that analyzes the latent space representations using a clustering algorithm.
Our experiments on semantic parsing and machine translation demonstrate the positive correlation between the clustering quality and the model's performance.
arXiv Detail & Related papers (2021-09-25T16:36:31Z) - Invariant Deep Compressible Covariance Pooling for Aerial Scene
Categorization [80.55951673479237]
We propose a novel invariant deep compressible covariance pooling (IDCCP) to solve nuisance variations in aerial scene categorization.
We conduct extensive experiments on the publicly released aerial scene image data sets and demonstrate the superiority of this method compared with state-of-the-art methods.
arXiv Detail & Related papers (2020-11-11T11:13:07Z) - Image Clustering using an Augmented Generative Adversarial Network and
Information Maximization [9.614694312155798]
We propose a deep clustering framework consisting of a modified generative adversarial network (GAN) and an auxiliary classifier.
The proposed method significantly outperforms state-of-the-art clustering methods on CIFAR-10 and CIFAR-100, and is competitive on the STL10 and MNIST datasets.
arXiv Detail & Related papers (2020-11-08T22:20:33Z) - Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z) - Invariant Feature Coding using Tensor Product Representation [75.62232699377877]
We prove that the group-invariant feature vector contains sufficient discriminative information when learning a linear classifier.
A novel feature model that explicitly consider group action is proposed for principal component analysis and k-means clustering.
arXiv Detail & Related papers (2019-06-05T07:15:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.