Organized Grouped Discrete Representation for Object-Centric Learning
- URL: http://arxiv.org/abs/2409.03553v3
- Date: Wed, 2 Oct 2024 12:40:01 GMT
- Title: Organized Grouped Discrete Representation for Object-Centric Learning
- Authors: Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen
- Abstract summary: Representative methods suppress pixel-level information redundancy and guide object-level feature aggregation.
The most recent advancement, Grouped Discrete Representation (GDR), further decomposes these template features into attributes.
We propose Organized GDR (OGDR) to organize channels belonging to the same attributes together for correct decomposition from features into attributes.
- Score: 18.44580501357929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Object-Centric Learning (OCL) represents dense image or video pixels as sparse object features. Representative methods utilize discrete representation composed of Variational Autoencoder (VAE) template features to suppress pixel-level information redundancy and guide object-level feature aggregation. The most recent advancement, Grouped Discrete Representation (GDR), further decomposes these template features into attributes. However, its naive channel grouping as decomposition may erroneously group channels belonging to different attributes together and discretize them as sub-optimal template attributes, which loses information and harms expressivity. We propose Organized GDR (OGDR) to organize channels belonging to the same attributes together for correct decomposition from features into attributes. In unsupervised segmentation experiments, OGDR is fully superior to GDR in augmenting classical transformer-based OCL methods; it even improves state-of-the-art diffusion-based ones. Codebook PCA and representation similarity analyses show that compared with GDR, our OGDR eliminates redundancy and preserves information better for guiding object representation learning. The source code is available in the supplementary material.
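The naive channel grouping that the abstract attributes to GDR can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the contiguous channel split, and the per-group nearest-neighbour lookup are all assumptions made for illustration, and the channel-organizing step that OGDR adds is not shown.

```python
def nearest_index(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def grouped_quantize(feature, codebooks):
    """Discretize one feature vector group by group (hypothetical helper).

    feature:   list of c floats (one template feature).
    codebooks: list of g codebooks; each is a list of candidate
               attribute vectors of length c // g.
    Returns (indexes, quantized): one index per group, and the feature
    rebuilt from the selected attribute vectors.
    """
    groups = len(codebooks)
    if len(feature) % groups != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    width = len(feature) // groups

    indexes, quantized = [], []
    for j, codebook in enumerate(codebooks):
        # Assumption: channels are grouped contiguously, in their given order.
        chunk = feature[j * width:(j + 1) * width]
        idx = nearest_index(chunk, codebook)
        indexes.append(idx)
        quantized.extend(codebook[idx])
    return indexes, quantized


# Toy usage: 4 channels, 2 groups, 2 candidate attributes per group.
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
toy_indexes, toy_quantized = grouped_quantize([0.9, 1.1, 0.1, -0.1], toy_codebooks)
```

In this sketch the grouping depends only on channel order, so channels that encode different attributes can end up in the same group. That is the failure mode the abstract describes for GDR's naive grouping.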
Related papers
- Grouped Discrete Representation for Object-Centric Learning [18.44580501357929]
We propose Grouped Discrete Representation (GDR) for Object-Centric Learning.
GDR decomposes features into attributes via organized channel grouping, and composes these attributes into discrete representation via indexes.
arXiv Detail & Related papers (2024-11-04T17:25:10Z)
- Grouped Discrete Representation Guides Object-Centric Learning [18.44580501357929]
Transformer-based Object-Centric Discrete Learning can abstract dense images or textures into sparse object-level features.
We propose Grouped Discrete Representation (GDR) to address these issues by grouping features into attributes and indexing them with numbers.
arXiv Detail & Related papers (2024-07-01T19:00:40Z)
- Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval [65.43522019468976]
We propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes.
We develop an encoder-decoder network with a reconstruction task to distill high-level attribute-specific vectors without supervision.
Our models are equipped with a feature decorrelation constraint upon these attribute vectors to strengthen their representative abilities.
arXiv Detail & Related papers (2023-11-21T08:20:38Z)
- Attributes Grouping and Mining Hashing for Fine-Grained Image Retrieval [24.8065557159198]
We propose an Attributes Grouping and Mining Hashing (AGMH) for fine-grained image retrieval.
AGMH groups and embeds the category-specific visual attributes in multiple descriptors to generate a comprehensive feature representation.
AGMH consistently yields the best performance against state-of-the-art methods on fine-grained benchmark datasets.
arXiv Detail & Related papers (2023-11-10T14:01:56Z)
- Triplet Contrastive Learning for Unsupervised Vehicle Re-identification [55.445358749042384]
Part feature learning is a critical technology for fine semantic understanding in vehicle re-identification.
We propose a novel Triplet Contrastive Learning framework (TCL) which leverages cluster features to bridge the part features and global features.
arXiv Detail & Related papers (2023-01-23T15:52:12Z)
- Deep Diversity-Enhanced Feature Representation of Hyperspectral Images [87.47202258194719]
We rectify 3D convolution by modifying its topology to enhance the rank upper-bound.
We also propose a novel diversity-aware regularization (DA-Reg) term that acts on the feature maps to maximize independence among elements.
To demonstrate the superiority of the proposed Re$^3$-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks.
arXiv Detail & Related papers (2023-01-15T16:19:18Z)
- Self-Supervised Learning Disentangled Group Representation as Feature [82.07737719232972]
We show that existing Self-Supervised Learning (SSL) only disentangles simple augmentation features such as rotation and colorization.
We propose an iterative SSL algorithm: Iterative Partition-based Invariant Risk Minimization (IP-IRM).
We prove that IP-IRM converges to a fully disentangled representation and show its effectiveness on various benchmarks.
arXiv Detail & Related papers (2021-10-28T16:12:33Z)
- Invariant Deep Compressible Covariance Pooling for Aerial Scene Categorization [80.55951673479237]
We propose a novel invariant deep compressible covariance pooling (IDCCP) to solve nuisance variations in aerial scene categorization.
We conduct extensive experiments on the publicly released aerial scene image data sets and demonstrate the superiority of this method compared with state-of-the-art methods.
arXiv Detail & Related papers (2020-11-11T11:13:07Z)
- Image Clustering using an Augmented Generative Adversarial Network and Information Maximization [9.614694312155798]
We propose a deep clustering framework consisting of a modified generative adversarial network (GAN) and an auxiliary classifier.
The proposed method significantly outperforms state-of-the-art clustering methods on CIFAR-10 and CIFAR-100, and is competitive on the STL10 and MNIST datasets.
arXiv Detail & Related papers (2020-11-08T22:20:33Z)
- Representation Decomposition for Image Manipulation and Beyond [29.991777603295816]
Decomposition-GAN (dec-GAN) decomposes an existing latent representation into content and attribute features.
Our experiments on multiple image datasets confirm the effectiveness and robustness of our dec-GAN over recent representation disentanglement models.
arXiv Detail & Related papers (2020-11-02T07:36:13Z)
- Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)