Grouped Discrete Representation Guides Object-Centric Learning
- URL: http://arxiv.org/abs/2407.01726v2
- Date: Wed, 02 Oct 2024 11:49:31 GMT
- Title: Grouped Discrete Representation Guides Object-Centric Learning
- Authors: Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen
- Abstract summary: Transformer-based Object-Centric Learning (OCL) can abstract dense images or videos into sparse object-level features.
We propose Grouped Discrete Representation (GDR) to address these issues by grouping features into attributes and indexing them with tuple numbers.
- Score: 18.44580501357929
- License:
- Abstract: Similar to humans perceiving visual scenes as objects, Object-Centric Learning (OCL) can abstract dense images or videos into sparse object-level features. Transformer-based OCL handles complex textures well due to the decoding guidance of discrete representation, obtained by discretizing noisy features in image or video feature maps using template features from a codebook. However, treating features as minimal units overlooks their composing attributes, thus impeding model generalization; indexing features with natural numbers loses attribute-level commonalities and characteristics, thus diminishing heuristics for model convergence. We propose Grouped Discrete Representation (GDR) to address these issues by grouping features into attributes and indexing them with tuple numbers. In extensive experiments across different query initializations, dataset modalities, and model architectures, GDR consistently improves convergence and generalizability. Visualizations show that our method effectively captures attribute-level information in features. The source code will be available upon acceptance.
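The grouping-and-indexing idea in the abstract can be illustrated with a minimal sketch: feature channels are split into attribute groups, each group is quantized against its own small codebook of template features, and the resulting code is a tuple of per-attribute indices rather than one natural number. All names (`grouped_quantize`, `G`, `K`, etc.) are illustrative assumptions, not the paper's actual API.

```python
# Toy sketch of grouped discrete representation (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

D, G, K = 8, 4, 16           # feature dim, attribute groups, codes per group
group_dim = D // G
codebooks = rng.normal(size=(G, K, group_dim))  # one codebook per attribute group

def grouped_quantize(feature):
    """Return a tuple index and the discretized (reconstructed) feature."""
    groups = feature.reshape(G, group_dim)       # split channels into attributes
    indices, parts = [], []
    for g in range(G):
        # nearest template feature within this attribute's codebook
        dists = np.linalg.norm(codebooks[g] - groups[g], axis=1)
        k = int(np.argmin(dists))
        indices.append(k)
        parts.append(codebooks[g][k])
    return tuple(indices), np.concatenate(parts)

feat = rng.normal(size=D)
idx, quantized = grouped_quantize(feat)
print(idx)              # a tuple of G per-attribute indices, not one natural number
print(quantized.shape)  # (D,)
```

Compared with plain vector quantization over a single codebook, the tuple index preserves which attribute each component came from, which is the attribute-level commonality the abstract argues a flat natural-number index discards.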
Related papers
- Grouped Discrete Representation for Object-Centric Learning [18.44580501357929]
We propose Grouped Discrete Representation (GDR) for Object-Centric Learning.
GDR decomposes features into attributes via organized channel grouping, and composes these attributes into discrete representation via indexes.
arXiv Detail & Related papers (2024-11-04T17:25:10Z)
- Organized Grouped Discrete Representation for Object-Centric Learning [18.44580501357929]
Representative methods suppress pixel-level information redundancy and guide object-level feature aggregation.
The most recent advancement, Grouped Discrete Representation (GDR), further decomposes these template features into attributes.
We propose Organized GDR (OGDR) to organize channels belonging to the same attributes together for correct decomposition from features into attributes.
arXiv Detail & Related papers (2024-09-05T14:13:05Z)
- Cross-composition Feature Disentanglement for Compositional Zero-shot Learning [49.919635694894204]
Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL).
We propose the solution of cross-composition feature disentanglement, which takes multiple primitive-sharing compositions as inputs and constrains the disentangled primitive features to be general across these compositions.
arXiv Detail & Related papers (2023-11-21T08:20:38Z)
- Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval [65.43522019468976]
We propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes.
We develop an encoder-decoder network with a reconstruction task to distill high-level attribute-specific vectors in an unsupervised manner.
Our models are equipped with a feature decorrelation constraint upon these attribute vectors to strengthen their representative abilities.
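The feature-decorrelation constraint mentioned above can be sketched with a common formulation: penalize the off-diagonal entries of the attribute vectors' cosine-similarity (Gram) matrix so that each vector carries distinct information. This mirrors the general idea only and is not the paper's exact loss; `decorrelation_loss` is a hypothetical name.

```python
# Hedged sketch of a decorrelation penalty on attribute vectors.
import numpy as np

def decorrelation_loss(attr_vectors):
    """attr_vectors: (num_attributes, dim) matrix of attribute-specific vectors."""
    # L2-normalize rows so the Gram matrix holds cosine similarities
    norms = np.linalg.norm(attr_vectors, axis=1, keepdims=True)
    A = attr_vectors / np.clip(norms, 1e-12, None)
    gram = A @ A.T
    off_diag = gram - np.diag(np.diag(gram))
    return float(np.sum(off_diag ** 2))

rng = np.random.default_rng(0)
vecs = rng.normal(size=(5, 32))
print(decorrelation_loss(vecs))  # positive for correlated rows; exactly 0 for mutually orthogonal rows
```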
arXiv Detail & Related papers (2023-11-10T14:01:56Z)
- Attributes Grouping and Mining Hashing for Fine-Grained Image Retrieval [24.8065557159198]
We propose an Attributes Grouping and Mining Hashing (AGMH) method for fine-grained image retrieval.
AGMH groups and embeds the category-specific visual attributes in multiple descriptors to generate a comprehensive feature representation.
AGMH consistently yields the best performance against state-of-the-art methods on fine-grained benchmark datasets.
arXiv Detail & Related papers (2023-08-21T12:59:48Z)
- Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe²) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe² can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-08T03:24:21Z)
- Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-01-23T15:52:12Z)
- Triplet Contrastive Learning for Unsupervised Vehicle Re-identification [55.445358749042384]
Part feature learning is a critical technology for fine semantic understanding in vehicle re-identification.
We propose a novel Triplet Contrastive Learning framework (TCL) which leverages cluster features to bridge the part features and global features.
arXiv Detail & Related papers (2023-01-23T15:52:12Z)
- Learning Invariant Visual Representations for Compositional Zero-Shot Learning [30.472541551048508]
Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen-object compositions in the training set.
We propose an invariant feature learning framework to align different domains at the representation and gradient levels.
Experiments on two CZSL benchmarks demonstrate that the proposed method significantly outperforms the previous state-of-the-art.
arXiv Detail & Related papers (2022-06-01T11:33:33Z)
- Boosting Generative Zero-Shot Learning by Synthesizing Diverse Features with Attribute Augmentation [21.72622601533585]
We propose a novel framework to boost Zero-Shot Learning (ZSL) by synthesizing diverse features.
This method uses augmented semantic attributes to train the generative model, so as to simulate the real distribution of visual features.
We evaluate the proposed model on four benchmark datasets, observing significant performance improvement against the state-of-the-art.
arXiv Detail & Related papers (2021-12-23T14:32:51Z)
- Semantic Disentangling Generalized Zero-Shot Learning [50.259058462272435]
Generalized Zero-Shot Learning (GZSL) aims to recognize images from both seen and unseen categories.
In this paper, we propose a novel feature disentangling approach based on an encoder-decoder architecture.
The proposed model aims to distill quality semantic-consistent representations that capture intrinsic features of seen images.
arXiv Detail & Related papers (2021-01-20T05:46:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.