Related papers: Bootstrapping Top-down Information for Self-modulating Slot Attention

URL: http://arxiv.org/abs/2411.01801v2
Date: Fri, 08 Nov 2024 03:30:52 GMT
Title: Bootstrapping Top-down Information for Self-modulating Slot Attention
Authors: Dongwon Kim, Seoyeon Kim, Suha Kwak
Abstract summary: We propose a novel OCL framework incorporating a top-down pathway. This pathway bootstraps the semantics of individual objects and then modulates the model to prioritize features relevant to these semantics. Our framework achieves state-of-the-art performance across multiple synthetic and real-world object-discovery benchmarks.
Score: 29.82550058869251
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Object-centric learning (OCL) aims to learn representations of individual objects within visual scenes without manual supervision, facilitating efficient and effective visual reasoning. Traditional OCL methods primarily employ bottom-up approaches that aggregate homogeneous visual features to represent objects. However, in complex visual environments, these methods often fall short due to the heterogeneous nature of visual features within an object. To address this, we propose a novel OCL framework incorporating a top-down pathway. This pathway first bootstraps the semantics of individual objects and then modulates the model to prioritize features relevant to these semantics. By dynamically modulating the model based on its own output, our top-down pathway enhances the representational quality of objects. Our framework achieves state-of-the-art performance across multiple synthetic and real-world object-discovery benchmarks.

Related papers

Are We Done with Object-Centric Learning? [65.67948794110212]
Object-centric learning (OCL) seeks to learn representations that only encode an object, isolated from other objects or background cues in a scene. With recent sample-efficient segmentation models, we can separate objects in the pixel space and encode them independently. We address the OOD generalization challenge caused by spurious background cues through the lens of OCL.
arXiv Detail & Related papers (2025-04-09T17:59:05Z)
v-CLR: View-Consistent Learning for Open-World Instance Segmentation [24.32192108470939]
Vanilla visual networks are biased toward learning appearance information, e.g., texture, to recognize objects. This implicit bias causes the model to fail to detect novel objects with textures unseen in the open-world setting. We propose view-Consistent LeaRning (v-CLR), which encourages the model to learn appearance-invariant representations for robust instance segmentation.
arXiv Detail & Related papers (2025-04-02T05:52:30Z)
Object-Aware DINO (Oh-A-Dino): Enhancing Self-Supervised Representations for Multi-Object Instance Retrieval [1.4272411349249627]
Self-supervised vision models like DINO have shown emergent object understanding. DINO representations excel at capturing global object attributes but struggle with object-level details like colour. We propose a method that combines global and local features by augmenting DINO representations with object-centric latent vectors.
arXiv Detail & Related papers (2025-03-12T21:57:41Z)
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation [47.047267066525265]
We introduce a novel approach that incorporates object-level contextual knowledge within images. Our proposed approach achieves state-of-the-art performance with strong generalizability across diverse datasets.
arXiv Detail & Related papers (2024-11-26T06:34:48Z)
Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization. We introduce a benchmark comprising eight different synthetic and real-world datasets. We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
Learning Object-Centric Representation via Reverse Hierarchy Guidance [73.05170419085796]
Object-Centric Learning (OCL) seeks to enable Neural Networks to identify individual objects in visual scenes. RHGNet introduces a top-down pathway that works in different ways in the training and inference processes. Our model achieves SOTA performance on several commonly used datasets.
arXiv Detail & Related papers (2024-05-17T07:48:27Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model. Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z)
Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain. We use a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) by incorporating self-supervision. We show that the recognition backbone can be substantially enhanced for more robust representation learning. Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)
