Spotlight Attention: Robust Object-Centric Learning With a Spatial
Locality Prior
- URL: http://arxiv.org/abs/2305.19550v1
- Date: Wed, 31 May 2023 04:35:50 GMT
- Title: Spotlight Attention: Robust Object-Centric Learning With a Spatial
Locality Prior
- Authors: Ayush Chakravarthy, Trang Nguyen, Anirudh Goyal, Yoshua Bengio,
Michael C. Mozer
- Abstract summary: Object-centric vision aims to construct an explicit representation of the objects in a scene.
We incorporate a spatial-locality prior into state-of-the-art object-centric vision models.
We obtain significant improvements in segmenting objects in both synthetic and real-world datasets.
- Score: 88.9319150230121
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The aim of object-centric vision is to construct an explicit representation
of the objects in a scene. This representation is obtained via a set of
interchangeable modules called \emph{slots} or \emph{object files} that compete
for local patches of an image. The competition has a weak inductive bias to
preserve spatial continuity; consequently, one slot may claim patches scattered
diffusely throughout the image. In contrast, the inductive bias of human vision
is strong, to the degree that attention has classically been described with a
spotlight metaphor. We incorporate a spatial-locality prior into
state-of-the-art object-centric vision models and obtain significant
improvements in segmenting objects in both synthetic and real-world datasets.
Similar to human visual attention, the combination of image content and spatial
constraints yields robust unsupervised object-centric learning, including less
sensitivity to model hyperparameters.
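To make the mechanism concrete, the sketch below shows one slot-attention step in which the usual content logits are penalized by a per-slot Gaussian "spotlight" over patch positions. This is a minimal illustration, not the paper's exact formulation: the isotropic Gaussian prior, the center update rule, and the sigma value are all assumptions introduced here.

```python
# Minimal sketch: one slot-attention step with a spatial-locality prior.
# The Gaussian "spotlight" window, its center update, and sigma are
# illustrative assumptions, not the paper's exact mechanism.
import torch
import torch.nn.functional as F

def spotlight_attention_step(slots, feats, coords, centers, sigma=0.2):
    """slots: (K, D) slot vectors; feats: (N, D) patch features;
    coords: (N, 2) patch positions in [0, 1]^2; centers: (K, 2) spotlight centers."""
    d = slots.shape[-1]
    # Standard scaled dot-product content logits.
    logits = slots @ feats.t() / d ** 0.5                      # (K, N)
    # Spatial prior: log-density of an isotropic Gaussian around each
    # slot's center, penalizing patches far from that slot's spotlight.
    sq_dist = ((centers[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    logits = logits - sq_dist / (2 * sigma ** 2)               # (K, N)
    # Slots compete for each patch: softmax over the slot axis.
    attn = F.softmax(logits, dim=0)
    # Attention-weighted mean of patch features, as in slot attention.
    weights = attn / attn.sum(dim=1, keepdim=True)
    updates = weights @ feats                                  # (K, D)
    # Re-center each spotlight on its attention-weighted mean position
    # (an assumed update rule, chosen here for simplicity).
    new_centers = weights @ coords                             # (K, 2)
    return updates, new_centers, attn

# Toy usage: 5 slots over an 8x8 grid of 32-dim patch features.
K, D, side = 5, 32, 8
ys, xs = torch.meshgrid(torch.linspace(0, 1, side),
                        torch.linspace(0, 1, side), indexing="ij")
coords = torch.stack([xs.flatten(), ys.flatten()], dim=-1)     # (64, 2)
updates, centers, attn = spotlight_attention_step(
    torch.randn(K, D), torch.randn(side * side, D), coords, torch.rand(K, 2))
```

In a full model these updates would feed a recurrent slot update and the step would be iterated; the point of the sketch is only that adding a log-Gaussian penalty to the attention logits keeps each slot's attention spatially compact.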
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Seeing Objects in a Cluttered World: Computational Objectness from Motion in Video [0.0]
Perceiving the visually disjoint surfaces of our world as whole objects, physically distinct from the surfaces that overlap them, forms the basis of visual perception.
We present a simple but novel approach to infer objectness from phenomenology without object models.
We show that it delivers robust perception of individual attended objects in cluttered scenes, even with blur and camera shake.
arXiv Detail & Related papers (2024-02-02T03:57:11Z)
- Hyperbolic Contrastive Learning for Visual Representations beyond Objects [30.618032825306187]
We focus on learning representations for objects and scenes that preserve the structure among them.
Motivated by the observation that visually similar objects are close in the representation space, we argue that scenes and objects should instead follow a hierarchical structure.
arXiv Detail & Related papers (2022-12-01T16:58:57Z)
- Compositional Scene Modeling with Global Object-Centric Representations [44.43366905943199]
Humans can easily identify the same object even when it is occluded, by completing the occluded parts from its canonical image in memory.
This paper proposes a compositional scene modeling method to infer global representations of canonical images of objects without any supervision.
arXiv Detail & Related papers (2022-11-21T14:36:36Z)
- Robust and Controllable Object-Centric Learning through Energy-based Models [95.68748828339059]
We present a conceptually simple and general approach to learning object-centric representations through an energy-based model.
We show that it can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations.
arXiv Detail & Related papers (2022-10-11T15:11:15Z)
- Bi-directional Object-context Prioritization Learning for Saliency Ranking [60.62461793691836]
Existing approaches focus on learning either object-object or object-scene relations.
We observe that spatial attention works concurrently with object-based attention in the human visual recognition system.
We propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking.
arXiv Detail & Related papers (2022-03-17T16:16:03Z)
- Towards Self-Supervised Learning of Global and Object-Centric Representations [4.36572039512405]
We discuss key aspects of learning structured object-centric representations with self-supervision.
We validate our insights through several experiments on the CLEVR dataset.
arXiv Detail & Related papers (2022-03-11T15:18:47Z)
- Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We use a novel generative model that uses class semantics not only to generate the features but also to separate them discriminatively.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images (see the sketch after this list).
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
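Since the summary above only names SceneFID, here is a minimal sketch of the underlying idea under stated assumptions: the standard Fréchet distance between Gaussian fits of feature statistics, applied to features of individual object crops rather than whole images. How crops are obtained and which feature extractor is used are not specified in the summary and are left abstract here.

```python
# Minimal sketch of the SceneFID idea: the standard Frechet Inception
# Distance computed over per-object crop features instead of whole-image
# features. Cropping and feature extraction are assumed and abstracted away.
import numpy as np
from scipy import linalg

def frechet_distance(feats_a, feats_b):
    """feats_*: (num_objects, feat_dim) arrays of object-crop features."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from sqrtm
    diff = mu_a - mu_b
    # ||mu_a - mu_b||^2 + Tr(cov_a + cov_b - 2 (cov_a cov_b)^{1/2})
    return diff @ diff + np.trace(cov_a + cov_b - 2 * covmean)

# SceneFID would apply frechet_distance to features of object crops taken
# from real and generated scenes, so a model is scored on per-object
# quality rather than only on global image statistics.
```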
This list is automatically generated from the titles and abstracts of the papers on this site.