Towards causal generative scene models via competition of experts
- URL: http://arxiv.org/abs/2004.12906v1
- Date: Mon, 27 Apr 2020 16:10:04 GMT
- Title: Towards causal generative scene models via competition of experts
- Authors: Julius von Kügelgen, Ivan Ustyuzhaninov, Peter Gehler, Matthias
Bethge, Bernhard Schölkopf
- Abstract summary: We present an alternative approach which uses an inductive bias encouraging modularity by training an ensemble of generative models (experts).
During training, experts compete for explaining parts of a scene, and thus specialise on different object classes, with objects being identified as parts that re-occur across multiple scenes.
Our model allows for controllable sampling of individual objects and recombination of experts in physically plausible ways.
- Score: 26.181132737834826
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Learning how to model complex scenes in a modular way with recombinable
components is a prerequisite for higher-order reasoning and acting in the
physical world. However, current generative models lack the ability to capture
the inherently compositional and layered nature of visual scenes. While recent
work has made progress towards unsupervised learning of object-based scene
representations, most models still maintain a global representation space
(i.e., objects are not explicitly separated), and cannot generate scenes with
novel object arrangement and depth ordering. Here, we present an alternative
approach which uses an inductive bias encouraging modularity by training an
ensemble of generative models (experts). During training, experts compete for
explaining parts of a scene, and thus specialise on different object classes,
with objects being identified as parts that re-occur across multiple scenes.
Our model allows for controllable sampling of individual objects and
recombination of experts in physically plausible ways. In contrast to other
methods, depth layering and occlusion are handled correctly, moving this
approach closer to a causal generative scene model. Experiments on simple toy
data qualitatively demonstrate the conceptual advantages of the proposed
approach.
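The two core ideas of the abstract, pixel-wise competition among experts and depth-layered composition, can be illustrated compactly. Below is a minimal, hypothetical numpy sketch, not the authors' implementation: the "experts" are stubbed as fixed templates with alpha masks, and the depth ordering is assumed rather than inferred.

```python
import numpy as np

H, W = 8, 8

def make_expert(seed):
    # Stand-in for a trained generative expert: an RGB "object" image
    # plus an alpha mask marking where its object is present (hypothetical).
    rng = np.random.default_rng(seed)
    rgb = rng.uniform(size=(H, W, 3))
    alpha = (rng.uniform(size=(H, W)) > 0.6).astype(float)
    return rgb, alpha

experts = [make_expert(s) for s in range(3)]  # the ensemble of experts

# A toy scene to explain: expert 1's object pasted onto a noise background.
background = np.random.default_rng(42).uniform(size=(H, W, 3))
rgb1, alpha1 = experts[1]
scene = alpha1[..., None] * rgb1 + (1 - alpha1[..., None]) * background

# (1) Competition: assign each pixel to the expert that reconstructs it
# best (lowest squared error). During training, only the winning expert
# would receive the gradient for its pixels, so experts specialise.
errors = np.stack([((rgb - scene) ** 2).sum(axis=-1) for rgb, _ in experts])
winner = errors.argmin(axis=0)  # (H, W) map of winning expert indices

# (2) Layered generation: render experts back-to-front with their alpha
# masks, so nearer objects correctly occlude farther ones.
depth_order = [0, 2, 1]  # assumed ordering, farthest first
canvas = np.zeros((H, W, 3))
for k in depth_order:
    rgb, alpha = experts[k]
    canvas = alpha[..., None] * rgb + (1 - alpha[..., None]) * canvas

print("pixels won per expert:", np.bincount(winner.ravel(), minlength=3))
```

In this toy setup expert 1 reconstructs its own object exactly and so wins those pixels, while background pixels are assigned arbitrarily; this is why such a model also needs the background itself to be explained, e.g. by a dedicated expert.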
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models learned to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Robust and Controllable Object-Centric Learning through Energy-based Models [95.68748828339059]
We propose a conceptually simple and general approach to learning object-centric representations through an energy-based model.
We show that it can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations.
arXiv Detail & Related papers (2022-10-11T15:11:15Z)
- Conditional Object-Centric Learning from Video [34.012087337046005]
We introduce a sequential extension to Slot Attention, trained to predict optical flow for realistic-looking synthetic scenes.
We show that conditioning the initial state of this model on a small set of hints, such as the center of mass of objects in the first frame, is sufficient to significantly improve instance segmentation.
These benefits generalize beyond the training distribution to novel objects, novel backgrounds, and to longer video sequences.
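A minimal sketch of that conditioning idea, assuming a fixed random projection in place of the learned embedding such a model would actually use (all names and shapes here are hypothetical):

```python
import numpy as np

def init_slots_from_hints(hints_xy, slot_dim=64, seed=0):
    # Hypothetical embedding of 2-D hint coordinates (e.g. first-frame
    # object centers of mass) into initial slot vectors; in practice a
    # learned mapping replaces this fixed random projection.
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(2, slot_dim)) / np.sqrt(2.0)
    return np.asarray(hints_xy) @ proj  # (num_objects, slot_dim)

slots0 = init_slots_from_hints([[0.2, 0.3], [0.7, 0.6]])
print(slots0.shape)  # (2, 64)
```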
arXiv Detail & Related papers (2021-11-24T16:10:46Z)
- Unsupervised Object Learning via Common Fate [61.14802390241075]
Learning generative object models from unlabelled videos is a long-standing problem and is required for causal scene modeling.
We decompose this problem into three easier subtasks, and provide candidate solutions for each of them.
We show that our approach allows learning generative models that generalize beyond the occlusions present in the input videos.
arXiv Detail & Related papers (2021-10-13T08:22:04Z)
- Hierarchical Relational Inference [80.00374471991246]
We propose a novel approach to physical reasoning that models objects as hierarchies of parts that may locally behave separately, but also act more globally as a single whole.
Unlike prior approaches, our method learns in an unsupervised fashion directly from raw visual images.
It explicitly distinguishes multiple levels of abstraction and improves over a strong baseline at modeling synthetic and real-world videos.
arXiv Detail & Related papers (2020-10-07T20:19:10Z)
- RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces [77.07767833443256]
We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects.
In contrast to state-of-the-art methods in object-centric generative modeling, RELATE also extends naturally to dynamic scenes and generates videos of high visual fidelity.
arXiv Detail & Related papers (2020-07-02T17:27:27Z)
- Object-Centric Image Generation with Factored Depths, Locations, and Appearances [30.541425619507184]
We present a generative model of images that explicitly reasons over the set of objects they show.
Our model learns a structured latent representation that separates objects from each other and from the background.
It can be trained from images alone in a purely unsupervised fashion without the need for object masks or depth information.
arXiv Detail & Related papers (2020-04-01T18:00:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.