Prediction of Scene Plausibility
- URL: http://arxiv.org/abs/2212.01470v2
- Date: Tue, 6 Dec 2022 08:49:40 GMT
- Title: Prediction of Scene Plausibility
- Authors: Or Nachmias, Ohad Fried and Ariel Shamir
- Abstract summary: Plausibility can be defined both in terms of physical properties and in terms of functional and typical arrangements.
We build a dataset of synthetic images containing both plausible and implausible scenes.
We test the success of various vision models in the task of recognizing and understanding plausibility.
- Score: 11.641785968519114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the 3D world from 2D images involves more than detection and
segmentation of the objects within the scene. It also includes the
interpretation of the structure and arrangement of the scene elements. Such
understanding is often rooted in recognizing the physical world and its
limitations, and in prior knowledge as to how similar typical scenes are
arranged. In this research we pose a new challenge for neural network (or
other) scene understanding algorithms - can they distinguish between plausible
and implausible scenes? Plausibility can be defined both in terms of physical
properties and in terms of functional and typical arrangements. Hence, we
define plausibility as the probability of encountering a given scene in the
real physical world. We build a dataset of synthetic images containing both
plausible and implausible scenes, and test the success of various vision models
in the task of recognizing and understanding plausibility.
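To make the task concrete, below is a minimal sketch (not the authors' code) of plausibility prediction framed as binary image classification with a pretrained backbone, which is one natural baseline for "test the success of various vision models." The dataset directory layout, the ResNet-18 backbone, and all hyperparameters are illustrative assumptions.
```python
# Minimal sketch: binary plausible/implausible classification with a
# pretrained backbone. Paths and hyperparameters are hypothetical.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing expected by the pretrained weights.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical layout: dataset_root/{plausible,implausible}/*.png
# ImageFolder assigns one class label per subdirectory.
train_set = datasets.ImageFolder("dataset_root", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Pretrained ResNet-18 with its 1000-way head replaced by a
# two-way (plausible vs. implausible) output layer.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One fine-tuning pass over the synthetic dataset.
model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```
Under the paper's definition of plausibility as a probability, the softmax over the two logits can be read as an estimate of how likely the depicted scene is to occur in the real physical world.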
Related papers
- Physically Plausible 3D Human-Scene Reconstruction from Monocular RGB Image using an Adversarial Learning Approach [26.827712050966]
A key challenge in holistic 3D human-scene reconstruction is to generate a physically plausible 3D scene from a single monocular RGB image.
This paper proposes using an implicit feature representation of the scene elements to distinguish a physically plausible alignment of humans and objects.
Unlike the existing inference-time optimization-based approaches, we use this adversarially trained model to produce a per-frame 3D reconstruction of the scene.
arXiv Detail & Related papers (2023-07-27T01:07:15Z)
- Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions [3.7957452405531256]
This paper explores the potential of a state-of-the-art Vision and Language model, VinVL, to caption images at the scene level.
We show that a small amount of curated data suffices to generate scene descriptions without losing the capability to identify object-level concepts in the scene.
We discuss the parallels between these results and insights from computational and cognitive science research on scene perception.
arXiv Detail & Related papers (2022-11-09T15:33:51Z)
- Compositional Law Parsing with Latent Random Functions [54.26307134687171]
We propose a deep latent variable model for Compositional LAw Parsing (CLAP).
CLAP achieves the human-like compositionality ability through an encoding-decoding architecture to represent concepts in the scene as latent variables.
Our experimental results demonstrate that CLAP outperforms the compared baseline methods in multiple visual tasks.
arXiv Detail & Related papers (2022-09-15T06:57:23Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- Recognizing Scenes from Novel Viewpoints [99.90914180489456]
Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to efficiently interact with the scene and its objects.
We propose a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories.
arXiv Detail & Related papers (2021-12-02T18:59:40Z)
- Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering [42.37007176376849]
We present a novel neural scene rendering system, which learns an object-compositional neural radiance field and produces realistic rendering for a cluttered, real-world scene.
To survive the training in heavily cluttered scenes, we propose a scene-guided training strategy to solve the 3D space ambiguity in the occluded regions and learn sharp boundaries for each object.
arXiv Detail & Related papers (2021-09-04T11:37:18Z)
- Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition [57.088328223220934]
Existing scene understanding systems mainly focus on recognizing the visible parts of a scene, ignoring the intact appearance of physical objects in the real world.
In this work, we propose a higher-level scene understanding system to tackle both visible and invisible parts of objects and backgrounds in a given scene.
arXiv Detail & Related papers (2021-04-12T11:37:23Z)
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [45.21191307444531]
Deep generative models allow for photorealistic image synthesis at high resolutions.
But for many applications, this is not enough: content creation also needs to be controllable.
Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis.
arXiv Detail & Related papers (2020-11-24T14:14:15Z)
- Neural Scene Graphs for Dynamic Scenes [57.65413768984925]
We present the first neural rendering method that decomposes dynamic scenes into scene graphs.
We learn implicitly encoded scenes, combined with a jointly learned latent representation, to describe objects with a single implicit function.
arXiv Detail & Related papers (2020-11-20T12:37:10Z)
- Long-term Human Motion Prediction with Scene Context [60.096118270451974]
We propose a novel three-stage framework for predicting human motion.
Our method first samples multiple human motion goals, then plans 3D human paths towards each goal, and finally predicts 3D human pose sequences following each path.
arXiv Detail & Related papers (2020-07-07T17:59:53Z)