Learning Object Permanence from Video
- URL: http://arxiv.org/abs/2003.10469v4
- Date: Thu, 16 Jul 2020 09:16:04 GMT
- Title: Learning Object Permanence from Video
- Authors: Aviv Shamsian, Ofri Kleinfeld, Amir Globerson, Gal Chechik
- Abstract summary: This paper introduces the setup of learning Object Permanence from data.
We explain why this learning problem should be dissected into four components, where objects are (1) visible, (2) occluded, (3) contained by another object, and (4) carried by a containing object.
We then present a unified deep architecture that learns to predict object location under these four scenarios.
- Score: 46.34427538905761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object Permanence allows people to reason about the location of non-visible
objects, by understanding that they continue to exist even when not perceived
directly. Object Permanence is critical for building a model of the world,
since objects in natural visual scenes dynamically occlude and contain
each other. Intensive studies in developmental psychology suggest that object
permanence is a challenging task that is learned through extensive experience.
Here we introduce the setup of learning Object Permanence from data. We explain
why this learning problem should be dissected into four components, where
objects are (1) visible, (2) occluded, (3) contained by another object and (4)
carried by a containing object. The fourth subtask, where a target object is
carried by a containing object, is particularly challenging because it requires
a system to reason about a moving location of an invisible object. We then
present a unified deep architecture that learns to predict object location
under these four scenarios. We evaluate the architecture and system on a new
dataset based on CATER, and find that it outperforms previous localization
methods and various baselines.
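The abstract describes the unified architecture only at a high level. As a rough illustration of one way such a localizer could be structured (not the authors' implementation), the sketch below feeds per-frame object detections and visibility flags into a recurrent network that regresses the target's bounding box at every frame, so it can keep producing a location while the target is occluded, contained, or carried. All module names, shapes, and dimensions are assumptions made for the example.

```python
# Hypothetical sketch (PyTorch): a recurrent localizer that predicts the
# target's bounding box from per-frame object detections, so it can keep
# estimating a location when the target is occluded, contained, or carried.
# Names and dimensions are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class RecurrentLocalizer(nn.Module):
    def __init__(self, num_objects=10, box_dim=4, hidden_dim=256):
        super().__init__()
        # Each frame is summarized by the detected boxes of all objects
        # (zeros for undetected ones) plus a per-object visibility flag.
        frame_dim = num_objects * (box_dim + 1)
        self.encoder = nn.Sequential(
            nn.Linear(frame_dim, hidden_dim), nn.ReLU(),
        )
        # The LSTM carries the target's state through frames where it is unseen.
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # Regress the target bounding box (x, y, w, h) at every time step.
        self.box_head = nn.Linear(hidden_dim, box_dim)

    def forward(self, detections, visibility):
        # detections: (batch, time, num_objects, box_dim)
        # visibility: (batch, time, num_objects, 1)
        b, t = detections.shape[:2]
        frames = torch.cat([detections, visibility], dim=-1).reshape(b, t, -1)
        hidden, _ = self.lstm(self.encoder(frames))
        return self.box_head(hidden)  # (batch, time, box_dim)


if __name__ == "__main__":
    model = RecurrentLocalizer()
    dets = torch.rand(2, 30, 10, 4)            # 2 clips, 30 frames, 10 objects
    vis = torch.randint(0, 2, (2, 30, 10, 1)).float()
    print(model(dets, vis).shape)              # torch.Size([2, 30, 4])
```

Training such a model on the four scenarios would only require per-frame ground-truth boxes for the target, which a synthetic dataset such as the CATER-based one mentioned above can supply.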
Related papers
- Interacted Object Grounding in Spatio-Temporal Human-Object Interactions [70.8859442754261]
We introduce a new open-world benchmark: Grounding Interacted Objects (GIO).
An object grounding task is proposed in which vision systems are expected to discover interacted objects.
We propose a 4D question-answering framework (4D-QA) to discover interacted objects from diverse videos.
arXiv Detail & Related papers (2024-12-27T09:08:46Z)
- AffordanceLLM: Grounding Affordance from Vision Language Models [36.97072698640563]
Affordance grounding refers to the task of finding the area of an object with which one can interact.
Much of the needed knowledge is hidden and lies beyond the image content and the supervised labels of a limited training set.
We attempt to improve the generalization capability of current affordance grounding by taking advantage of rich world, abstract, and human-object-interaction knowledge.
arXiv Detail & Related papers (2024-01-12T03:21:02Z)
- The Background Also Matters: Background-Aware Motion-Guided Objects Discovery [2.6442319761949875]
We propose a Background-aware Motion-guided Objects Discovery method.
We leverage masks of moving objects extracted from optical flow and design a learning mechanism to extend them to the true foreground.
This enables joint learning of the object discovery task and the object/non-object separation.
arXiv Detail & Related papers (2023-11-05T12:35:47Z)
- Finding Fallen Objects Via Asynchronous Audio-Visual Integration [89.75296559813437]
This paper introduces a setting in which to study multi-modal object localization in 3D virtual environments.
An embodied robot agent, equipped with a camera and microphone, must determine what object has been dropped -- and where -- by combining audio and visual signals with knowledge of the underlying physics.
The dataset uses the ThreeDWorld platform which can simulate physics-based impact sounds and complex physical interactions between objects in a photorealistic setting.
arXiv Detail & Related papers (2022-07-07T17:59:59Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Bi-directional Object-context Prioritization Learning for Saliency Ranking [60.62461793691836]
Existing approaches focus on learning either object-object or object-scene relations.
We observe that spatial attention works concurrently with object-based attention in the human visual recognition system.
We propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking.
arXiv Detail & Related papers (2022-03-17T16:16:03Z)
- SafePicking: Learning Safe Object Extraction via Object-Level Mapping [19.502587411252946]
We present a system, SafePicking, that integrates object-level mapping and learning-based motion planning.
Planning is done by learning a deep Q-network that receives observations of predicted poses and a depth-based heightmap to output a motion trajectory.
Our results show that fusing pose observations with depth sensing improves both the performance and the robustness of the model; a rough sketch of such a Q-network is given after this list.
arXiv Detail & Related papers (2022-02-11T18:55:10Z)
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
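The SafePicking entry above mentions a deep Q-network that fuses predicted object poses with a depth-based heightmap to score candidate motions. The sketch below is a hypothetical illustration of such a network, not the authors' implementation; the input shapes, layer sizes, and the use of a discrete action set are all assumptions made for the example.

```python
# Hypothetical sketch (PyTorch) of a Q-network in the spirit of SafePicking:
# it fuses predicted object poses with a depth heightmap and scores a discrete
# set of candidate extraction motions. Shapes, layer sizes, and the action
# encoding are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn


class PoseHeightmapQNet(nn.Module):
    def __init__(self, num_objects=8, pose_dim=7, num_actions=6, hidden=256):
        super().__init__()
        # Convolutional branch for the depth heightmap (1 x 64 x 64 assumed).
        self.heightmap_net = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, hidden), nn.ReLU(),
        )
        # MLP branch for the predicted object poses (position + quaternion).
        self.pose_net = nn.Sequential(
            nn.Linear(num_objects * pose_dim, hidden), nn.ReLU(),
        )
        # Fuse both observations and output one Q-value per candidate motion.
        self.q_head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, heightmap, poses):
        # heightmap: (batch, 1, 64, 64); poses: (batch, num_objects, pose_dim)
        h = self.heightmap_net(heightmap)
        p = self.pose_net(poses.flatten(start_dim=1))
        return self.q_head(torch.cat([h, p], dim=-1))  # (batch, num_actions)


if __name__ == "__main__":
    net = PoseHeightmapQNet()
    q = net(torch.rand(2, 1, 64, 64), torch.rand(2, 8, 7))
    print(q.shape)  # torch.Size([2, 6])
```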