Object Files and Schemata: Factorizing Declarative and Procedural
Knowledge in Dynamical Systems
- URL: http://arxiv.org/abs/2006.16225v5
- Date: Fri, 13 Nov 2020 01:47:12 GMT
- Title: Object Files and Schemata: Factorizing Declarative and Procedural
Knowledge in Dynamical Systems
- Authors: Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Sergey
Levine, Charles Blundell, Yoshua Bengio, Michael Mozer
- Abstract summary: Black-box models with a monolithic hidden state often fail to apply procedural knowledge consistently and uniformly.
We address this issue via an architecture that factorizes declarative and procedural knowledge.
- Score: 135.10772866688404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling a structured, dynamic environment like a video game requires keeping
track of the objects and their states (declarative knowledge) as well as
predicting how objects behave (procedural knowledge). Black-box models with a
monolithic hidden state often fail to apply procedural knowledge consistently
and uniformly, i.e., they lack systematicity. For example, in a video game,
correct prediction of one enemy's trajectory does not ensure correct prediction
of another's. We address this issue via an architecture that factorizes
declarative and procedural knowledge and that imposes modularity within each
form of knowledge. The architecture consists of active modules called object
files that maintain the state of a single object and invoke passive external
knowledge sources called schemata that prescribe state updates. To use a video
game as an illustration, two enemies of the same type will share schemata but
will have separate object files to encode their distinct state (e.g., health,
position). We propose to use attention to determine which object files to
update, the selection of schemata, and the propagation of information between
object files. The resulting architecture is a drop-in replacement conforming to
the same input-output interface as normal recurrent networks (e.g., LSTM, GRU)
yet achieves substantially better generalization on environments that have
multiple object tokens of the same type, including a challenging intuitive
physics benchmark.
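To make the factorization concrete, below is a minimal, illustrative PyTorch sketch of one recurrent step in the spirit of the abstract: per-object hidden states play the role of object files, a small pool of shared GRU cells plays the role of schemata, and attention decides which files to update and which schema each file applies. All names here (ObjectFilesCell, num_files, num_schemata) and the soft selection rule are assumptions for illustration, not the authors' reference implementation.

```python
# Hedged sketch of the object-files/schemata idea: NOT the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ObjectFilesCell(nn.Module):
    """Recurrent cell with K object files (per-object hidden states) that
    each select among S shared schemata (procedural update networks)."""

    def __init__(self, input_dim, hidden_dim, num_files=4, num_schemata=2):
        super().__init__()
        self.num_files, self.hidden_dim = num_files, hidden_dim
        # Declarative knowledge: one hidden vector per object file.
        self.init_state = nn.Parameter(torch.zeros(num_files, hidden_dim))
        # Procedural knowledge: schemata shared by all object files.
        self.schemata = nn.ModuleList(
            [nn.GRUCell(input_dim, hidden_dim) for _ in range(num_schemata)]
        )
        # Attention parameters: score input relevance per file and pick schemata.
        self.input_key = nn.Linear(input_dim, hidden_dim)
        self.schema_keys = nn.Parameter(torch.randn(num_schemata, hidden_dim))

    def forward(self, x, h=None):
        # x: (batch, input_dim); h: (batch, num_files, hidden_dim)
        if h is None:
            h = self.init_state.unsqueeze(0).expand(x.size(0), -1, -1)
        # Attention of each object file over the input: files that attend
        # strongly are the ones whose state gets updated this step.
        key = self.input_key(x).unsqueeze(1)                  # (B, 1, H)
        file_gate = torch.sigmoid(
            (h * key).sum(-1, keepdim=True) / self.hidden_dim ** 0.5
        )                                                      # (B, K, 1)
        # Each file softly selects a schema via attention over schema keys.
        schema_w = F.softmax(h @ self.schema_keys.t(), dim=-1)  # (B, K, S)
        # Apply every schema to every file, then mix by the selection weights.
        B, K, H = h.shape
        flat_h = h.reshape(B * K, H)
        flat_x = x.unsqueeze(1).expand(B, K, -1).reshape(B * K, -1)
        candidates = torch.stack(
            [s(flat_x, flat_h) for s in self.schemata], dim=-1
        ).reshape(B, K, H, -1)                                 # (B, K, H, S)
        h_new = (candidates * schema_w.unsqueeze(2)).sum(-1)
        # Only relevant files move; the rest keep their previous state.
        return file_gate * h_new + (1 - file_gate) * h
```

The full architecture described in the paper additionally propagates information between object files via attention and uses sparse (top-k) selection of files and schemata; the dense softmax mixing above is a simplification for readability.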
Related papers
- Do Pre-trained Vision-Language Models Encode Object States? [13.4206464539947]
We investigate whether vision-language models (VLMs) trained on web-scale data learn to encode object states.
We evaluate nine open-source VLMs, including models trained with contrastive and generative objectives.
We identify three areas of improvement for better encoding of object states.
arXiv Detail & Related papers (2024-09-16T17:22:18Z) - Learning State-Invariant Representations of Objects from Image Collections with State, Pose, and Viewpoint Changes [0.6577148087211809]
We present a novel dataset, ObjectsWithStateChange, that captures state and pose variations in the object images recorded from arbitrary viewpoints.
The goal of such research would be to train models capable of generating object embeddings that remain invariant to state changes.
We propose a curriculum learning strategy that uses the similarity relationships in the learned embedding space after each epoch to guide the training process.
arXiv Detail & Related papers (2024-04-09T17:17:48Z) - OSCaR: Object State Captioning and State Change Representation [52.13461424520107]
This paper introduces the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark.
OSCaR consists of 14,084 annotated video segments with nearly 1,000 unique objects from various egocentric video collections.
It sets a new testbed for evaluating multimodal large language models (MLLMs).
arXiv Detail & Related papers (2024-02-27T01:48:19Z) - Tuning-less Object Naming with a Foundation Model [0.0]
We implement a real-time object naming system that enables learning a set of named entities never seen before.
Our contribution is the use of the association mechanism known from transformers as attention.
As a result, the system can work in a one-shot manner and correctly name objects appearing in different contexts.
arXiv Detail & Related papers (2023-11-03T09:11:49Z) - Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z) - Object-Region Video Transformers [100.23380634952083]
We present Object-Region Video Transformers (ORViT), an object-centric approach that extends transformer video layers with object representations.
Our ORViT block consists of two object-level streams: appearance and dynamics.
We show strong improvements in performance across all tasks considered, demonstrating the value of a model that incorporates object representations into a transformer architecture.
arXiv Detail & Related papers (2021-10-13T17:51:46Z) - Learning visual policies for building 3D shape categories [130.7718618259183]
Previous work in this domain often assembles particular instances of objects from known sets of primitives.
We learn a visual policy to assemble other instances of the same category.
Our visual assembly policies are trained with no real images and reach up to 95% success rate when evaluated on a real robot.
arXiv Detail & Related papers (2020-04-15T17:29:10Z) - Look-into-Object: Self-supervised Structure Modeling for Object
Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)