Fast Concept Mapping: The Emergence of Human Abilities in Artificial Neural Networks when Learning Embodied and Self-Supervised
- URL: http://arxiv.org/abs/2102.02153v1
- Date: Wed, 3 Feb 2021 17:19:49 GMT
- Title: Fast Concept Mapping: The Emergence of Human Abilities in Artificial Neural Networks when Learning Embodied and Self-Supervised
- Authors: Viviane Clay, Peter König, Gordon Pipa, Kai-Uwe Kühnberger
- Abstract summary: We introduce a setup in which an artificial agent first learns in a simulated world through self-supervised exploration.
We use a method we call fast concept mapping, which uses correlated firing patterns of neurons to define and detect semantic concepts.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most artificial neural networks used for object detection and
recognition are trained in a fully supervised setup. This is not only very
resource-consuming, as it requires large data sets of labeled examples, but
also very different from how humans learn. We introduce a setup in which an
artificial agent first learns in a simulated world through self-supervised
exploration. Following this, the representations learned through interaction
with the world can be used to associate semantic concepts such as different
types of doors. To do this, we use a method we call fast concept mapping,
which uses correlated firing patterns of neurons to define and detect
semantic concepts. This association works instantaneously with very few
labeled examples, similar to what we observe in humans in a phenomenon
called fast mapping. Strikingly, this method already identifies objects with
as few as one labeled example, which highlights the quality of the encoding
learned in a self-supervised manner through embodied, curiosity-driven
exploration. It therefore presents a feasible strategy for learning concepts
without much supervision and shows that meaningful representations of an
environment can be learned through pure interaction with the world.
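The core mechanism lends itself to a toy illustration. Below is a minimal sketch of fast concept mapping, not the authors' implementation: random vectors stand in for the self-supervised encoder's activations, and the firing threshold and overlap criterion are assumed choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for encoder activations; in the paper these would come
# from the agent's self-supervised visual encoder (hypothetical shapes).
door_acts = rng.random((3, 256))     # activations for 3 labeled "door" views
query_acts = rng.random((100, 256))  # activations for unlabeled inputs

def concept_signature(acts, threshold=0.5):
    """Keep units that fire together across all labeled examples."""
    return (acts > threshold).all(axis=0)         # boolean mask over units

def detect_concept(acts, signature, threshold=0.5, min_overlap=0.8):
    """An input matches if most of the signature units are active for it."""
    fired = acts > threshold
    overlap = fired[:, signature].mean(axis=1)    # fraction of signature on
    return overlap >= min_overlap

sig = concept_signature(door_acts)
hits = detect_concept(query_acts, sig)
print(f"{int(sig.sum())} signature units, {int(hits.sum())} matches")
```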
Related papers
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with their environment in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy learning.
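A hedged sketch of the general recipe described here: multiple task heads fine-tuned on top of a frozen pre-trained encoder, so every task shapes one shared representation. The actual Task Fusion Decoder is more elaborate, and all shapes below are assumed.

```python
import torch
import torch.nn as nn

class MultiTaskTuner(nn.Module):
    """Multi-task fine-tuning on a frozen pre-trained encoder (sketch)."""
    def __init__(self, encoder, feat_dim=512, n_tasks=3, out_dim=10):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False              # keep pre-training intact
        self.shared = nn.Linear(feat_dim, feat_dim)
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, out_dim) for _ in range(n_tasks)])

    def forward(self, x):
        with torch.no_grad():
            z = self.encoder(x)                  # pre-trained features
        h = torch.relu(self.shared(z))           # shared fine-tuned layer
        return [head(h) for head in self.heads]  # one output per task

tuner = MultiTaskTuner(nn.Linear(2048, 512))     # toy stand-in encoder
outs = tuner(torch.randn(4, 2048))
print([o.shape for o in outs])
```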
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
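A toy sketch of such a pretext task (setup assumed, not the ALSO code): a backbone embeds the sparse points and a small head classifies whether 3D query locations lie on the underlying surface, with targets derived from the lidar geometry itself rather than human labels.

```python
import torch
import torch.nn as nn

class OccupancyPretext(nn.Module):
    """Predict surface occupancy at query points from a point-cloud code."""
    def __init__(self, backbone, feat_dim=64):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, points, queries):
        feat = self.backbone(points)                    # (B, feat_dim)
        feat = feat.unsqueeze(1).expand(-1, queries.size(1), -1)
        return self.head(torch.cat([feat, queries], -1)).squeeze(-1)

backbone = nn.Sequential(nn.Flatten(), nn.Linear(100 * 3, 64))  # toy encoder
model = OccupancyPretext(backbone)
pts, q = torch.randn(2, 100, 3), torch.randn(2, 8, 3)
occ = torch.randint(0, 2, (2, 8)).float()   # geometry-derived targets
loss = nn.functional.binary_cross_entropy_with_logits(model(pts, q), occ)
print(float(loss))
```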
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Navigating to Objects in the Real World [76.1517654037993]
We present a large-scale empirical study of semantic visual navigation methods comparing methods from classical, modular, and end-to-end learning approaches.
We find that modular learning works well in the real world, attaining a 90% success rate.
In contrast, end-to-end learning does not, dropping from a 77% success rate in simulation to 23% in the real world due to a large image domain gap between simulation and reality.
arXiv Detail & Related papers (2022-12-02T01:10:47Z)
- Multi-Object Navigation with dynamically learned neural implicit representations [10.182418917501064]
We propose to structure neural networks with two neural implicit representations, which are learned dynamically during each episode.
We evaluate the agent on Multi-Object Navigation and show the high impact of using neural implicit representations as a memory source.
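A minimal sketch of a neural implicit representation used as episodic memory (design assumed, not the paper's architecture): a small MLP is fit online to map 2D positions to observed feature vectors and can then be queried anywhere, including positions never written to.

```python
import torch
import torch.nn as nn

# One episodic implicit map; in practice it would be reset each episode.
implicit_map = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 16))
opt = torch.optim.Adam(implicit_map.parameters(), lr=1e-3)

def write(position, feature, steps=20):
    """Store an observation by briefly fitting the map at that position."""
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(implicit_map(position), feature)
        loss.backward()
        opt.step()

def read(position):
    """Query the memory at an arbitrary position."""
    with torch.no_grad():
        return implicit_map(position)

write(torch.tensor([[0.2, 0.7]]), torch.randn(1, 16))
print(read(torch.tensor([[0.2, 0.7]])).shape)
```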
arXiv Detail & Related papers (2022-10-11T04:06:34Z)
- Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and show that the proposed agents can learn to distinguish objects just by observing the video stream.
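A toy sketch of the motion-driven attention idea (mechanics assumed): attend to the location where consecutive frames change most, then penalize incoherent features along the attention trajectory.

```python
import numpy as np

def attend(prev_frame, frame):
    """Return the pixel location with the largest temporal change."""
    motion = np.abs(frame - prev_frame)
    return np.unravel_index(motion.argmax(), motion.shape)

def coherence_loss(feat_t, feat_t_plus_1):
    """Representations at consecutive attended locations should agree."""
    return float(np.sum((feat_t - feat_t_plus_1) ** 2))

rng = np.random.default_rng(0)
frame0, frame1 = rng.random((32, 32)), rng.random((32, 32))
print("attended location:", attend(frame0, frame1))
print("loss:", coherence_loss(rng.random(8), rng.random(8)))
```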
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
- SegDiscover: Visual Concept Discovery via Unsupervised Semantic Segmentation [29.809900593362844]
SegDiscover is a novel framework that discovers semantically meaningful visual concepts from imagery datasets with complex scenes without supervision.
Our method generates concept primitives from raw images, discovers concepts by clustering in the latent space of a self-supervised pretrained encoder, and refines them via neural-network smoothing.
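A sketch of the discovery step (pipeline details assumed): cluster self-supervised patch embeddings and treat each cluster as a candidate concept. Random vectors stand in for real embeddings here.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
patch_embeddings = rng.random((500, 128))   # e.g. encoder output per patch

# Cluster the latent space; each cluster is a candidate visual concept.
concept_ids = KMeans(n_clusters=8, n_init=10,
                     random_state=0).fit_predict(patch_embeddings)

# Each patch now carries a discovered concept id; the paper's refinement
# (neural-network smoothing) would further clean these assignments.
print(np.bincount(concept_ids))
```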
arXiv Detail & Related papers (2022-04-22T20:44:42Z)
- HAKE: A Knowledge Engine Foundation for Human Activity Understanding [65.24064718649046]
Human activity understanding is of widespread interest in artificial intelligence and spans diverse applications like health care and behavior analysis.
We propose a novel paradigm to reformulate this task in two stages: first mapping pixels to an intermediate space spanned by atomic activity primitives, then programming detected primitives with interpretable logic rules to infer semantics.
Our framework, the Human Activity Knowledge Engine (HAKE), exhibits superior generalization ability and performance upon challenging benchmarks.
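A toy sketch of the two-stage paradigm, with the primitives and rule invented for illustration: detect atomic primitives first, then combine them with an interpretable logic rule to infer the activity.

```python
def detect_primitives(frame):
    """Stand-in detector; a real model maps pixels to primitive scores."""
    return {"hand_holds_cup": 0.9, "cup_near_mouth": 0.8, "sitting": 0.7}

def infer_drinking(primitives, threshold=0.5):
    # Rule: drinking <- hand_holds_cup AND cup_near_mouth
    return (primitives["hand_holds_cup"] > threshold
            and primitives["cup_near_mouth"] > threshold)

print(infer_drinking(detect_primitives(frame=None)))   # -> True
```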
arXiv Detail & Related papers (2022-02-14T16:38:31Z)
- Learning from One and Only One Shot [15.835306446986856]
Humans can generalize from only a few examples and from little pretraining on similar tasks.
Motivated by nativism and artificial general intelligence, we model human-innate priors in abstract visual tasks.
We achieve human-level recognition with only 1-10 examples per class and no pretraining.
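For contrast, a generic one-shot baseline (not the paper's prior-based model): each class keeps its single labeled example, and a query is assigned to the nearest example in feature space.

```python
import numpy as np

def one_shot_classify(query, support, labels):
    """Nearest-neighbor classification with one example per class."""
    dists = np.linalg.norm(support - query, axis=1)
    return labels[int(dists.argmin())]

rng = np.random.default_rng(0)
support = rng.random((5, 32))                 # one example per class
labels = np.array(["A", "B", "C", "D", "E"])
print(one_shot_classify(support[2] + 0.01, support, labels))   # -> "C"
```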
arXiv Detail & Related papers (2022-01-14T08:11:21Z)
- Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
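A sketch in the spirit of this unit analysis (threshold and data assumed): score a unit against a concept by the intersection over union of its top activations with the concept's segmentation masks.

```python
import numpy as np

def unit_concept_iou(unit_acts, concept_mask, quantile=0.99):
    """IoU between a unit's strongest firings and a concept mask."""
    fired = unit_acts > np.quantile(unit_acts, quantile)
    inter = np.logical_and(fired, concept_mask).sum()
    union = np.logical_or(fired, concept_mask).sum()
    return inter / union if union else 0.0

rng = np.random.default_rng(0)
acts = rng.random((50, 16, 16))               # one unit over 50 images
mask = rng.random((50, 16, 16)) > 0.9         # concept segmentation masks
print(round(unit_concept_iou(acts, mask), 4))
```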
arXiv Detail & Related papers (2020-09-10T17:59:10Z)
- Grounded Language Learning Fast and Slow [23.254765095715054]
We show that an embodied agent can exhibit similar one-shot word learning when trained with conventional reinforcement learning algorithms.
We find that, under certain training conditions, the agent's one-shot word-object binding generalizes to novel exemplars within the same ShapeNet category.
We further show how dual-coding memory can be exploited as a signal for intrinsic motivation, stimulating the agent to seek names for objects that may be useful for executing later instructions.
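A sketch of one-shot word-object binding with a key-value memory (mechanics assumed, not the paper's dual-coding architecture): a name is written next to an object embedding after a single exposure, then recalled by similarity.

```python
import numpy as np

class BindingMemory:
    """External key-value store for fast word-object binding."""
    def __init__(self):
        self.keys, self.names = [], []

    def bind(self, obj_embedding, name):      # fast, single-exposure write
        self.keys.append(obj_embedding)
        self.names.append(name)

    def recall(self, obj_embedding):
        sims = [float(k @ obj_embedding) for k in self.keys]
        return self.names[int(np.argmax(sims))]

rng = np.random.default_rng(0)
dax, blick = rng.random(16), rng.random(16)
memory = BindingMemory()
memory.bind(dax, "dax")
memory.bind(blick, "blick")
print(memory.recall(dax + 0.01 * rng.random(16)))   # -> "dax"
```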
arXiv Detail & Related papers (2020-09-03T14:52:03Z)
- Learning Intermediate Features of Object Affordances with a Convolutional Neural Network [1.52292571922932]
We train a deep convolutional neural network (CNN) to recognize affordances from images and to learn the underlying features or the dimensionality of affordances.
We view this representational analysis as the first step towards a more formal account of how humans perceive and interact with the environment.
arXiv Detail & Related papers (2020-02-20T19:04:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.