Learning in Factored Domains with Information-Constrained Visual
Representations
- URL: http://arxiv.org/abs/2303.17508v1
- Date: Thu, 30 Mar 2023 16:22:10 GMT
- Title: Learning in Factored Domains with Information-Constrained Visual
Representations
- Authors: Tyler Malloy, Miao Liu, Matthew D. Riemer, Tim Klinger, Gerald
Tesauro, Chris R. Sims
- Abstract summary: We present a model of human factored representation learning based on an altered form of a $\beta$-Variational Auto-encoder used in a visual learning task.
Results demonstrate a trade-off between the speed of learning and the accuracy of reconstructions as the informational complexity of the model's latent space varies.
- Score: 14.674830543204317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans learn quickly even in tasks that contain complex visual information.
This is due in part to the efficient formation of compressed representations of
visual information, allowing for better generalization and robustness. However,
compressed representations alone are insufficient for explaining the high speed
of human learning. Reinforcement learning (RL) models that seek to replicate
this impressive efficiency may do so through the use of factored
representations of tasks. These informationally simplistic representations of
tasks are similarly motivated as the use of compressed representations of
visual information. Recent studies have connected biological visual perception
to disentangled and compressed representations. This raises the question of how
humans learn to efficiently represent visual information in a manner useful for
learning tasks. In this paper we present a model of human factored
representation learning based on an altered form of a $\beta$-Variational
Auto-encoder used in a visual learning task. Modelling results demonstrate a
trade-off in the informational complexity of model latent dimension spaces,
between the speed of learning and the accuracy of reconstructions.
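The trade-off described in the abstract can be made concrete with a minimal $\beta$-VAE sketch: the $\beta$ coefficient weights the KL (compression) term against the reconstruction term, so a larger $\beta$ yields a more information-constrained latent space at the cost of reconstruction accuracy. The PyTorch sketch below is illustrative only, with assumed layer sizes and variable names and a plain $\beta$-VAE objective rather than the paper's altered formulation.

```python
# Minimal beta-VAE sketch (illustrative; not the paper's altered model).
# Larger beta -> stronger compression of the latent code, lower reconstruction accuracy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=8, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, input_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

def beta_vae_loss(x, recon, mu, logvar, beta=4.0):
    # Reconstruction term: how accurately the compressed code reproduces the input.
    recon_loss = F.mse_loss(recon, x, reduction="sum")
    # KL term: information cost of the latent code; beta scales the compression pressure.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl
```

Sweeping $\beta$ in this objective reproduces the qualitative trade-off reported above: small $\beta$ favors accurate reconstructions, while large $\beta$ compresses the latent dimensions, which the abstract links to faster learning.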
Related papers
- What Makes Pre-Trained Visual Representations Successful for Robust
Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z) - Visual Grounding Helps Learn Word Meanings in Low-Data Regimes [47.7950860342515]
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension.
But to achieve these results, LMs must be trained in distinctly un-human-like ways.
Do models trained more naturalistically -- with grounded supervision -- exhibit more humanlike language learning?
We investigate this question in the context of word learning, a key sub-task in language acquisition.
arXiv Detail & Related papers (2023-10-20T03:33:36Z) - Learning Transferable Pedestrian Representation from Multimodal
Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on the LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z) - Understanding Self-Supervised Pretraining with Part-Aware Representation
Learning [88.45460880824376]
We study the capability that self-supervised representation pretraining methods learn part-aware representations.
Results show that the fully-supervised model outperforms self-supervised models for object-level recognition.
arXiv Detail & Related papers (2023-01-27T18:58:42Z) - Entropy-driven Unsupervised Keypoint Representation Learning in Videos [7.940371647421243]
We present a novel approach for unsupervised learning of meaningful representations from videos.
We argue that the local entropy of pixel neighborhoods and their temporal evolution create valuable intrinsic supervisory signals for learning prominent features.
Our empirical results show superior performance for our information-driven keypoints, which resolve challenges such as attending to static and dynamic objects or to objects abruptly entering and leaving the scene.
arXiv Detail & Related papers (2022-09-30T12:03:52Z) - A Benchmark for Compositional Visual Reasoning [5.576460160219606]
We introduce a novel visual reasoning benchmark, Compositional Visual Relations (CVR), to drive progress towards more data-efficient learning algorithms.
We take inspiration from fluidic intelligence and non-verbal reasoning tests and describe a novel method for creating compositions of abstract rules and associated image datasets at scale.
Our proposed benchmark includes measures of sample efficiency, generalization and transfer across task rules, as well as the ability to leverage compositionality.
arXiv Detail & Related papers (2022-06-11T00:04:49Z) - Task-Induced Representation Learning [14.095897879222672]
We evaluate the effectiveness of representation learning approaches for decision making in visually complex environments.
We find that representation learning generally improves sample efficiency on unseen tasks even in visually complex scenes.
arXiv Detail & Related papers (2022-04-25T17:57:10Z) - Curious Representation Learning for Embodied Intelligence [81.21764276106924]
Self-supervised representation learning has achieved remarkable success in recent years.
Yet to build truly intelligent agents, we must construct representation learning algorithms that can learn from environments.
We propose a framework, curious representation learning, which jointly learns a reinforcement learning policy and a visual representation model.
arXiv Detail & Related papers (2021-05-03T17:59:20Z) - Heterogeneous Contrastive Learning: Encoding Spatial Information for
Compact Visual Representations [183.03278932562438]
This paper presents an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations.
We show that our approach achieves higher efficiency in visual representations and thus delivers a key message to inspire future research on self-supervised visual representation learning.
arXiv Detail & Related papers (2020-11-19T16:26:25Z) - What Can You Learn from Your Muscles? Learning Visual Representation
from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z) - Acceleration of Actor-Critic Deep Reinforcement Learning for Visual
Grasping in Clutter by State Representation Learning Based on Disentanglement
of a Raw Input Image [4.970364068620608]
Actor-critic deep reinforcement learning (RL) methods typically perform very poorly when grasping diverse objects.
We employ state representation learning (SRL), where we encode essential information first for subsequent use in RL.
We found that preprocessing based on the disentanglement of a raw input image is the key to effectively capturing a compact representation.
arXiv Detail & Related papers (2020-02-27T03:58:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.