Can deep learning match the efficiency of human visual long-term memory in storing object details?
- URL: http://arxiv.org/abs/2204.13061v2
- Date: Thu, 28 Apr 2022 16:09:38 GMT
- Title: Can deep learning match the efficiency of human visual long-term memory in storing object details?
- Authors: A. Emin Orhan
- Abstract summary: Humans have a remarkably large capacity to store detailed visual information in long-term memory.
This paper asks whether deep learning via gradient descent can match the efficiency of human visual long-term memory.
- Score: 21.067139116005592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans have a remarkably large capacity to store detailed visual information
in long-term memory even after a single exposure, as demonstrated by classic
experiments in psychology. For example, Standing (1973) showed that humans
could recognize with high accuracy thousands of pictures that they had seen
only once a few days prior to a recognition test. In deep learning, the primary
mode of incorporating new information into a model is through gradient descent
in the model's parameter space. This paper asks whether deep learning via
gradient descent can match the efficiency of human visual long-term memory to
incorporate new information in a rigorous, head-to-head, quantitative
comparison. We answer this in the negative: even in the best case, models
learning via gradient descent appear to require approximately 10 exposures to
the same visual materials in order to reach a recognition memory performance
humans achieve after only a single exposure. Prior knowledge induced via
pretraining and bigger model sizes improve performance, but these improvements
are not very visible after a single exposure (it takes a few exposures for the
improvements to become apparent), suggesting that simply scaling up the
pretraining data size or model size might not be enough for the model to reach
human-level memory efficiency.
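To make the comparison concrete: the paradigm the abstract describes can be pictured as a training loop in which one full pass of gradient descent over a study set counts as one "exposure", with old/new recognition scored against unseen foils after each pass. The sketch below is only an illustration under assumptions, not the paper's protocol: the resnet18 backbone, the random stand-in data, and the confidence-based familiarity readout are all placeholders.

```python
# Minimal sketch of the exposure-counting paradigm described above.
# Not the paper's code: the backbone, the random stand-in data, and the
# confidence-based familiarity readout are all illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet18(weights=None).to(device)   # hypothetical backbone
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

study_x = torch.randn(128, 3, 64, 64)              # stand-in "study" set
study_y = torch.randint(0, 1000, (128,))
foils = torch.randn(128, 3, 64, 64)                # unseen foil images

def recognition_score(model, old, new):
    # Familiarity proxy: max softmax confidence (an assumption, not the
    # paper's readout). Score = fraction of old/new pairs where the
    # studied image looks more familiar than the foil.
    model.eval()
    with torch.no_grad():
        f_old = model(old.to(device)).softmax(-1).amax(-1)
        f_new = model(new.to(device)).softmax(-1).amax(-1)
    return (f_old.unsqueeze(1) > f_new.unsqueeze(0)).float().mean().item()

# Humans reach high recognition accuracy after 1 exposure; the paper
# reports models needing roughly 10 passes of gradient descent to catch up.
for exposure in range(1, 11):
    model.train()
    for i in range(0, len(study_x), 32):           # one pass = one exposure
        x, y = study_x[i:i+32].to(device), study_y[i:i+32].to(device)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    print(f"exposure {exposure}: {recognition_score(model, study_x, foils):.3f}")
```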
Related papers
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics; a toy numeric sketch appears after this list.
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- Learning Human Action Recognition Representations Without Real Humans [66.61527869763819]
We present a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.
We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks.
Our approach outperforms previous baselines by up to 5%.
arXiv Detail & Related papers (2023-11-10T18:38:14Z) - What do larger image classifiers memorise? [64.01325988398838]
We show that training examples exhibit an unexpectedly diverse set of memorisation trajectories across model sizes.
We find that knowledge distillation, an effective and popular model compression technique, tends to inhibit memorisation, while also improving generalisation.
arXiv Detail & Related papers (2023-10-09T01:52:07Z)
- Recognition, recall, and retention of few-shot memories in large language models [21.067139116005592]
We investigate simple recognition, recall, and retention experiments with large language models.
We find that a single exposure is generally sufficient for a model to achieve near-perfect accuracy.
The flip side of this remarkable capacity for fast learning is that precise memories are quickly overwritten; a minimal recognition-test sketch appears after this list.
arXiv Detail & Related papers (2023-03-30T17:26:16Z)
- Gestalt-Guided Image Understanding for Few-Shot Learning [19.83265038667386]
This paper introduces Gestalt psychology to few-shot learning and proposes a plug-and-play method called GGIU.
We design Totality-Guided Image Understanding and Closure-Guided Image Understanding to extract image features.
Our method can improve the performance of existing models effectively and flexibly without retraining or fine-tuning.
arXiv Detail & Related papers (2023-02-08T07:39:18Z)
- On Data Scaling in Masked Image Modeling [36.00347416479826]
Masked image modeling (MIM) has been suspected of being unable to benefit from larger data.
We study data scales ranging from 10% of ImageNet-1K to full ImageNet-22K, model sizes ranging from 49 million to 1 billion parameters, and training lengths ranging from 125K to 500K iterations.
We find that the validation loss in pre-training is a good indicator of how well the model performs when fine-tuned on multiple tasks.
arXiv Detail & Related papers (2022-06-09T17:58:24Z)
- A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model that continually learns new classes under a limited memory budget.
We show that when the model size is counted into the total budget and methods are compared at an aligned memory size, saving models does not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z)
- Saliency Guided Experience Packing for Replay in Continual Learning [6.417011237981518]
We propose a new approach for experience replay, where we select past experiences by looking at their saliency maps.
While learning a new task, we replay these memory patches with appropriate zero-padding to remind the model about its past decisions; a minimal sketch of this mechanism appears after this list.
arXiv Detail & Related papers (2021-09-10T15:54:58Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model can achieve comparable performance while using far fewer trainable parameters, with high speed in both training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a visual-only state-of-the-art method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
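Three of the entries above describe procedures compact enough to sketch. First, the difference-in-differences estimate from "Causal Estimation of Memorisation Profiles": memorisation of an instance is the change in loss for models that trained on it, minus the change for models that did not. The numbers below are made up purely to show the arithmetic; this is not the authors' estimator.

```python
import numpy as np

# Per-instance losses before/after a training window, for models that did
# ("treated") and did not ("control") see the instance in that window.
treated_before, treated_after = np.array([3.2, 2.9]), np.array([0.4, 0.6])
control_before, control_after = np.array([3.1, 3.0]), np.array([2.7, 2.6])

# DiD: the control difference absorbs generic improvement from other data,
# so the remainder is attributed to training on the instance itself.
memorisation = (treated_before - treated_after) - (control_before - control_after)
print(memorisation)  # [2.4 1.9]
```

Second, a recognition test of the kind run in "Recognition, recall, and retention of few-shot memories in large language models": a studied string should receive lower per-token loss than a matched novel string. The model name and strings below are placeholders, and this is only one plausible readout, not necessarily the paper's.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def seq_loss(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return lm(input_ids=ids, labels=ids).loss.item()  # mean per-token NLL

studied, foil = "the studied passage ...", "a matched novel passage ..."
print("recognised:", seq_loss(studied) < seq_loss(foil))
```

Third, the mechanism from "Saliency Guided Experience Packing": store only the most salient crop of each past input, then zero-pad it back to full size at replay time. This is an illustration of the idea as summarised above, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def salient_patch(model, x, y, patch=16):
    # Crop the input region with the largest input-gradient magnitude.
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
    g = x.grad.abs().sum(0)                        # (H, W) saliency map
    pooled = F.avg_pool2d(g[None, None], patch, stride=1)[0, 0]
    idx = pooled.flatten().argmax().item()
    r, c = divmod(idx, pooled.shape[-1])           # top-left of best patch
    model.zero_grad()                              # discard gradients used for saliency
    return x.detach()[:, r:r+patch, c:c+patch]     # (C, patch, patch) memory

def replay_input(mem, H, W):
    # Zero-pad a stored patch back to full input size for replay.
    canvas = torch.zeros(mem.shape[0], H, W)
    canvas[:, :mem.shape[1], :mem.shape[2]] = mem
    return canvas
```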
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.