On the surprising similarities between supervised and self-supervised
models
- URL: http://arxiv.org/abs/2010.08377v1
- Date: Fri, 16 Oct 2020 13:28:13 GMT
- Title: On the surprising similarities between supervised and self-supervised
models
- Authors: Robert Geirhos, Kantharaju Narayanappa, Benjamin Mitzkus, Matthias
Bethge, Felix A. Wichmann, Wieland Brendel
- Abstract summary: We compare self-supervised networks to supervised models and human behaviour.
Current self-supervised CNNs share four key characteristics with their supervised counterparts.
We are hopeful that future self-supervised models will behave differently from supervised ones.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How do humans learn to acquire a powerful, flexible and robust representation
of objects? While much of this process remains unknown, it is clear that humans
do not require millions of object labels. Excitingly, recent algorithmic
advancements in self-supervised learning now enable convolutional neural
networks (CNNs) to learn useful visual object representations without
supervised labels, too. In the light of this recent breakthrough, we here
compare self-supervised networks to supervised models and human behaviour. We
tested models on 15 generalisation datasets for which large-scale human
behavioural data is available (130K highly controlled psychophysical trials).
Surprisingly, current self-supervised CNNs share four key characteristics with
their supervised counterparts: (1.) relatively poor noise robustness (with the
notable exception of SimCLR), (2.) non-human category-level error patterns,
(3.) non-human image-level error patterns (yet high similarity to supervised
model errors) and (4.) a bias towards texture. Taken together, these results
suggest that the strategies learned through today's supervised and
self-supervised training objectives end up being surprisingly similar, but
distant from human-like behaviour. That being said, we are clearly just at the
beginning of what could be called a self-supervised revolution of machine
vision, and we are hopeful that future self-supervised models will behave
differently from supervised ones and, perhaps, more similarly to robust human
object recognition.
Related papers
- Aligning Machine and Human Visual Representations across Abstraction Levels [42.86478924838503]
Deep neural networks have achieved success across a wide range of applications, including as models of human behavior in vision tasks.
However, neural network training and human learning differ in fundamental ways, and neural networks often fail to generalize as robustly as humans do.
We highlight a key misalignment between vision models and humans: whereas human conceptual knowledge is hierarchically organized from fine- to coarse-scale distinctions, model representations do not accurately capture all these levels of abstraction.
To address this misalignment, we first train a teacher model to imitate human judgments, then transfer human-like structure from its representations into pretrained state-of-the-art models.
arXiv Detail & Related papers (2024-09-10T13:41:08Z)
- Approaching human 3D shape perception with neurally mappable models [15.090436065092716]
Humans effortlessly infer the 3D shape of objects.
None of the current computational models captures the human ability to match object shape across viewpoints.
This work provides a foundation for understanding human shape inferences within neurally mappable computational architectures.
arXiv Detail & Related papers (2023-08-22T09:29:05Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Human alignment of neural network representations [22.671101285994013]
We investigate the factors that affect the alignment between the representations learned by neural networks and human mental representations inferred from behavioral responses.
We find that model scale and architecture have essentially no effect on the alignment with human behavioral responses.
We find that some human concepts such as food and animals are well-represented by neural networks whereas others such as royal or sports-related objects are not.
arXiv Detail & Related papers (2022-11-02T15:23:16Z)
- Learning Theory of Mind via Dynamic Traits Attribution [59.9781556714202]
We propose a new neural ToM architecture that learns to generate a latent trait vector of an actor from the past trajectories.
This trait vector then multiplicatively modulates the prediction mechanism via a fast-weights scheme in the prediction neural network.
We empirically show that the fast weights provide a good inductive bias for modelling the character traits of agents and hence improve mindreading ability.
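The multiplicative fast-weights modulation described in this summary can be sketched roughly as follows. This is a minimal illustration only, assuming a single linear prediction layer; all names, sizes, and the tanh-gain hypernetwork are hypothetical and not taken from the paper:

```python
import math
import random

random.seed(0)

# Hypothetical sizes: trait-vector length and prediction-layer width.
TRAIT_DIM, HIDDEN_DIM = 4, 6

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Slow weights of the prediction network, learned as usual.
W_slow = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)
# A small hypernetwork maps the actor's trait vector to per-unit gains.
W_hyper = rand_matrix(HIDDEN_DIM, TRAIT_DIM)

def predict(hidden, trait):
    # Fast weights: one multiplicative gain per output unit,
    # derived from the actor's inferred trait vector.
    gains = [1.0 + math.tanh(sum(w * t for w, t in zip(row, trait)))
             for row in W_hyper]
    # Modulate the slow weights row-wise, then apply the layer.
    return [math.tanh(g * sum(w * h for w, h in zip(row, hidden)))
            for g, row in zip(gains, W_slow)]

hidden_state = [random.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)]
trait_vector = [random.gauss(0.0, 1.0) for _ in range(TRAIT_DIM)]
prediction = predict(hidden_state, trait_vector)
```

The key design point is that the trait vector scales the prediction weights rather than being concatenated to the input, so the same slow network computes different functions for different actors.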
arXiv Detail & Related papers (2022-04-17T11:21:18Z)
- Overcoming the Domain Gap in Neural Action Representations [60.47807856873544]
3D pose data can now be reliably extracted from multi-view video sequences without manual intervention.
We propose to use it to guide the encoding of neural action representations together with a set of neural and behavioral augmentations.
To reduce the domain gap, during training, we swap neural and behavioral data across animals that seem to be performing similar actions.
arXiv Detail & Related papers (2021-12-02T12:45:46Z)
- Partial success in closing the gap between human and machine vision [30.78663978510427]
A few years ago, the first CNN surpassed human performance on ImageNet.
Here we ask: Are we making progress in closing the gap between human and machine vision?
We tested human observers on a broad range of out-of-distribution (OOD) datasets.
arXiv Detail & Related papers (2021-06-14T13:23:35Z)
- Are Convolutional Neural Networks or Transformers more like human vision? [9.83454308668432]
We show that attention-based networks can achieve higher accuracy than CNNs on vision tasks.
These results have implications both for building more human-like vision models, as well as for understanding visual object recognition in humans.
arXiv Detail & Related papers (2021-05-15T10:33:35Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
- How Well Do Self-Supervised Models Transfer? [92.16372657233394]
We evaluate the transfer performance of 13 top self-supervised models on 40 downstream tasks.
We find ImageNet Top-1 accuracy to be highly correlated with transfer to many-shot recognition.
No single self-supervised method dominates overall, suggesting that universal pre-training is still unsolved.
arXiv Detail & Related papers (2020-11-26T16:38:39Z)
- Self-Supervised Viewpoint Learning From Image Collections [116.56304441362994]
We propose a novel learning framework which incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint aware manner.
We show that our approach performs competitively with fully supervised approaches for several object categories such as human faces, cars, buses, and trains.
arXiv Detail & Related papers (2020-04-03T22:01:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.