Learning Online Visual Invariances for Novel Objects via Supervised and Self-Supervised Training
- URL: http://arxiv.org/abs/2110.01476v2
- Date: Thu, 7 Oct 2021 05:36:56 GMT
- Title: Learning Online Visual Invariances for Novel Objects via Supervised and Self-Supervised Training
- Authors: Valerio Biscione and Jeffrey S. Bowers
- Abstract summary: This paper assesses whether standard CNNs can support human-like online invariance by training models to recognize images of synthetic 3D objects that undergo several transformations.
We show that standard supervised CNNs trained on transformed objects can acquire strong invariances on novel classes even when trained with as few as 50 objects taken from 10 classes.
- Score: 0.76146285961466
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Humans can identify objects following various spatial transformations such as
scale and viewpoint. This extends to novel objects, after a single presentation
at a single pose, sometimes referred to as online invariance. CNNs have been
proposed as a compelling model of human vision, but their ability to identify
objects across transformations is typically tested on held-out samples of
trained categories after extensive data augmentation. This paper assesses
whether standard CNNs can support human-like online invariance by training
models to recognize images of synthetic 3D objects that undergo several
transformations: rotation, scaling, translation, brightness, contrast, and
viewpoint. Through the analysis of models' internal representations, we show
that standard supervised CNNs trained on transformed objects can acquire strong
invariances on novel classes even when trained with as few as 50 objects taken
from 10 classes. This extended to a different dataset of photographs of real
objects. We also show that these invariances can be acquired in a
self-supervised way, through solving the same/different task. We suggest that
this latter approach may be similar to how humans acquire invariances.
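
The two training regimes described here are concrete enough to sketch. Below is a minimal PyTorch sketch of the self-supervised same/different task, using 2D stand-ins for the transformations listed in the abstract (rotation, scaling, translation, brightness, contrast; 3D viewpoint changes would require a renderer and are omitted). Every name and architectural choice in it is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of the self-supervised same/different task. All names
# (SameDifferentNet, make_pair) are hypothetical; the transforms are 2D
# stand-ins for the paper's rotation, scaling, translation, brightness,
# and contrast manipulations (3D viewpoint changes are omitted).
import random
import torch
import torch.nn as nn
import torchvision.transforms as T

transform = T.Compose([
    T.RandomAffine(degrees=180, translate=(0.2, 0.2), scale=(0.5, 1.5)),
    T.ColorJitter(brightness=0.5, contrast=0.5),
])

class SameDifferentNet(nn.Module):
    """Siamese CNN with a binary same/different head."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(2 * 64, 1)  # logit: same (1) vs. different (0)

    def forward(self, a, b):
        za, zb = self.encoder(a), self.encoder(b)
        return self.head(torch.cat([za, zb], dim=1)).squeeze(1)

def make_pair(images):
    """Sample one labelled pair: two transformed views of one object,
    or transformed views of two distinct objects."""
    i = random.randrange(len(images))
    if random.random() < 0.5:
        a, b, y = images[i], images[i], 1.0          # "same"
    else:
        j = random.choice([k for k in range(len(images)) if k != i])
        a, b, y = images[i], images[j], 0.0          # "different"
    return transform(a), transform(b), torch.tensor(y)

model = SameDifferentNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

images = [torch.rand(3, 64, 64) for _ in range(50)]  # stand-ins for 50 objects
a, b, y = zip(*(make_pair(images) for _ in range(32)))
logits = model(torch.stack(a), torch.stack(b))
loss = nn.functional.binary_cross_entropy_with_logits(logits, torch.stack(y))
opt.zero_grad(); loss.backward(); opt.step()
```

Training on pairs like these rewards features that survive the transformations, which is one way the invariances measured in the paper could arise without class labels.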
Related papers
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z)
- Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing [82.67716657524251]
We present a counterfactual framework that allows us to study the robustness of neural networks with respect to naturalistic variations.
Our method allows for a fair comparison of the robustness of recently released, state-of-the-art Convolutional Neural Networks and Vision Transformers.
arXiv Detail & Related papers (2022-11-29T18:59:23Z)
- Neural Novel Actor: Learning a Generalized Animatable Neural Representation for Human Actors [98.24047528960406]
We propose a new method for learning a generalized animatable neural representation from a sparse set of multi-view imagery of multiple persons.
The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control.
arXiv Detail & Related papers (2022-08-25T07:36:46Z)
- TransformNet: Self-supervised representation learning through predicting geometric transformations [0.8098097078441623]
We describe an unsupervised semantic feature learning approach for recognizing the geometric transformation applied to the input data.
The underlying idea is that without knowing which objects an image contains, one cannot accurately predict the geometric transformation that was applied to it. A toy version of this pretext task is sketched below.
arXiv Detail & Related papers (2022-02-08T22:41:01Z)
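
The TransformNet entry above describes a transformation-prediction pretext task. The following is a minimal sketch of that idea using the classic four-way rotation prediction as the transformation set; the architecture and names are illustrative assumptions, not the paper's model.

```python
# Toy transformation-prediction pretext task: classify which of four
# 90-degree rotations was applied to the input. Hypothetical stand-in for
# TransformNet's transformation set, not the paper's code.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 4),                        # classes: 0, 90, 180, 270 degrees
)

def rotation_batch(images):
    """Rotate each image by a random multiple of 90 degrees; the label is k."""
    k = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(ki), dims=(1, 2))
                           for img, ki in zip(images, k)])
    return rotated, k

x, y = rotation_batch(torch.rand(16, 3, 64, 64))   # stand-in batch
loss = nn.functional.cross_entropy(net(x), y)
loss.backward()   # semantic features emerge as a byproduct of this objective
```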
- Quantised Transforming Auto-Encoders: Achieving Equivariance to Arbitrary Transformations in Deep Networks [23.673155102696338]
Convolutional Neural Networks (CNNs) are equivariant to image translation.
We propose an auto-encoder architecture whose embedding obeys an arbitrary set of equivariance relations simultaneously.
We demonstrate successful re-rendering of transformed versions of input images on several datasets. The equivariance relation at the core of this idea is sketched below.
arXiv Detail & Related papers (2021-11-25T02:26:38Z)
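
The quantised transforming auto-encoder entry above centers on equivariance: transforming the input should change the embedding in a predictable way. The sketch below shows that relation, f(T(x)) ≈ M(T) f(x), for a single hand-picked transformation; the shapes, the linear operator, and the loss are illustrative assumptions rather than the paper's architecture.

```python
# Toy equivariance relation: transforming the input should move the
# embedding by a predictable (here, learned linear) latent operator.
# Shapes, the flip transformation, and the loss are illustrative only.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 16))
latent_op = nn.Linear(16, 16, bias=False)     # operator M for one fixed T

def T(x):
    return torch.flip(x, dims=[3])            # image-space T: horizontal flip

x = torch.rand(8, 3, 32, 32)
# Enforce f(T(x)) ≈ M(f(x)): the embedding obeys the equivariance relation.
loss = nn.functional.mse_loss(encoder(T(x)), latent_op(encoder(x)))
loss.backward()
```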
- Recognizing Actions in Videos from Unseen Viewpoints [80.6338404141284]
We show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in the training data.
We introduce a new dataset for unseen view recognition and show the approach's ability to learn viewpoint-invariant representations.
arXiv Detail & Related papers (2021-03-30T17:17:54Z)
- Self-Supervised Representation Learning from Flow Equivariance [97.13056332559526]
We present a new self-supervised learning representation framework that can be directly deployed on a video stream of complex scenes.
Our representations, learned from high-resolution raw video, can be readily used for downstream tasks on static images.
arXiv Detail & Related papers (2021-01-16T23:44:09Z)
- What Does CNN Shift Invariance Look Like? A Visualization Study [87.79405274610681]
Feature extraction with convolutional neural networks (CNNs) is a popular method to represent images for machine learning tasks.
We focus on measuring and visualizing the shift invariance of extracted features from popular off-the-shelf CNN models.
We conclude that features extracted from popular networks are not globally invariant, and that biases and artifacts exist within this variance. A minimal version of such a measurement is sketched below.
arXiv Detail & Related papers (2020-11-09T01:16:30Z)
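
The shift-invariance study above amounts to a simple measurement that is easy to reproduce in outline: embed an image and shifted copies of it with an off-the-shelf CNN and compare feature similarity as a function of the shift. The backbone and shift range below are arbitrary choices for illustration, not the study's exact protocol.

```python
# Minimal sketch of measuring shift invariance of off-the-shelf CNN
# features: compare an image's embedding with embeddings of shifted copies.
# Backbone choice and shift range are arbitrary illustrations.
import torch
import torchvision.models as models

backbone = models.resnet18(weights=None)      # swap in pretrained weights
backbone.fc = torch.nn.Identity()             # expose the 512-d features
backbone.eval()

img = torch.rand(1, 3, 224, 224)              # stand-in input image
with torch.no_grad():
    base = backbone(img)
    for dx in (0, 4, 8, 16, 32):              # horizontal pixel shifts
        # Note: torch.roll is a circular shift; a crop-based shift would
        # avoid wrap-around artifacts at the image border.
        shifted = torch.roll(img, shifts=dx, dims=3)
        sim = torch.nn.functional.cosine_similarity(base, backbone(shifted))
        print(f"shift {dx:3d}px  cosine similarity {sim.item():.4f}")
```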
- Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases [34.02639091680309]
Recent gains in performance come from training instance classification models, treating each image and its augmented versions as samples of a single class.
First, we demonstrate that approaches like MoCo and PIRL learn occlusion-invariant representations.
Second, we demonstrate that these approaches obtain further gains from access to a clean object-centric training dataset like ImageNet. A toy version of the instance-discrimination loss underlying these methods is sketched below.
arXiv Detail & Related papers (2020-07-28T00:11:31Z)
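
The contrastive methods discussed in the last entry (MoCo, PIRL) build on an instance-discrimination objective. Below is a toy InfoNCE-style loss of that general kind, in which each image and its augmented view form a positive pair and the rest of the batch serves as negatives; it is a generic sketch, not either paper's actual objective.

```python
# Toy InfoNCE-style instance-discrimination loss of the general kind that
# MoCo and PIRL build on: image i's two augmented views are a positive
# pair; the other images in the batch act as negatives. Generic sketch.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1[i] and z2[i] embed two augmentations of the same image i."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature        # (N, N) cosine similarities
    labels = torch.arange(z1.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)  # stand-in embeddings
print(info_nce(z1, z2))
```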