CrossTransformers: spatially-aware few-shot transfer
- URL: http://arxiv.org/abs/2007.11498v5
- Date: Wed, 17 Feb 2021 18:05:48 GMT
- Title: CrossTransformers: spatially-aware few-shot transfer
- Authors: Carl Doersch, Ankush Gupta, Andrew Zisserman
- Abstract summary: Given new tasks with very little data, modern vision systems degrade remarkably quickly.
We show how the neural network representations which underpin modern vision systems are subject to supervision collapse.
We propose self-supervised learning to encourage general-purpose features that transfer better.
- Score: 92.33252608837947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given new tasks with very little data, such as new classes in a
classification problem or a domain shift in the input, performance of modern
vision systems degrades remarkably quickly. In this work, we illustrate how the
neural network representations which underpin modern vision systems are subject
to supervision collapse, whereby they lose any information that is not
necessary for performing the training task, including information that may be
necessary for transfer to new tasks or domains. We then propose two methods to
mitigate this problem. First, we employ self-supervised learning to encourage
general-purpose features that transfer better. Second, we propose a novel
Transformer based neural network architecture called CrossTransformers, which
can take a small number of labeled images and an unlabeled query, find coarse
spatial correspondence between the query and the labeled images, and then infer
class membership by computing distances between spatially-corresponding
features. The result is a classifier that is more robust to task and domain
shift, which we demonstrate via state-of-the-art performance on Meta-Dataset, a
recent dataset for evaluating transfer from ImageNet to many other vision
datasets.
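To make the mechanism concrete, here is a minimal sketch of a CrossTransformer-style class score, assuming a PyTorch setup; it illustrates the idea in the abstract rather than reproducing the authors' code, and the projection layers, shapes, and softmax scaling are assumptions.

```python
# Minimal illustrative sketch; layer sizes, shapes, and scaling are assumptions.
import torch.nn.functional as F

def crosstransformer_score(query_feats, support_feats, key_proj, value_proj):
    """Negative distance between one query image and one class's support set.

    query_feats:   (Lq, D) spatial features of the query image (Lq = H*W)
    support_feats: (Ls, D) spatial features of the class's support images,
                   flattened across images and spatial locations
    key_proj / value_proj: torch.nn.Linear(D, Dk) / torch.nn.Linear(D, Dv)
    """
    q = key_proj(query_feats)                                 # (Lq, Dk)
    k = key_proj(support_feats)                               # (Ls, Dk)
    v = value_proj(support_feats)                             # (Ls, Dv)
    # Coarse spatial correspondence: each query location attends over every
    # support location belonging to the class.
    attn = F.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)  # (Lq, Ls)
    aligned = attn @ v             # query-aligned class prototype, (Lq, Dv)
    w = value_proj(query_feats)                               # (Lq, Dv)
    # Class score: negative squared distance between spatially-corresponding
    # features, summed over query locations.
    return -((w - aligned) ** 2).sum()
```

In an episode, this score is computed for every class, and a softmax over the per-class scores gives the few-shot prediction.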
Related papers
- GoogLe2Net: Going Transverse with Convolutions [0.0]
We propose a novel CNN architecture called GoogLe2Net.
It consists of residual feature-reutilization inceptions (ResFRI) or split residual feature-reutilization inceptions (Split-ResFRI).
Our GoogLe2Net is able to reutilize information captured by foregoing groups of convolutional layers and express multi-scale features at a fine-grained level.
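The summary does not specify the wiring, but "reutilize information captured by foregoing groups of convolutional layers" suggests dense-style feature reuse. Below is a hedged sketch of that general pattern; the block name, channel counts, and group count are illustrative, not the paper's ResFRI design.

```python
# Hypothetical illustration of feature reuse across convolutional groups;
# not GoogLe2Net's actual ResFRI/Split-ResFRI design.
import torch
import torch.nn as nn

class FeatureReuseBlock(nn.Module):
    def __init__(self, channels=64, groups=3):
        super().__init__()
        # Group i consumes the concatenated outputs of all foregoing groups.
        self.convs = nn.ModuleList(
            nn.Conv2d(channels * (i + 1), channels, kernel_size=3, padding=1)
            for i in range(groups)
        )

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return feats[-1]
```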
arXiv Detail & Related papers (2023-01-01T15:16:10Z) - Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
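Below is a minimal sketch of a relative-location pretext task of this kind; the encoder, the patch pairing, and the 8-way neighbour layout are placeholders, and the masking of reference patch features described above is omitted for brevity.

```python
# Hypothetical relative-location pretext head; details are assumptions.
import torch
import torch.nn as nn

class RelativeLocationHead(nn.Module):
    def __init__(self, encoder: nn.Module, dim=256, num_positions=8):
        super().__init__()
        self.encoder = encoder                      # any patch feature extractor
        self.classifier = nn.Linear(2 * dim, num_positions)

    def forward(self, reference_patch, query_patch):
        ref = self.encoder(reference_patch)         # (B, dim)
        qry = self.encoder(query_patch)             # (B, dim)
        # Predict where the query patch sits relative to the reference,
        # e.g. one of 8 neighbouring positions; train with cross-entropy.
        return self.classifier(torch.cat([ref, qry], dim=-1))
```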
arXiv Detail & Related papers (2022-12-05T16:24:29Z) - Feature Representation Learning for Unsupervised Cross-domain Image Retrieval [73.3152060987961]
Current supervised cross-domain image retrieval methods can achieve excellent performance.
However, the cost of data collection and labeling imposes an intractable barrier to practical deployment in real applications.
We introduce a new cluster-wise contrastive learning mechanism to help extract class semantic-aware features.
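One common way to write a cluster-wise contrastive objective (the paper's exact formulation may differ) treats each feature's own cluster centroid as the positive and every other centroid as a negative:

```python
# Generic cluster-wise contrastive loss; a sketch, not the paper's method.
import torch.nn.functional as F

def cluster_contrastive_loss(features, centroids, assignments, temperature=0.1):
    """features: (B, D) L2-normalised embeddings; centroids: (K, D)
    L2-normalised cluster centers (e.g. from k-means); assignments: (B,)
    long tensor with each feature's cluster index."""
    logits = features @ centroids.t() / temperature   # (B, K) similarities
    # Attract each feature to its own centroid, repel it from the others.
    return F.cross_entropy(logits, assignments)
```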
arXiv Detail & Related papers (2022-07-20T07:52:14Z) - Deep transfer learning for image classification: a survey [4.590533239391236]
Best practice for image classification is to train large deep models on abundant labelled data, which is not always available.
In such data-scarce scenarios, transfer learning can help improve performance.
We present a new taxonomy of the applications of transfer learning for image classification.
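The canonical recipe the survey covers looks like this in PyTorch: reuse a backbone pretrained on abundant labelled data and train a new head on the small target dataset (the torchvision model and the 10-class head are illustrative choices).

```python
# Standard finetuning setup; model choice and class count are for illustration.
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for param in model.parameters():
    param.requires_grad = False                     # freeze pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 10)      # new head for 10 classes
# Train only model.fc first; optionally unfreeze top blocks for full finetuning.
```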
arXiv Detail & Related papers (2022-05-20T00:03:39Z) - Evolving Architectures with Gradient Misalignment toward Low Adversarial Transferability [4.415977307120616]
We propose an architecture searching framework that employs neuroevolution to evolve network architectures.
Our experiments show that the proposed framework successfully discovers architectures that reduce transferability from four standard networks.
In addition, when trained with gradient misalignment, the evolved networks exhibit significantly lower transferability than standard networks trained the same way.
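The summary does not give the objective, but gradient misalignment between two networks is commonly quantified as the cosine similarity of their input gradients; the sketch below shows that measurement and is an assumption about, not a reproduction of, the paper's formulation.

```python
# Hypothetical gradient-alignment measure; lower alignment is associated
# with lower adversarial transferability between the two models.
import torch
import torch.nn.functional as F

def input_gradient(model, x, y):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    return torch.autograd.grad(loss, x)[0]

def gradient_alignment(model_a, model_b, x, y):
    ga = input_gradient(model_a, x, y).flatten(1)   # (B, C*H*W)
    gb = input_gradient(model_b, x, y).flatten(1)
    # Mean cosine similarity over the batch; add it as a training penalty
    # to push two models toward misaligned gradients.
    return F.cosine_similarity(ga, gb, dim=1).mean()
```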
arXiv Detail & Related papers (2021-09-13T12:41:53Z) - Do Vision Transformers See Like Convolutional Neural Networks? [45.69780772718875]
Recent work has shown that Vision Transformer (ViT) models can achieve performance comparable or even superior to convolutional networks on image classification tasks.
Are they acting like convolutional networks, or learning entirely different visual representations?
We find striking differences between the two architectures, such as ViT having more uniform representations across all layers.
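Representation comparisons of this kind typically rest on centered kernel alignment (CKA, Kornblith et al. 2019), which this line of work uses to compare layers across architectures. A minimal linear-CKA implementation:

```python
# Linear CKA between two activation matrices for the same n examples.
import numpy as np

def linear_cka(x, y):
    """x: (n, d1), y: (n, d2) activations; returns similarity in [0, 1]."""
    x = x - x.mean(axis=0)                          # center features
    y = y - y.mean(axis=0)
    num = np.linalg.norm(y.T @ x, "fro") ** 2       # ||Y^T X||_F^2
    den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return num / den
```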
arXiv Detail & Related papers (2021-08-19T17:27:03Z) - Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
The Less attention vIsion Transformer (LIT) builds upon the fact that convolutions, fully-connected layers, and self-attention have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
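Below is a hedged sketch of the resulting design; block widths and depths are illustrative, not LIT's actual configuration. Early stages process the patch sequence with plain MLP blocks, and self-attention is reserved for later stages:

```python
# Illustrative "less attention" layout; sizes and depths are assumptions.
import torch.nn as nn

class MLPBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                 nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):                # x: (B, N, dim) patch sequence
        return x + self.net(x)           # residual MLP, no attention

class AttentionBlock(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        return x + self.attn(h, h, h, need_weights=False)[0]

early_stage = nn.Sequential(*[MLPBlock(96) for _ in range(2)])        # cheap
late_stage = nn.Sequential(*[AttentionBlock(384) for _ in range(6)])  # attention
```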
arXiv Detail & Related papers (2021-05-29T05:26:07Z) - Joint Learning of Neural Transfer and Architecture Adaptation for Image Recognition [77.95361323613147]
Current state-of-the-art visual recognition systems rely on pretraining a neural network on a large-scale dataset and finetuning the network weights on a smaller dataset.
In this work, we prove that dynamically adapting network architectures tailored to each domain task, together with weight finetuning, benefits both efficiency and effectiveness.
Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks.
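The linear evaluation mentioned above freezes the backbone and trains only a linear classifier on its features; here is a minimal sketch, with the backbone, dimensions, and optimizer settings as placeholders.

```python
# Generic linear-probe evaluation; all hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_probe(backbone, feat_dim, num_classes, loader, epochs=10, device="cpu"):
    backbone.eval().to(device)
    head = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(head.parameters(), lr=0.1)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():                   # backbone stays frozen
                feats = backbone(x.to(device))
            loss = F.cross_entropy(head(feats), y.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```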
arXiv Detail & Related papers (2021-03-31T08:15:17Z) - Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification [53.735029033681435]
Transfer learning is a powerful methodology for adapting pre-trained deep neural networks on image recognition tasks to new domains.
In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models.
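Adversarial training here refers to the standard min-max recipe (e.g. PGD, Madry et al.): each batch is replaced with worst-case perturbed inputs before the usual update. A sketch with illustrative epsilon and step values:

```python
# Standard PGD adversarial training step; hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=7):
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()         # ascend the loss
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.clamp(x + torch.clamp(x_adv - x, -eps, eps), 0.0, 1.0)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    x_adv = pgd_attack(model, x, y)                 # worst-case inputs
    loss = F.cross_entropy(model(x_adv), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```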
arXiv Detail & Related papers (2020-07-11T22:48:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.