Self-supervised Learning with Fully Convolutional Networks
- URL: http://arxiv.org/abs/2012.10017v1
- Date: Fri, 18 Dec 2020 02:31:28 GMT
- Title: Self-supervised Learning with Fully Convolutional Networks
- Authors: Zhengeng Yang, Hongshan Yu, Yong He, Zhi-Hong Mao, Ajmal Mian
- Abstract summary: We focus on the problem of learning representation from unlabeled data for semantic segmentation.
Inspired by two patch-based methods, we develop a novel self-supervised learning framework.
We achieve a 5.8 percentage point improvement over the baseline model.
- Score: 24.660086792201263
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep learning based methods have achieved great success in many
computer vision tasks, their performance relies on a large number of densely
annotated samples that are typically difficult to obtain. In this paper, we
focus on the problem of learning representation from unlabeled data for
semantic segmentation. Inspired by two patch-based methods, we develop a novel
self-supervised learning framework by formulating the Jigsaw Puzzle problem as
a patch-wise classification process and solving it with a fully convolutional
network. By learning to solve a Jigsaw Puzzle problem with 25 patches and
transferring the learned features to semantic segmentation task on Cityscapes
dataset, we achieve a 5.8 percentage point improvement over a baseline model
initialized with random values. Moreover, experiments show that our
self-supervised learning method can be applied to different datasets and
models. In particular, we achieve performance competitive with
state-of-the-art methods on the PASCAL VOC2012 dataset using significantly
fewer training images.
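The pretext task described in the abstract — splitting an image into a 5x5 grid, shuffling the patches, and classifying each patch into one of the 25 original positions — can be sketched as the following data-preparation step. This is a minimal illustration, not the authors' code: the function name `make_jigsaw_sample` is hypothetical, and the fully convolutional network that consumes these labels is not shown.

```python
import numpy as np

def make_jigsaw_sample(image, grid=5, rng=None):
    """Split an image into grid x grid patches, shuffle them, and return
    the shuffled image plus a per-slot position label.

    The labels turn the Jigsaw Puzzle into a patch-wise classification
    problem: each of the grid*grid slots must be classified into one of
    the grid*grid original patch positions.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    # Extract patches in row-major order; the list index is the
    # patch's original position, i.e. its classification label.
    patches = [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
               for r in range(grid) for c in range(grid)]
    perm = rng.permutation(grid * grid)
    shuffled = np.zeros_like(image[:ph * grid, :pw * grid])
    labels = np.empty(grid * grid, dtype=int)
    for dst, src in enumerate(perm):
        r, c = divmod(dst, grid)
        shuffled[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = patches[src]
        labels[dst] = src  # ground-truth original position for this slot
    return shuffled, labels
```

A fully convolutional network can then output a `grid x grid` map of 25-way class scores over the shuffled image and be trained with per-slot cross-entropy against `labels`, which is what makes the task patch-wise rather than a single whole-image permutation classification.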
Related papers
- Limits of Transformer Language Models on Learning to Compose Algorithms [77.2443883991608]
We evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks that require learning a composition of several discrete sub-tasks.
Our results indicate that compositional learning in state-of-the-art Transformer language models is highly sample inefficient.
arXiv Detail & Related papers (2024-02-08T16:23:29Z)
- DiverseNet: Decision Diversified Semi-supervised Semantic Segmentation Networks for Remote Sensing Imagery [17.690698736544626]
We propose DiverseNet which explores multi-head and multi-model semi-supervised learning algorithms by simultaneously enhancing precision and diversity during training.
The two proposed methods in the DiverseNet family, namely DiverseHead and DiverseModel, both achieve better semantic segmentation performance on four widely used remote sensing imagery datasets.
arXiv Detail & Related papers (2023-11-22T22:20:10Z)
- Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery [0.5439020425819]
This paper explores post-disaster analytics using multimodal deep learning models trained with curriculum learning method.
Curriculum learning emulates the progressive learning sequence in human education by training deep learning models on increasingly complex data.
arXiv Detail & Related papers (2023-10-29T18:46:33Z)
- BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning [93.38239238988719]
We propose to enable deep neural networks with the ability to learn the sample relationships from each mini-batch.
BatchFormer is applied to the batch dimension of each mini-batch to implicitly explore sample relationships during training.
We perform extensive experiments on over ten datasets and the proposed method achieves significant improvements on different data scarcity applications.
arXiv Detail & Related papers (2022-03-03T05:31:33Z)
- Large-scale Unsupervised Semantic Segmentation [163.3568726730319]
We propose a new problem of large-scale unsupervised semantic segmentation (LUSS) with a newly created benchmark dataset to track the research progress.
Based on the ImageNet dataset, we propose the ImageNet-S dataset with 1.2 million training images and 40k high-quality semantic segmentation annotations for evaluation.
arXiv Detail & Related papers (2021-06-06T15:02:11Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
- Learning Visual Representations for Transfer Learning by Suppressing Texture [38.901410057407766]
In self-supervised learning, texture as a low-level cue may provide shortcuts that prevent the network from learning higher level representations.
We propose to use classic methods based on anisotropic diffusion to augment training using images with suppressed texture.
We empirically show that our method achieves state-of-the-art results on object detection and image classification.
arXiv Detail & Related papers (2020-11-03T18:27:03Z) - Region Comparison Network for Interpretable Few-shot Image
Classification [97.97902360117368]
Few-shot image classification aims to train models for new classes using only a limited number of labeled examples.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z) - Puzzle-AE: Novelty Detection in Images through Solving Puzzles [8.999416735254586]
U-Net has proved effective for this purpose, but it overfits the training data when trained with only a reconstruction error, as in other AE-based frameworks.
We show that training U-Nets based on this task is an effective remedy that prevents overfitting and facilitates learning beyond pixel-level features.
We propose adversarial robust training as an effective automatic shortcut removal.
arXiv Detail & Related papers (2020-08-29T10:53:55Z) - One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.