TOV: The Original Vision Model for Optical Remote Sensing Image
Understanding via Self-supervised Learning
- URL: http://arxiv.org/abs/2204.04716v1
- Date: Sun, 10 Apr 2022 16:25:05 GMT
- Title: TOV: The Original Vision Model for Optical Remote Sensing Image
Understanding via Self-supervised Learning
- Authors: Chao Tao, Ji Qia, Guo Zhang, Qing Zhu, Weipeng Lu, Haifeng Li
- Abstract summary: We propose The Original Vision model (TOV) in the remote sensing field.
Trained by massive unlabeled optical data along a human-like self-supervised learning path, TOV model can be easily adapted to various RSIU tasks.
We analyze the influences of two key factors on the performance of building the TOV model for RSIU: the data sampling method and the selection of the learning path during self-supervised optimization.
- Score: 13.57667361338603
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Are we on the right way for remote sensing image understanding (RSIU) by
training models in a supervised, data-dependent and task-dependent way, instead
of a label-free and task-independent way like human vision? We argue that a more
desirable RSIU model should be trained with the intrinsic structure of data
rather than extrinsic human labels to realize generalizability across a wide
range of RSIU tasks. Following this hypothesis, we propose \textbf{T}he
\textbf{O}riginal \textbf{V}ision model (TOV) in the remote sensing field. Trained
by massive unlabeled optical data along a human-like self-supervised learning
(SSL) path that is from general knowledge to specialized knowledge, TOV model
can be easily adapted to various RSIU tasks, including scene classification,
object detection, and semantic segmentation, and outperforms dominant ImageNet
supervised pretrained method as well as two recently proposed SSL pretrained
methods on the majority of 12 publicly available benchmarks. Moreover, we analyze
the influences of two key factors on the performance of building TOV model for
RSIU, including the influence of using different data sampling methods and the
selection of learning paths during self-supervised optimization. We believe
that a general model which is trained in a label-free and task-independent way
may be the next paradigm for RSIU and hope the insights distilled from this
study can help to foster the development of an original vision model for RSIU.
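The abstract describes TOV's pretraining only at a high level (a contrastive-style SSL path over unlabeled imagery); the paper's exact objective is not reproduced here. As a hedged illustration of one common self-supervised objective used by such pretraining pipelines, the InfoNCE contrastive loss over paired-view embeddings can be sketched in NumPy (function and variable names are hypothetical, not from the paper):

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive InfoNCE loss between two batches of view embeddings.

    z1[i] and z2[i] are embeddings of two augmented views of the same
    image (positives); all other pairs in the batch act as negatives.
    Returns the mean negative log-likelihood of the positive pairs.
    """
    # L2-normalize so similarity is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature          # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    # softmax cross-entropy where the correct "class" for row i is column i
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

Minimizing this loss pulls embeddings of the same scene together and pushes different scenes apart, which is the label-free signal that pretraining pipelines of this kind exploit before adaptation to downstream RSIU tasks.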
Related papers
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension [131.14381425260706]
We introduce Self-Training on Image Comprehension (STIC), a self-training approach specifically for image comprehension.
First, the model self-constructs a preference dataset for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
arXiv Detail & Related papers (2024-05-30T05:53:49Z) - MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
However, transferring the pretrained models to downstream tasks may encounter task discrepancy, since pretraining is formulated as an image classification or object discrimination task.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - Robust Training of Federated Models with Extremely Label Deficiency [84.00832527512148]
Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency.
We propose a novel twin-model paradigm, called Twin-sight, designed to enhance mutual guidance by providing insights from different perspectives of labeled and unlabeled data.
Our comprehensive experiments on four benchmark datasets provide substantial evidence that Twin-sight can significantly outperform state-of-the-art methods across various experimental settings.
arXiv Detail & Related papers (2024-02-22T10:19:34Z) - A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z) - Revisiting Self-supervised Learning of Speech Representation from a
Mutual Information Perspective [68.20531518525273]
We take a closer look into existing self-supervised methods of speech from an information-theoretic perspective.
We use linear probes to estimate the mutual information between the target information and learned representations.
We explore the potential of evaluating representations in a self-supervised fashion, where we estimate the mutual information between different parts of the data without using any labels.
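The linear-probe estimation described above targets continuous learned representations and is not reproduced here. As a minimal, hedged illustration of the underlying quantity, a plug-in estimator of mutual information between two discrete variables (all names hypothetical) can be sketched as:

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    # joint contingency table of observed (x, y) pairs
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    pxy = joint / n
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    nz = pxy > 0                          # avoid log(0) on empty cells
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

Identical variables yield I(X; Y) = H(X), while independent variables yield approximately zero; MI-based probing of representations generalizes this idea to the continuous, high-dimensional case with learned estimators.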
arXiv Detail & Related papers (2024-01-16T21:13:22Z) - In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene
Classification [5.323049242720532]
Self-supervised learning has emerged as a promising approach for remote sensing image classification.
We present a study of different self-supervised pre-training strategies and evaluate their effect across 14 downstream datasets.
arXiv Detail & Related papers (2023-07-04T10:57:52Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Self-Supervised Visual Representation Learning Using Lightweight
Architectures [0.0]
In self-supervised learning, a model is trained to solve a pretext task, using a data set whose annotations are created by a machine.
We critically examine the most notable pretext tasks to extract features from image data.
We study the performance of various self-supervised techniques keeping all other parameters uniform.
arXiv Detail & Related papers (2021-10-21T14:13:10Z) - Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot
Learning [21.89909688056478]
We propose a new two-level joint idea to augment the generative network with an inference network during training.
This provides strong cross-modal interaction for effective transfer of knowledge between visual and semantic domains.
We evaluate our approach on four benchmark datasets against several state-of-the-art methods, and demonstrate its effectiveness.
arXiv Detail & Related papers (2020-07-15T15:34:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.