Self-Supervised Masked Digital Elevation Models Encoding for
Low-Resource Downstream Tasks
- URL: http://arxiv.org/abs/2309.03367v1
- Date: Wed, 6 Sep 2023 21:20:10 GMT
- Title: Self-Supervised Masked Digital Elevation Models Encoding for
Low-Resource Downstream Tasks
- Authors: Priyam Mazumdar, Aiman Soliman, Volodymyr Kindratenko, Luigi Marini,
Kenton McHenry
- Abstract summary: GeoAI is uniquely poised to take advantage of the self-supervised methodology due to the decades of data collected.
The proposed architecture is the Masked Autoencoder pre-trained on ImageNet.
- Score: 0.6374763930914523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The lack of quality labeled data is one of the main bottlenecks for training
Deep Learning models. As the task increases in complexity, there is a higher
penalty for overfitting and unstable learning. The typical paradigm employed
today is Self-Supervised learning, where the model attempts to learn from a
large corpus of unstructured and unlabeled data and then transfer that
knowledge to the required task. Some notable examples of self-supervision in
other modalities are BERT for Large Language Models, Wav2Vec for Speech
Recognition, and the Masked AutoEncoder for Vision, which all utilize
Transformers to solve a masked prediction task. GeoAI is uniquely poised to
take advantage of the self-supervised methodology due to the decades of data
collected, little of which is precisely and dependably annotated. Our goal is
to extract building and road segmentations from Digital Elevation Models (DEM)
that provide a detailed topography of the Earth's surface. The proposed
architecture is a Masked Autoencoder pre-trained on ImageNet (despite the large
domain discrepancy between ImageNet and DEMs), with a UPerNet head for decoding
segmentations. We tested this model with only 450 and 50 training images,
roughly 5% and 0.5% of the original data,
respectively. On the building segmentation task, this model obtains an 82.1%
Intersection over Union (IoU) with 450 Images and 69.1% IoU with only 50
images. On the more challenging road detection task the model obtains an 82.7%
IoU with 450 images and 73.2% IoU with only 50 images. Any hand-labeled dataset
of the Earth's surface made today will rapidly become obsolete due to the
constantly changing nature of the landscape. This motivates the clear need
for data-efficient learners that can be used for a wide variety of downstream
tasks.
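As a concrete illustration of the setup described in the abstract, the sketch below pairs a pre-trained encoder with a lightweight decode head and scores binary masks with Intersection over Union. The toy convolutional encoder, layer sizes, and two-class output are illustrative assumptions; the paper's actual model is a ViT-based Masked Autoencoder with a UPerNet head.

```python
# Minimal fine-tuning sketch: a (pre-trained) encoder produces patch-level
# features, a small head decodes them to a per-pixel mask, and IoU scores
# the result. All sizes here are illustrative.
import torch
import torch.nn as nn

class SegHead(nn.Module):
    """Simplified stand-in for the UPerNet decode head."""
    def __init__(self, in_ch: int, num_classes: int = 2):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Upsample(scale_factor=16, mode="bilinear", align_corners=False),
            nn.Conv2d(64, num_classes, kernel_size=1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.decode(feats)

def iou(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
    """Intersection over Union of two boolean masks."""
    inter = (pred & target).sum().float()
    union = (pred | target).sum().float()
    return ((inter + eps) / (union + eps)).item()

# Toy 16x16 patch embedding standing in for the pre-trained MAE ViT encoder.
encoder = nn.Conv2d(1, 256, kernel_size=16, stride=16)
head = SegHead(in_ch=256)

dem_tile = torch.randn(1, 1, 224, 224)               # one single-channel DEM tile
target = torch.zeros(1, 224, 224, dtype=torch.bool)  # dummy footprint mask
logits = head(encoder(dem_tile))                     # (1, 2, 224, 224)
pred = logits.argmax(dim=1).bool()
print(f"IoU: {iou(pred, target):.3f}")
```

The design reflects the low-resource premise: with only 50 labeled tiles, almost all representational power must come from pre-training, and the labeled data serves mainly to adapt the decoder (and, if fine-tuned, the encoder).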
Related papers
- UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation [64.01742988773745]
An increasing privacy concern exists regarding training large-scale image segmentation models on unauthorized private data.
We exploit the concept of unlearnable examples to make images unusable for model training by generating and adding unlearnable noise to the original images.
We empirically verify the effectiveness of UnSeg across 6 mainstream image segmentation tasks, 10 widely used datasets, and 7 different network architectures.
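A minimal sketch of the underlying unlearnable-examples idea: per-image, bounded, error-minimizing noise is optimized so a surrogate model's loss collapses without learning useful features. This is the generic recipe, not UnSeg's actual generator; the surrogate model, step size, and noise budget are illustrative.

```python
# Error-minimizing ("unlearnable") noise: descend the surrogate's training
# loss w.r.t. a perturbation kept inside an L-infinity ball of radius eps.
import torch
import torch.nn as nn

def unlearnable_noise(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                      eps: float = 8 / 255, steps: int = 20) -> torch.Tensor:
    delta = torch.zeros_like(x, requires_grad=True)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta -= (eps / 4) * delta.grad.sign()  # descend (not ascend) the loss
            delta.clamp_(-eps, eps)                 # keep the noise imperceptible
        delta.grad.zero_()
    return delta.detach()

# Usage with a toy surrogate classifier.
surrogate = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
images, labels = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
protected = images + unlearnable_noise(surrogate, images, labels)
```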
arXiv Detail & Related papers (2024-10-13T16:34:46Z)
- Self-Supervised Versus Supervised Training for Segmentation of Organoid Images [2.6242820867975127]
Large amounts of microscopy image data remain unlabeled, preventing their effective exploitation by deep-learning algorithms.
Self-supervised learning (SSL) is a promising solution based on learning intrinsic features under a pretext task that is similar to the main task without requiring labels.
A ResNet50 U-Net was first trained to restore images of liver progenitor organoids from augmented images using the Structural Similarity Index Metric (SSIM) alone, and using SSIM combined with an L1 loss.
For comparison, we used the same U-Net architecture to train two supervised models, one utilizing the ResNet50 encoder
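A minimal sketch of such a combined SSIM + L1 restoration objective; it assumes the torchmetrics package for SSIM, and the loss weighting is illustrative.

```python
# Restoration pretext loss: (1 - SSIM) pushes structural similarity up,
# while an L1 term penalizes per-pixel error.
import torch
import torch.nn.functional as F
from torchmetrics.functional import structural_similarity_index_measure as ssim

def restoration_loss(restored: torch.Tensor, clean: torch.Tensor,
                     l1_weight: float = 0.5) -> torch.Tensor:
    ssim_loss = 1.0 - ssim(restored, clean, data_range=1.0)
    return ssim_loss + l1_weight * F.l1_loss(restored, clean)

clean = torch.rand(2, 1, 64, 64)                       # clean organoid images
corrupted = (clean + 0.1 * torch.randn_like(clean)).clamp(0, 1)
loss = restoration_loss(corrupted, clean)  # corrupted stands in for U-Net output
```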
arXiv Detail & Related papers (2023-11-19T01:57:55Z)
- No Data Augmentation? Alternative Regularizations for Effective Training on Small Datasets [0.0]
We study alternative regularization strategies to push the limits of supervised learning on small image classification datasets.
In particular, we employ a model-agnostic heuristic to select (semi-)optimal learning rate and weight decay pairs via the norm of the model parameters.
We reach a test accuracy of 66.5%, on par with the best state-of-the-art methods.
arXiv Detail & Related papers (2023-09-04T16:13:59Z)
- Delving Deeper into Data Scaling in Masked Image Modeling [145.36501330782357]
We conduct an empirical study on the scaling capability of masked image modeling (MIM) methods for visual recognition.
Specifically, we utilize the web-collected Coyo-700M dataset.
Our goal is to investigate how the performance changes on downstream tasks when scaling with different sizes of data and models.
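For reference, the masking step at the core of MIM methods such as MAE can be sketched as follows; the 75% ratio and tensor shapes are illustrative.

```python
# MAE-style random masking: keep a random subset of patch tokens visible
# and record which positions were hidden for the reconstruction target.
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """patches: (batch, num_patches, dim) -> (visible patches, boolean mask)."""
    b, n, d = patches.shape
    n_keep = int(n * (1 - mask_ratio))
    keep = torch.rand(b, n).argsort(dim=1)[:, :n_keep]   # random patch subset
    visible = torch.gather(patches, 1, keep.unsqueeze(-1).expand(-1, -1, d))
    mask = torch.ones(b, n, dtype=torch.bool)            # True = hidden
    mask.scatter_(1, keep, False)
    return visible, mask

tokens = torch.randn(8, 196, 768)     # e.g., 14x14 patches of a 224px image
visible, mask = random_masking(tokens)
print(visible.shape, mask.float().mean())  # (8, 49, 768), ~0.75
```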
arXiv Detail & Related papers (2023-05-24T15:33:46Z)
- DINOv2: Learning Robust Visual Features without Supervision [75.42921276202522]
This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources.
Most of the technical contributions aim at accelerating and stabilizing the training at scale.
In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature.
arXiv Detail & Related papers (2023-04-14T15:12:19Z)
- Highly Accurate Dichotomous Image Segmentation [139.79513044546]
A new task called dichotomous image segmentation (DIS) aims to segment highly accurate objects from natural images.
We collect the first large-scale dataset, DIS5K, which contains 5,470 high-resolution (e.g., 2K, 4K or larger) images.
We also introduce a simple intermediate supervision baseline (IS-Net) using both feature-level and mask-level guidance for DIS model training.
arXiv Detail & Related papers (2022-03-06T20:09:19Z)
- One Model is All You Need: Multi-Task Learning Enables Simultaneous Histology Image Segmentation and Classification [3.8725005247905386]
We present a multi-task learning approach for segmentation and classification of tissue regions.
We enable simultaneous prediction with a single network.
As a result of feature sharing, we also show that the learned representation can be used to improve downstream tasks.
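A minimal sketch of this shared-encoder, multi-head pattern; the layer sizes and class counts are illustrative, not the paper's architecture.

```python
# One encoder feeds two task heads: per-pixel segmentation and
# image-level classification, trained with a joint loss.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, num_classes, 1)   # per-pixel labels
        self.cls_head = nn.Sequential(                  # image-level label
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor):
        feats = self.encoder(x)                 # shared representation
        return self.seg_head(feats), self.cls_head(feats)

model = MultiTaskNet()
seg_logits, cls_logits = model(torch.randn(2, 3, 128, 128))
# A joint objective would be a weighted sum of the two task losses.
```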
arXiv Detail & Related papers (2022-02-28T20:22:39Z)
- Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision [38.22842778742829]
Discriminative self-supervised learning allows training models on any random group of internet images.
We train models on billions of random images without any data pre-processing or prior assumptions about what we want the model to learn.
We extensively study and validate our model's performance on over 50 benchmarks, including fairness, robustness to distribution shift, geographical diversity, fine-grained recognition, image copy detection, and many image classification datasets.
arXiv Detail & Related papers (2022-02-16T22:26:47Z)
- Learning Transferable Visual Models From Natural Language Supervision [13.866297967166089]
Learning directly from raw text about images is a promising alternative.
We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn.
SOTA image representations are learned from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
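The pre-training objective can be sketched as a symmetric contrastive loss over a batch of paired image and text embeddings; the embedding size and temperature below are illustrative.

```python
# Each image should match its own caption among all captions in the batch
# (and vice versa), so the diagonal of the similarity matrix is the target.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # pairwise similarities
    targets = torch.arange(len(img_emb))           # diagonal = true pairs
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```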
arXiv Detail & Related papers (2021-02-26T19:04:58Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation [57.68890534164427]
In this work, we ask if we may leverage semi-supervised learning in unlabeled video sequences and extra images to improve the performance on urban scene segmentation.
We simply predict pseudo-labels for the unlabeled data and train subsequent models with both human-annotated and pseudo-labeled data.
Our Naive-Student model, trained with such simple yet effective iterative semi-supervised learning, attains state-of-the-art results at all three Cityscapes benchmarks.
arXiv Detail & Related papers (2020-05-20T18:00:05Z)
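A minimal sketch of this iterative pseudo-labeling loop; train_fn and the data containers are hypothetical placeholders, and all training details are elided.

```python
# Naive-Student-style iteration: a trained teacher labels unlabeled frames,
# then a student is retrained on real labels plus pseudo-labels.
import torch

@torch.no_grad()
def pseudo_label(teacher, unlabeled_images):
    """Hard per-pixel pseudo-labels from the teacher's predictions."""
    return [teacher(img).argmax(dim=1) for img in unlabeled_images]

def naive_student_round(teacher, student, labeled, unlabeled, train_fn):
    pseudo = list(zip(unlabeled, pseudo_label(teacher, unlabeled)))
    train_fn(student, labeled + pseudo)   # train on both label sources
    return student                        # next round's teacher
```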
This list is automatically generated from the titles and abstracts of the papers on this site.