Semi-supervised Deep Learning for Image Classification with Distribution
Mismatch: A Survey
- URL: http://arxiv.org/abs/2203.00190v1
- Date: Tue, 1 Mar 2022 02:46:00 GMT
- Title: Semi-supervised Deep Learning for Image Classification with Distribution
Mismatch: A Survey
- Authors: Saul Calderon-Ramirez, Shengxiang Yang, David Elizondo
- Abstract summary: Deep learning models rely on an abundance of labelled observations for training.
Labelled data is often expensive to gather, which limits the practicality of deep learning models.
In many situations, different unlabelled data sources might be available.
This raises the risk of a significant distribution mismatch between the labelled and unlabelled datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning methodologies have been employed in several different
fields, with outstanding success in image recognition applications such as
material quality control, medical imaging, autonomous driving, etc. Deep
learning models rely on an abundance of labelled observations for training.
These models comprise millions of parameters to estimate, further increasing
the need for training observations. Labelled data is frequently expensive to
gather, which makes deep learning models less practical, as a model trained on
a small dataset tends to over-fit. In a semi-supervised setting, unlabelled
data is used to improve the accuracy and generalization of a model trained
with a small labelled dataset. Nevertheless, in many situations different
unlabelled data sources might be available. This raises the risk of a
significant distribution mismatch between the labelled and unlabelled
datasets. Such a mismatch can considerably degrade the performance of typical
semi-supervised deep learning frameworks, which often assume that the labelled
and unlabelled datasets are drawn from similar distributions. Therefore, in
this paper we survey the latest approaches to semi-supervised deep learning
for image recognition. Emphasis is placed on semi-supervised deep learning
models designed to deal with a distribution mismatch between the labelled and
unlabelled datasets. We address open challenges with the aim of encouraging
the community to tackle them and to overcome the high data demand of
traditional deep learning pipelines in real-world usage settings.
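To make the mismatch problem concrete, below is a minimal sketch of the pseudo-labelling pattern that many of the surveyed methods build on: unlabelled samples contribute to training only when the model's prediction is confident, with the confidence filter acting as a simple stand-in for the out-of-distribution scoring functions that mismatch-robust methods propose. The model, optimizer, threshold, and loss weight are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, optimizer, x_lab, y_lab, x_unlab,
                         conf_threshold=0.95, unlab_weight=1.0):
    """One training step: supervised loss on the labelled batch plus a
    pseudo-label loss on unlabelled samples, keeping only confident
    predictions. The confidence filter is a stand-in for the
    out-of-distribution scoring used by mismatch-robust methods."""
    model.train()
    optimizer.zero_grad()

    # Supervised loss on the small labelled batch.
    sup_loss = F.cross_entropy(model(x_lab), y_lab)

    # Pseudo-labels for the unlabelled batch (no gradients here).
    with torch.no_grad():
        probs = F.softmax(model(x_unlab), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = conf >= conf_threshold  # drop likely-mismatched samples

    if mask.any():
        unsup_loss = F.cross_entropy(model(x_unlab[mask]), pseudo[mask])
    else:
        unsup_loss = torch.zeros((), device=x_lab.device)

    loss = sup_loss + unlab_weight * unsup_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```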
Related papers
- Granularity Matters in Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z)
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art long-tailed semi-supervised learning (LTSSL) approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels (a sketch of the smoothing idea follows this entry).
arXiv Detail & Related papers (2024-10-08T15:06:10Z)
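As a rough illustration of the smoothed pseudo-label idea mentioned above (not CCL's actual formulation), one can temper the model's predicted distribution with an estimated class prior before thresholding; `alpha`, the confidence threshold, and the prior are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

def smoothed_pseudo_labels(logits, class_prior, alpha=0.7, conf_threshold=0.6):
    """Soft pseudo-labels: blend the model's predicted distribution with an
    estimated class prior, then keep only samples whose blended label is
    confident enough. This tempers the head-class bias that plagues
    long-tailed pseudo-labelling; all hyper-parameters are illustrative."""
    probs = F.softmax(logits, dim=1)                        # (N, C)
    smoothed = alpha * probs + (1.0 - alpha) * class_prior  # broadcast over N
    conf, _ = smoothed.max(dim=1)
    mask = conf >= conf_threshold
    return smoothed, mask

# Usage: soft cross-entropy on the retained unlabelled samples, e.g.
#   targets, mask = smoothed_pseudo_labels(model(x_unlab), prior)
#   loss = -(targets[mask] * F.log_softmax(model(x_unlab), dim=1)[mask]).sum(1).mean()
```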
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and instead utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning (a simplified embedding-based labeler is sketched after this entry).
arXiv Detail & Related papers (2024-04-26T06:00:27Z)
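The sketch below assigns labels to unlabelled samples by majority vote among their nearest labelled neighbours in embedding space. It is a simplified stand-in in the spirit of HDL, not the paper's hierarchical algorithm; the tensor names and `k` are assumptions.

```python
import torch
import torch.nn.functional as F

def embedding_pseudo_labels(unlab_emb, lab_emb, lab_y, num_classes, k=10):
    """Label unlabelled samples from embedding-space neighbours instead of
    model logits: majority vote among the k nearest labelled embeddings."""
    # Cosine similarity between unlabelled and labelled embeddings.
    unlab = F.normalize(unlab_emb, dim=1)
    lab = F.normalize(lab_emb, dim=1)
    sim = unlab @ lab.t()                       # (n_unlab, n_lab)
    topk = sim.topk(k, dim=1).indices           # k nearest labelled samples
    votes = lab_y[topk]                         # their labels, (n_unlab, k)
    counts = torch.zeros(unlab.size(0), num_classes, device=unlab.device)
    counts.scatter_add_(1, votes, torch.ones_like(votes, dtype=counts.dtype))
    return counts.argmax(dim=1)                 # majority-vote pseudo-label
```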
- PixelDINO: Semi-Supervised Semantic Segmentation for Detecting Permafrost Disturbances [15.78884578132055]
We focus on the remote detection of retrogressive thaw slumps (RTS), a permafrost disturbance comparable to landslides induced by thawing.
We present a semi-supervised learning approach to train semantic segmentation models to detect RTS.
Our framework, called PixelDINO, is trained in parallel on labelled as well as unlabelled data (a sketch of such a joint objective follows this entry).
arXiv Detail & Related papers (2024-01-17T15:20:10Z)
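A hedged sketch of training a segmentation model on labelled and unlabelled imagery in parallel: per-pixel supervised cross-entropy plus a teacher-student consistency term on unlabelled tiles. This follows the spirit of the entry above rather than the authors' exact formulation; the augmentation callables and weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def parallel_seg_loss(student, teacher, x_lab, y_lab, x_unlab,
                      weak_aug, strong_aug, unsup_weight=0.5):
    """Joint objective: supervised segmentation loss on labelled tiles plus
    a per-pixel consistency loss between teacher predictions on weakly
    augmented and student predictions on strongly augmented unlabelled tiles."""
    # Supervised branch: standard per-pixel cross-entropy.
    sup = F.cross_entropy(student(x_lab), y_lab)

    # Unsupervised branch: the teacher provides per-pixel soft targets.
    with torch.no_grad():
        targets = F.softmax(teacher(weak_aug(x_unlab)), dim=1)
    log_preds = F.log_softmax(student(strong_aug(x_unlab)), dim=1)
    unsup = F.kl_div(log_preds, targets, reduction="batchmean")

    return sup + unsup_weight * unsup
```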
- Semi-Supervised Learning for hyperspectral images by non parametrically predicting view assignment [25.198550162904713]
Hyperspectral image (HSI) classification is gaining momentum because of the rich spectral information inherent in the images.
Recently, to effectively train deep learning models with minimal labelled samples, unlabeled samples have also been leveraged in self-supervised and semi-supervised settings.
In this work, we leverage the idea of semi-supervised learning to assist the discriminative self-supervised pretraining of the models.
arXiv Detail & Related papers (2023-06-19T14:13:56Z)
- Self Training with Ensemble of Teacher Models [8.257085583227695]
Training robust deep learning models requires large amounts of labelled data.
In the absence of such large repositories of labelled data, unlabeled data can be exploited instead.
Semi-supervised learning aims to utilize such unlabeled data for training classification models (a generic ensemble pseudo-labelling sketch follows this entry).
arXiv Detail & Related papers (2021-07-17T09:44:09Z)
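A generic sketch of the self-training pattern: average the softmax outputs of several teacher models and keep pseudo-labels only where the ensemble is confident. The agreement threshold and interfaces are assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def ensemble_pseudo_labels(teachers, x_unlab, agreement_threshold=0.8):
    """Average teacher predictions and retain confident pseudo-labels,
    which a student model is then trained on."""
    with torch.no_grad():
        mean_probs = torch.stack(
            [F.softmax(t(x_unlab), dim=1) for t in teachers]
        ).mean(dim=0)                       # (N, C) ensemble distribution
    conf, pseudo = mean_probs.max(dim=1)
    mask = conf >= agreement_threshold      # keep high-agreement samples
    return pseudo[mask], mask
```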
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity-inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data (a simplified diversity penalty is sketched after this entry).
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
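As a simplified stand-in for the adversarial diversity objective described above, the sketch below penalizes pairwise agreement between ensemble members' predictive distributions; the cosine-similarity form is an assumption, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def diversity_regularizer(member_logits):
    """Penalize agreement between ensemble members' predictive
    distributions. `member_logits` is a list of (N, C) tensors, one per
    ensemble member; lower values mean more diverse predictions."""
    probs = [F.softmax(l, dim=1) for l in member_logits]
    penalty, pairs = 0.0, 0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            # Higher similarity -> higher penalty, pushing members apart.
            penalty = penalty + F.cosine_similarity(probs[i], probs[j], dim=1).mean()
            pairs += 1
    return penalty / max(pairs, 1)

# Usage: total_loss = task_loss + lambda_div * diversity_regularizer(logits_list)
```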