Collaborative Learning of Semi-Supervised Clustering and Classification
for Labeling Uncurated Data
- URL: http://arxiv.org/abs/2003.04261v1
- Date: Mon, 9 Mar 2020 17:03:05 GMT
- Title: Collaborative Learning of Semi-Supervised Clustering and Classification
for Labeling Uncurated Data
- Authors: Sara Mousavi, Dylan Lee, Tatianna Griffin, Dawnie Steadman, and Audris
Mockus
- Abstract summary: Domain-specific image collections present potential value in various areas of science and business.
To employ contemporary supervised image analysis methods on such image data, the images must first be cleaned and organized, and then manually labeled according to the nomenclature employed in the specific domain.
We designed and implemented the Plud system, which minimizes the effort spent by an expert and handles realistic, large collections of images.
- Score: 6.871887763122593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain-specific image collections present potential value in various areas of
science and business but are often neither curated nor organized in a way that
allows relevant content to be readily extracted. To employ contemporary supervised
image analysis methods on such image data, the data must first be cleaned and
organized, and then manually labeled for the nomenclature employed in the specific
domain, which is a time-consuming and expensive endeavor. To address this issue, we designed and
implemented the Plud system. Plud provides an iterative semi-supervised
workflow that minimizes the effort spent by an expert and handles realistic,
large collections of images. We believe it can support labeling datasets regardless
of their size and type. Plud is an iterative sequence of unsupervised
clustering, human assistance, and supervised classification. With each
iteration, 1) the labeled dataset grows, 2) the generality and accuracy of the
classification method increase, and 3) manual effort is reduced. We
evaluated the effectiveness of our system by applying it to over a million
images documenting human decomposition. In our experiment comparing manual
labeling with labeling conducted with the support of Plud, we found that it
reduces the time needed to label data and produces highly accurate models for
this new domain.
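The abstract's iterative loop can be made concrete with a minimal sketch (not the authors' implementation): it assumes pre-extracted feature vectors, integer class labels, and a hypothetical expert_label callback standing in for the human-assistance step, and it uses scikit-learn's KMeans and logistic regression only as placeholders for whatever clustering and classification components Plud actually employs.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    def plud_iteration(features, labels, labeled_idx, expert_label,
                       n_clusters=50, per_cluster=3, confidence=0.9):
        """One round of clustering -> expert labeling -> classification."""
        all_idx = np.arange(len(features))
        unlabeled_idx = np.setdiff1d(all_idx, labeled_idx)

        # 1) Unsupervised clustering of the still-unlabeled images.
        clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features[unlabeled_idx])

        # 2) Human assistance: the expert labels a few images from each cluster.
        for c in range(n_clusters):
            members = unlabeled_idx[clusters == c][:per_cluster]
            for i in members:
                labels[i] = expert_label(i)  # hypothetical callback: show image i to the expert
            labeled_idx = np.union1d(labeled_idx, members)

        # 3) Supervised classification trained on everything labeled so far.
        clf = LogisticRegression(max_iter=1000).fit(features[labeled_idx], labels[labeled_idx])

        # Confident predictions grow the labeled set before the next iteration.
        rest = np.setdiff1d(all_idx, labeled_idx)
        proba = clf.predict_proba(features[rest])
        keep = proba.max(axis=1) >= confidence
        labels[rest[keep]] = clf.classes_[proba[keep].argmax(axis=1)]
        labeled_idx = np.union1d(labeled_idx, rest[keep])
        return clf, labels, labeled_idx

Each call grows labeled_idx both through expert input and through confident model predictions, which mirrors the three per-iteration effects listed in the abstract: the labeled dataset grows, the classifier improves, and the expert's remaining workload shrinks.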
Related papers
- Simple but Effective Unsupervised Classification for Specified Domain
Images: A Case Study on Fungi Images [7.725818999035946]
High-quality labeled datasets are essential for deep learning.
Traditional manual annotation methods are costly and inefficient.
An unsupervised classification method with three key ideas is introduced.
arXiv Detail & Related papers (2023-11-15T14:33:22Z)
- BI-LAVA: Biocuration with Hierarchical Image Labeling through Active
Learning and Visual Analysis [2.859324824091085]
BI-LAVA is a system for organizing scientific images in hierarchical structures.
It uses a small set of image labels, a hierarchical set of image classifiers, and active learning to help model builders deal with incomplete ground-truth labels.
An evaluation shows that our mixed human-machine approach successfully supports domain experts in understanding the characteristics of classes within the taxonomy.
arXiv Detail & Related papers (2023-08-15T19:36:19Z)
- Spatiotemporal Classification with limited labels using Constrained
Clustering for large datasets [22.117238467818623]
Separable representations can lead to supervised models with better classification capabilities.
We show how an even better representation can be learned using a constrained loss with only a few labels.
We conclude by showing how our method, using few labels, can pick out new labeled samples from the unlabeled data, which can then augment supervised methods and lead to better classification.
arXiv Detail & Related papers (2022-10-14T05:05:22Z)
- SLRNet: Semi-Supervised Semantic Segmentation Via Label Reuse for Human
Decomposition Images [5.560471251954644]
We propose a semi-supervised method that reuses available labels for unlabeled images of a dataset by exploiting existing similarities.
We evaluate our method on a large dataset of human decomposition images and find that, while conceptually simple, it outperforms state-of-the-art consistency-based methods.
arXiv Detail & Related papers (2022-02-24T04:58:02Z)
- Mixed Supervision Learning for Whole Slide Image Classification [88.31842052998319]
We propose a mixed supervision learning framework for super high-resolution images.
During the patch training stage, this framework can make use of coarse image-level labels to refine self-supervised learning.
A comprehensive strategy is proposed to suppress pixel-level false positives and false negatives.
arXiv Detail & Related papers (2021-07-02T09:46:06Z)
- Pseudo Pixel-level Labeling for Images with Evolving Content [5.573543601558405]
We propose a pseudo-pixel-level label generation technique to reduce the amount of effort for manual annotation of images.
We train two semantic segmentation models with VGG and ResNet backbones on images labeled using our pseudo labeling method and those of a state-of-the-art method.
The results indicate that using our pseudo-labels instead of those generated using the state-of-the-art method in the training process improves the mean-IoU and the frequency-weighted-IoU of the VGG and ResNet-based semantic segmentation models by 3.36%, 2.58%, 10
arXiv Detail & Related papers (2021-05-20T18:14:19Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image
Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning
and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-07T17:58:39Z)
- Streaming Self-Training via Domain-Agnostic Unlabeled Images [62.57647373581592]
We present streaming self-training (SST) that aims to democratize the process of learning visual recognition models.
Key to SST are two crucial observations: (1) domain-agnostic unlabeled images enable us to learn better models with a few labeled examples without any additional knowledge or supervision; and (2) learning is a continuous process and can be done by constructing a schedule of learning updates.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Grafit: Learning fine-grained image representations with coarse labels [114.17782143848315]
This paper tackles the problem of learning a finer representation than the one provided by training labels.
By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods.
arXiv Detail & Related papers (2020-11-25T19:06:26Z)
- Automatically Discovering and Learning New Visual Categories with
Ranking Statistics [145.89790963544314]
We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes.
We learn a general-purpose clustering model and use it to identify the new classes in the unlabelled data (a sketch of the pairwise ranking-statistics rule appears after this list).
We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin.
arXiv Detail & Related papers (2020-02-13T18:53:32Z)
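For the last entry, a minimal sketch of the pairwise ranking-statistics rule the title refers to, as that method is commonly described: two unlabelled images are treated as candidates for the same novel class when the sets of their top-k most activated feature dimensions coincide. Everything here is illustrative; features is assumed to be an array of embeddings from a network pre-trained on the labelled classes, and k is a small constant.

    import numpy as np

    def rank_stats_pairs(features, k=5):
        """Boolean matrix of pairwise 'same novel class' pseudo-labels."""
        # Indices of the k largest-magnitude dimensions per embedding.
        topk = np.argsort(-np.abs(features), axis=1)[:, :k]
        topk_sets = [frozenset(row) for row in topk]
        n = len(topk_sets)
        same = np.zeros((n, n), dtype=bool)
        for i in range(n):
            for j in range(n):
                same[i, j] = topk_sets[i] == topk_sets[j]
        return same

    # Example: embeddings for 4 unlabelled images with 16 dimensions each.
    pairs = rank_stats_pairs(np.random.rand(4, 16), k=3)

In the full method such pairwise pseudo-labels drive a binary cross-entropy loss on a clustering head over the unlabelled data; the sketch only shows the pairing rule.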
This list is automatically generated from the titles and abstracts of the papers on this site.