Simple but Effective Unsupervised Classification for Specified Domain
Images: A Case Study on Fungi Images
- URL: http://arxiv.org/abs/2311.08995v1
- Date: Wed, 15 Nov 2023 14:33:22 GMT
- Title: Simple but Effective Unsupervised Classification for Specified Domain
Images: A Case Study on Fungi Images
- Authors: Zhaocong Liu, Fa Zhang, Lin Cheng, Huanxi Deng, Xiaoyan Yang, Zhenyu
Zhang, and Chichun Zhou
- Abstract summary: High-quality labeled datasets are essential for deep learning.
Traditional manual annotation methods are costly and inefficient.
An unsupervised classification method with three key ideas is introduced.
- Score: 7.725818999035946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-quality labeled datasets are essential for deep learning. Traditional
manual annotation methods are not only costly and inefficient but also pose
challenges in specialized domains where expert knowledge is needed.
Self-supervised methods, despite leveraging unlabeled data for feature
extraction, still require hundreds or thousands of labeled instances to guide
the model for effective specialized image classification. Current unsupervised
learning methods offer automatic classification without prior annotation but
often compromise on accuracy. As a result, efficiently procuring high-quality
labeled datasets remains a pressing challenge for specialized domain images
devoid of annotated data. Addressing this, an unsupervised classification
method with three key ideas is introduced: 1) dual-step feature dimensionality
reduction using a pre-trained model and manifold learning, 2) a voting
mechanism from multiple clustering algorithms, and 3) post-hoc instead of prior
manual annotation. This approach outperforms supervised methods in
classification accuracy, as demonstrated with fungal image data, achieving
94.1% and 96.7% on public and private datasets respectively. The proposed
unsupervised classification method reduces dependency on pre-annotated
datasets, enabling a closed loop for data classification. The simplicity and
ease of use of this method will also bring convenience to researchers in
various fields in building datasets, promoting AI applications for images in
specialized domains.
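The voting step in idea 2 can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how cluster ids are matched across algorithms or what agreement threshold is used, so the exhaustive permutation matching, the `min_agreement` parameter, and the function names `align_labels` and `vote` are all assumptions made here. The input label lists are hypothetical stand-ins for the outputs of several clustering algorithms run on the reduced features.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, labels, n_clusters):
    """Rename the cluster ids in `labels` to agree with `reference` as often as possible.

    Assumption: every permutation of ids is tried, which is only practical
    for a small number of clusters.
    """
    best_perm, best_agree = None, -1
    for perm in permutations(range(n_clusters)):
        agree = sum(1 for r, l in zip(reference, labels) if perm[l] == r)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[l] for l in labels]


def vote(clusterings, n_clusters, min_agreement):
    """Majority vote per sample; samples with too little agreement get None."""
    reference = list(clusterings[0])
    aligned = [reference] + [
        align_labels(reference, c, n_clusters) for c in clusterings[1:]
    ]
    consensus = []
    for sample_votes in zip(*aligned):
        label, count = Counter(sample_votes).most_common(1)[0]
        consensus.append(label if count >= min_agreement else None)
    return consensus


# Hypothetical outputs of three clustering algorithms on six images.
runs = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],  # same partition as the first run, different ids
    [2, 2, 0, 1, 1, 1],  # disagrees with the others on the fourth image
]

print(vote(runs, n_clusters=3, min_agreement=3))  # [0, 0, 1, None, 2, 2]
print(vote(runs, n_clusters=3, min_agreement=2))  # [0, 0, 1, 1, 2, 2]
```

Requiring unanimous agreement leaves the contested image unlabeled, which trades coverage for label purity. Post-hoc annotation (idea 3) would then amount to an expert naming each consensus cluster once, rather than labeling every image beforehand.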
Related papers
- Domain Adaptive Multiple Instance Learning for Instance-level Prediction
of Pathological Images [45.132775668689604]
We propose a new task setting to improve the classification performance of the target dataset without increasing annotation costs.
In order to combine the supervisory information of both methods effectively, we propose a method to create pseudo-labels with high confidence.
arXiv Detail & Related papers (2023-04-07T08:31:06Z)
- Out-of-Distribution Detection without Class Labels [29.606812876314386]
Anomaly detection methods identify samples that deviate from the normal behavior of the dataset.
Current methods struggle when faced with training data consisting of multiple classes but no labels.
We first cluster images using self-supervised methods and obtain a cluster label for every image.
We finetune pretrained features on the task of classifying images by their cluster labels.
arXiv Detail & Related papers (2021-12-14T18:58:32Z)
- AutoNovel: Automatically Discovering and Learning Novel Visual
Categories [138.80332861066287]
We present a new approach called AutoNovel to tackle the problem of discovering novel classes in an image collection given labelled examples of other classes.
We evaluate AutoNovel on standard classification benchmarks and substantially outperform current methods for novel category discovery.
arXiv Detail & Related papers (2021-06-29T11:12:16Z)
- SCARF: Self-Supervised Contrastive Learning using Random Feature
Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image
Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k image subset of the ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z)
- Streaming Self-Training via Domain-Agnostic Unlabeled Images [62.57647373581592]
We present streaming self-training (SST) that aims to democratize the process of learning visual recognition models.
Key to SST are two crucial observations: (1) domain-agnostic unlabeled images enable us to learn better models with a few labeled examples without any additional knowledge or supervision; and (2) learning is a continuous process and can be done by constructing a schedule of learning updates.
arXiv Detail & Related papers (2021-04-07T17:58:39Z)
- Semi-supervised Sparse Representation with Graph Regularization for
Image Classification [1.370633147306388]
We propose a discriminative semi-supervised sparse representation algorithm for image classification.
The proposed algorithm achieves excellent performance compared with related popular methods.
arXiv Detail & Related papers (2020-11-11T09:16:48Z)
- SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
arXiv Detail & Related papers (2020-05-25T18:12:33Z)
- Collaborative Learning of Semi-Supervised Clustering and Classification
for Labeling Uncurated Data [6.871887763122593]
Domain-specific image collections present potential value in various areas of science and business.
To employ contemporary supervised image analysis methods on such image data, they must first be cleaned and organized, and then manually labeled for the nomenclature employed in the specific domain.
We designed and implemented the Plud system to minimize the effort spent by an expert and to handle realistic large collections of images.
arXiv Detail & Related papers (2020-03-09T17:03:05Z)
- Automatically Discovering and Learning New Visual Categories with
Ranking Statistics [145.89790963544314]
We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes.
We learn a general-purpose clustering model and use the latter to identify the new classes in the unlabelled data.
We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin.
arXiv Detail & Related papers (2020-02-13T18:53:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.