Embrace Limited and Imperfect Training Datasets: Opportunities and
Challenges in Plant Disease Recognition Using Deep Learning
- URL: http://arxiv.org/abs/2305.11533v2
- Date: Fri, 28 Jul 2023 14:29:12 GMT
- Title: Embrace Limited and Imperfect Training Datasets: Opportunities and
Challenges in Plant Disease Recognition Using Deep Learning
- Authors: Mingle Xu and Hyongsuk Kim and Jucheng Yang and Alvaro Fuentes and Yao
Meng and Sook Yoon and Taehyun Kim and Dong Sun Park
- Abstract summary: We argue that embracing poor datasets is viable and aim to explicitly define the challenges associated with using these datasets.
Although our primary focus is on plant disease recognition, we emphasize that the principles of embracing and analyzing poor datasets are applicable to a wider range of domains, including agriculture.
- Score: 5.526950086166696
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advancements in deep learning have brought significant improvements to
plant disease recognition. However, achieving satisfactory performance often
requires high-quality training datasets, which are challenging and expensive to
collect. Consequently, the practical application of current deep learning-based
methods in real-world scenarios is hindered by the scarcity of high-quality
datasets. In this paper, we argue that embracing poor datasets is viable and
aim to explicitly define the challenges associated with using these datasets.
To delve into this topic, we analyze the characteristics of high-quality
datasets, namely large-scale images and desired annotation, and contrast them
with the limited and imperfect nature of poor datasets.
Challenges arise when the training datasets deviate from these characteristics.
To provide a comprehensive understanding, we propose a novel and informative
taxonomy that categorizes these challenges. Furthermore, we offer a brief
overview of existing studies and approaches that address these challenges. We
believe that our paper sheds light on the importance of embracing poor
datasets, enhances the understanding of the associated challenges, and
contributes to the ambitious objective of deploying deep learning in real-world
applications. To facilitate the progress, we finally describe several
outstanding questions and point out potential future directions. Although our
primary focus is on plant disease recognition, we emphasize that the principles
of embracing and analyzing poor datasets are applicable to a wider range of
domains, including agriculture.
Related papers
- A Survey on Data Synthesis and Augmentation for Large Language Models [35.59526251210408]
This paper reviews and summarizes data generation techniques throughout the lifecycle of Large Language Models.
We discuss the current constraints faced by these methods and investigate potential pathways for future development and research.
arXiv Detail & Related papers (2024-10-16T16:12:39Z)
- Deep Learning-Based Object Pose Estimation: A Comprehensive Survey [73.74933379151419]
We discuss the recent advances in deep learning-based object pose estimation.
Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks.
arXiv Detail & Related papers (2024-05-13T14:44:22Z)
- Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
- Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey [17.19337964440007]
There is currently no comprehensive review that summarizes and compares the key techniques, metrics, datasets, models, and optimization approaches in this research domain.
This survey aims to address this gap by consolidating recent progress in these areas, offering a thorough survey and taxonomy of the datasets, metrics, and methodologies utilized.
It identifies strengths, limitations, unexplored territories, and gaps in the existing literature, while providing some insights for future research directions in this vital and rapidly evolving field.
arXiv Detail & Related papers (2024-02-27T23:59:01Z)
- Plant Disease Recognition Datasets in the Age of Deep Learning: Challenges and Opportunities [1.9578088547147654]
This study explicitly proposes an informative taxonomy to describe potential plant disease datasets.
We provide several directions for future work, such as creating challenge-oriented datasets, with the ultimate objective of deploying deep learning in real-world applications with satisfactory performance.
arXiv Detail & Related papers (2023-12-13T05:24:36Z)
- Image Synthesis under Limited Data: A Survey and Taxonomy [4.0989155767548375]
Deep generative models, which target reproducing the given data distribution to produce novel samples, have made unprecedented advancements in recent years.
When trained on limited data, generative models tend to suffer from severe performance deterioration due to overfitting and memorization.
This survey offers a comprehensive review and a novel taxonomy on the development of image synthesis under limited data.
arXiv Detail & Related papers (2023-07-31T17:45:16Z)
- LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
- A Survey of Label-Efficient Deep Learning for 3D Point Clouds [109.07889215814589]
This paper presents the first comprehensive survey of label-efficient learning of point clouds.
We propose a taxonomy that organizes label-efficient learning methods based on the data prerequisites provided by different types of labels.
For each approach, we outline the problem setup and provide an extensive literature review that showcases relevant progress and challenges.
arXiv Detail & Related papers (2023-05-31T12:54:51Z)
- A Survey on Dataset Distillation: Approaches, Applications and Future Directions [4.906549881313351]
By synthesizing datasets with high information density, dataset distillation offers a range of potential applications.
We propose a taxonomy of dataset distillation that characterizes existing approaches, and then systematically review the data modalities and related applications.
arXiv Detail & Related papers (2023-05-03T08:41:37Z)
- Self-Supervised Representation Learning: Introduction, Advances and Challenges [125.38214493654534]
Self-supervised representation learning methods aim to provide powerful deep feature learning without the requirement of large annotated datasets.
This article introduces this vibrant area including key concepts, the four main families of approach and associated state of the art, and how self-supervised methods are applied to diverse modalities of data.
arXiv Detail & Related papers (2021-10-18T13:51:22Z)
- Few-shot Partial Multi-view Learning [103.33865779721458]
We propose a new task called few-shot partial multi-view learning.
It focuses on overcoming the negative impact of the view-missing issue in the low-data regime.
We conduct extensive experiments to evaluate our method.
arXiv Detail & Related papers (2021-05-05T13:34:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.