Plant Disease Recognition Datasets in the Age of Deep Learning:
Challenges and Opportunities
- URL: http://arxiv.org/abs/2312.07905v1
- Date: Wed, 13 Dec 2023 05:24:36 GMT
- Title: Plant Disease Recognition Datasets in the Age of Deep Learning:
Challenges and Opportunities
- Authors: Mingle Xu, Ji Eun Park, Jaehwan Lee, Jucheng Yang, Sook Yoon
- Abstract summary: This study explicitly proposes an informative taxonomy to describe potential plant disease datasets.
We provide several directions for future work, such as creating challenge-oriented datasets and the ultimate objective of deploying deep learning in real-world applications with satisfactory performance.
- Score: 1.9578088547147654
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Plant disease recognition has improved significantly with deep
learning in recent years. Although plant disease datasets are essential and
many relevant datasets are publicly available, two fundamental questions remain.
First, how can datasets be differentiated so that suitable public
datasets can be chosen for specific applications? Second, what characteristics
should datasets have to achieve promising performance in real-world
applications? To address these questions, this study explicitly proposes an
informative taxonomy to describe potential plant disease datasets. We further
provide several directions for future work, such as creating challenge-oriented
datasets and the ultimate objective of deploying deep learning in real-world
applications with satisfactory performance. In addition, existing related
public RGB image datasets are summarized. We believe that this study will
contribute to making better datasets and will be useful beyond
plant disease recognition, for example in plant species recognition. To facilitate the
community, our project is public at https://github.com/xml94/PPDRD with
information on relevant public datasets.
Related papers
- Synthetic Data Generation with Large Language Models for Personalized Community Question Answering [47.300506002171275]
We build Sy-SE-PQA based on an existing dataset, SE-PQA, which consists of questions and answers posted on the popular StackExchange communities.
Our findings suggest that LLMs have high potential in generating data tailored to users' needs.
The synthetic data can replace human-written training data, even if the generated data may contain incorrect information.
arXiv Detail & Related papers (2024-10-29T16:19:08Z)
- Self-supervised transformer-based pre-training method with General Plant Infection dataset [3.969851116372513]
This study proposes an advanced network architecture that combines Contrastive Learning and Masked Image Modeling (MIM).
The proposed network architecture demonstrates effectiveness in addressing plant pest and disease recognition tasks, achieving notable detection accuracy.
Our code and dataset will be publicly available to advance research in plant pest and disease recognition.
arXiv Detail & Related papers (2024-07-20T15:48:35Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- Ontologies for increasing the FAIRness of plant research data [0.0]
Ontologies provide concepts for a particular domain as well as relationships between concepts.
By tagging data with ontology terms, the data becomes both human- and machine-interpretable, allowing increased reuse and interoperability.
We outline the ontologies most relevant to the fundamental plant sciences and how they can be used to annotate data related to plant-specific experiments.
arXiv Detail & Related papers (2023-08-25T13:08:26Z)
- Exploring the Potential of AI-Generated Synthetic Datasets: A Case Study on Telematics Data with ChatGPT [0.0]
This research delves into the construction and utilization of synthetic datasets, specifically within the telematics sphere, leveraging OpenAI's powerful language model, ChatGPT.
To illustrate this data creation process, a hands-on case study is conducted, focusing on the generation of a synthetic telematics dataset.
arXiv Detail & Related papers (2023-06-23T15:15:13Z)
- LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
- Embrace Limited and Imperfect Training Datasets: Opportunities and Challenges in Plant Disease Recognition Using Deep Learning [5.526950086166696]
We argue that embracing poor datasets is viable and aim to explicitly define the challenges associated with using these datasets.
Although our primary focus is on plant disease recognition, we emphasize that the principles of embracing and analyzing poor datasets are applicable to a wider range of domains, including agriculture.
arXiv Detail & Related papers (2023-05-19T08:58:09Z)
- Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
- A Comprehensive Survey of Dataset Distillation [73.15482472726555]
It has become challenging to handle the unlimited growth of data with limited computing power.
Deep learning technology has developed unprecedentedly in the last decade.
This paper provides a holistic understanding of dataset distillation from multiple aspects.
arXiv Detail & Related papers (2023-01-13T15:11:38Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- On The State of Data In Computer Vision: Human Annotations Remain Indispensable for Developing Deep Learning Models [0.0]
High-quality labeled datasets play a crucial role in fueling the development of machine learning (ML).
Since the emergence of the ImageNet dataset and the AlexNet model in 2012, the size of new open-source labeled vision datasets has remained roughly constant.
Only a minority of publications in the computer vision community tackle supervised learning on datasets that are orders of magnitude larger than ImageNet.
arXiv Detail & Related papers (2021-07-31T00:08:21Z)