Automated Classification of Dry Bean Varieties Using XGBoost and SVM Models
- URL: http://arxiv.org/abs/2408.01244v1
- Date: Fri, 2 Aug 2024 13:05:33 GMT
- Title: Automated Classification of Dry Bean Varieties Using XGBoost and SVM Models
- Authors: Ramtin Ardeshirifar,
- Abstract summary: This paper presents a comparative study on the automated classification of seven different varieties of dry beans using machine learning models.
The XGBoost and SVM models achieved overall correct classification rates of 94.00% and 94.39%, respectively.
This study contributes to the growing body of work on precision agriculture, demonstrating that automated systems can significantly support seed quality control and crop yield optimization.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a comparative study on the automated classification of seven different varieties of dry beans using machine learning models. Leveraging a dataset of 12,909 dry bean samples, reduced from an initial 13,611 through outlier removal and feature extraction, we applied Principal Component Analysis (PCA) for dimensionality reduction and trained two multiclass classifiers: XGBoost and Support Vector Machine (SVM). The models were evaluated using nested cross-validation to ensure robust performance assessment and hyperparameter tuning. The XGBoost and SVM models achieved overall correct classification rates of 94.00% and 94.39%, respectively. The results underscore the efficacy of these machine learning approaches in agricultural applications, particularly in enhancing the uniformity and efficiency of seed classification. This study contributes to the growing body of work on precision agriculture, demonstrating that automated systems can significantly support seed quality control and crop yield optimization. Future work will explore incorporating more diverse datasets and advanced algorithms to further improve classification accuracy.
Related papers
- AgEval: A Benchmark for Zero-Shot and Few-Shot Plant Stress Phenotyping with Multimodal LLMs [19.7240633020344]
AgEval is a benchmark comprising 12 diverse plant stress phenotyping tasks.
Our study assesses zero-shot and few-shot in-context learning performance of state-of-the-art models.
arXiv Detail & Related papers (2024-07-29T00:39:51Z) - LSTM Autoencoder-based Deep Neural Networks for Barley Genotype-to-Phenotype Prediction [16.99449054451577]
We propose a new LSTM autoencoder-based model for barley genotype-to-phenotype prediction, specifically for flowering time and grain yield estimation.
Our model outperformed the other baseline methods, demonstrating its potential in handling complex high-dimensional agricultural datasets.
arXiv Detail & Related papers (2024-07-21T16:07:43Z) - Predictive Analytics of Varieties of Potatoes [2.336821989135698]
We explore the application of machine learning algorithms specifically to enhance the selection process of Russet potato clones in breeding trials.
This study addresses the challenge of efficiently identifying high-yield, disease-resistant, and climate-resilient potato varieties.
arXiv Detail & Related papers (2024-04-04T00:49:05Z) - Extension of Transformational Machine Learning: Classification Problems [0.0]
This study explores the application and performance of Transformational Machine Learning (TML) in drug discovery.
TML, a meta learning algorithm, excels in exploiting common attributes across various domains.
The drug discovery process, which is complex and time-consuming, can benefit greatly from the enhanced prediction accuracy.
arXiv Detail & Related papers (2023-08-07T07:34:18Z) - Benchmarking the Effectiveness of Classification Algorithms and SVM
Kernels for Dry Beans [0.6263481844384227]
This study analyses different Support Vector Machine (SVM) classification algorithms, namely linear, and radial basis function (RBF)
The analysis is performed on the Dry Bean dataset, with PCA (Principal Component Analysis) conducted as a preprocessing step for dimensionality reduction.
The RBF SVM kernel algorithm achieves the highest Accuracy of 93.34%, Precision of 92.61%, Recall of 92.35% and F1 Score as 91.40%.
arXiv Detail & Related papers (2023-07-15T18:13:29Z) - PruMUX: Augmenting Data Multiplexing with Model Compression [42.89593283051397]
In this paper, we combine two such methods -- structured pruning and data multiplexing -- to compound the speedup gains obtained by either method.
Our approach, PruMUX, obtains up to 7.5-29.5X throughput improvement over BERT-base model with accuracy threshold from 80% to 74%.
We propose Auto-PruMUX, a meta-level model that can predict the high-performance parameters for pruning and multiplexing given a desired accuracy loss budget.
arXiv Detail & Related papers (2023-05-24T04:22:38Z) - Boosting Out-of-Distribution Detection with Multiple Pre-trained Models [41.66566916581451]
Post hoc detection utilizing pre-trained models has shown promising performance and can be scaled to large-scale problems.
We propose a detection enhancement method by ensembling multiple detection decisions derived from a zoo of pre-trained models.
Our method substantially improves the relative performance by 65.40% and 26.96% on the CIFAR10 and ImageNet benchmarks.
arXiv Detail & Related papers (2022-12-24T12:11:38Z) - Towards Automated Imbalanced Learning with Deep Hierarchical
Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z) - SelectAugment: Hierarchical Deterministic Sample Selection for Data
Augmentation [72.58308581812149]
We propose an effective approach, dubbed SelectAugment, to select samples to be augmented in a deterministic and online manner.
Specifically, in each batch, we first determine the augmentation ratio, and then decide whether to augment each training sample under this ratio.
In this way, the negative effects of the randomness in selecting samples to augment can be effectively alleviated and the effectiveness of DA is improved.
arXiv Detail & Related papers (2021-12-06T08:38:38Z) - Guiding Generative Language Models for Data Augmentation in Few-Shot
Text Classification [59.698811329287174]
We leverage GPT-2 for generating artificial training instances in order to improve classification performance.
Our results show that fine-tuning GPT-2 in a handful of label instances leads to consistent classification improvements.
arXiv Detail & Related papers (2021-11-17T12:10:03Z) - Active Hybrid Classification [79.02441914023811]
This paper shows how crowd and machines can support each other in tackling classification problems.
We propose an architecture that orchestrates active learning and crowd classification and combines them in a virtuous cycle.
arXiv Detail & Related papers (2021-01-21T21:09:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.