Automated Classification of Dry Bean Varieties Using XGBoost and SVM Models
- URL: http://arxiv.org/abs/2408.01244v1
- Date: Fri, 2 Aug 2024 13:05:33 GMT
- Title: Automated Classification of Dry Bean Varieties Using XGBoost and SVM Models
- Authors: Ramtin Ardeshirifar
- Abstract summary: This paper presents a comparative study on the automated classification of seven different varieties of dry beans using machine learning models.
The XGBoost and SVM models achieved overall correct classification rates of 94.00% and 94.39%, respectively.
This study contributes to the growing body of work on precision agriculture, demonstrating that automated systems can significantly support seed quality control and crop yield optimization.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a comparative study on the automated classification of seven different varieties of dry beans using machine learning models. Leveraging a dataset of 12,909 dry bean samples, reduced from an initial 13,611 through outlier removal and feature extraction, we applied Principal Component Analysis (PCA) for dimensionality reduction and trained two multiclass classifiers: XGBoost and Support Vector Machine (SVM). The models were evaluated using nested cross-validation to ensure robust performance assessment and hyperparameter tuning. The XGBoost and SVM models achieved overall correct classification rates of 94.00% and 94.39%, respectively. The results underscore the efficacy of these machine learning approaches in agricultural applications, particularly in enhancing the uniformity and efficiency of seed classification. This study contributes to the growing body of work on precision agriculture, demonstrating that automated systems can significantly support seed quality control and crop yield optimization. Future work will explore incorporating more diverse datasets and advanced algorithms to further improve classification accuracy.
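The nested cross-validation mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the paper's pipeline: the data are synthetic Gaussian blobs standing in for the seven bean classes, a k-nearest-neighbour classifier is swapped in for XGBoost and SVM, PCA is omitted, and all function names and the `k_grid` values are hypothetical. The point is only the structure: an inner loop picks the hyperparameter using training data alone, and an outer loop scores that choice on folds the tuning never saw.

```python
import random
from collections import Counter


def make_blobs(n_per_class, centers, spread, rng):
    """Synthetic stand-in for the bean features: one Gaussian blob per class."""
    X, y = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            y.append(label)
    return X, y


def k_folds(indices, k, rng):
    """Shuffle the indices and deal them into k roughly equal folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_idx, X, y, point, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    def dist(i):
        return (X[i][0] - point[0]) ** 2 + (X[i][1] - point[1]) ** 2
    nearest = sorted(train_idx, key=dist)[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_idx, test_idx, X, y, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_idx, X, y, X[i], k) == y[i] for i in test_idx)
    return hits / len(test_idx)


def nested_cv(X, y, k_grid, outer_k=5, inner_k=3, seed=0):
    """Return one held-out accuracy per outer fold."""
    rng = random.Random(seed)
    outer = k_folds(range(len(X)), outer_k, rng)
    scores = []
    for o, test_idx in enumerate(outer):
        # Everything outside the current outer fold is available for tuning.
        train_idx = [i for f, fold in enumerate(outer) if f != o for i in fold]
        inner = k_folds(train_idx, inner_k, rng)

        def inner_score(k):
            # Mean validation accuracy of this k over the inner folds.
            accs = []
            for v, val_idx in enumerate(inner):
                fit_idx = [i for f, fold in enumerate(inner) if f != v for i in fold]
                accs.append(accuracy(fit_idx, val_idx, X, y, k))
            return sum(accs) / len(accs)

        best_k = max(k_grid, key=inner_score)
        # The outer test fold is touched only once, after tuning is finished.
        scores.append(accuracy(train_idx, test_idx, X, y, best_k))
    return scores


# Seven well-separated class centres (an assumption chosen for illustration).
centers = [(0, 0), (4, 0), (0, 4), (4, 4), (8, 0), (8, 4), (4, 8)]
X, y = make_blobs(30, centers, 0.7, random.Random(42))
scores = nested_cv(X, y, k_grid=[1, 3, 5, 7])
mean_accuracy = sum(scores) / len(scores)
print("per-fold accuracy:", [round(s, 3) for s in scores])
print("mean accuracy:", round(mean_accuracy, 3))
```

Because the hyperparameter is chosen separately inside each outer fold, the mean of the outer scores estimates the performance of the whole tuning procedure rather than of one fixed configuration.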
Related papers
- Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search [59.75749613951193]
We propose Data Influence-oriented Tree Search (DITS) to guide both tree search and data selection.
By leveraging influence scores, we effectively identify the most impactful data for system improvement.
We derive influence score estimation methods tailored for non-differentiable metrics.
arXiv Detail & Related papers (2025-02-02T23:20:16Z)
- GBFRS: Robust Fuzzy Rough Sets via Granular-ball Computing [48.33779268699777]
Fuzzy rough set theory is effective for processing datasets with complex attributes.
Most existing models operate at the finest granularity, rendering them inefficient and sensitive to noise.
This paper proposes integrating multi-granularity granular-ball computing into fuzzy rough set theory, using granular-balls to replace sample points.
arXiv Detail & Related papers (2025-01-30T15:09:26Z)
- A Robust Support Vector Machine Approach for Raman COVID-19 Data Classification [0.7864304771129751]
In this paper, we investigate the performance of a novel robust formulation for Support Vector Machine (SVM) in classifying COVID-19 samples obtained from Raman spectroscopy.
We derive robust counterpart models of deterministic formulations using bounded-by-norm uncertainty sets around each observation.
The effectiveness of our approach is validated on real-world COVID-19 datasets provided by Italian hospitals.
arXiv Detail & Related papers (2025-01-29T14:02:45Z)
- Artificial Liver Classifier: A New Alternative to Conventional Machine Learning Models [4.395397502990339]
This paper introduces the Artificial Liver Classifier (ALC), a novel supervised learning classifier inspired by the human liver's detoxification function.
The ALC is characterized by its simplicity, speed, lack of hyperparameters, ability to reduce overfitting, and effectiveness on multi-class classification problems.
It was evaluated on five benchmark machine learning datasets: Iris Flower, Breast Cancer Wisconsin, Wine, Voice Gender, and MNIST.
arXiv Detail & Related papers (2025-01-14T12:42:01Z)
- Predictive Analytics of Varieties of Potatoes [2.336821989135698]
We explore the application of machine learning algorithms specifically to enhance the selection process of Russet potato clones in breeding trials.
This study addresses the challenge of efficiently identifying high-yield, disease-resistant, and climate-resilient potato varieties.
arXiv Detail & Related papers (2024-04-04T00:49:05Z)
- Extension of Transformational Machine Learning: Classification Problems [0.0]
This study explores the application and performance of Transformational Machine Learning (TML) in drug discovery.
TML, a meta learning algorithm, excels in exploiting common attributes across various domains.
The drug discovery process, which is complex and time-consuming, can benefit greatly from the enhanced prediction accuracy.
arXiv Detail & Related papers (2023-08-07T07:34:18Z)
- Benchmarking the Effectiveness of Classification Algorithms and SVM Kernels for Dry Beans [0.6263481844384227]
This study analyses different Support Vector Machine (SVM) classification algorithms, namely linear, and radial basis function (RBF)
The analysis is performed on the Dry Bean dataset, with PCA (Principal Component Analysis) conducted as a preprocessing step for dimensionality reduction.
The RBF SVM kernel algorithm achieves the highest accuracy (93.34%), with a precision of 92.61%, a recall of 92.35%, and an F1 score of 91.40%.
arXiv Detail & Related papers (2023-07-15T18:13:29Z)
- PruMUX: Augmenting Data Multiplexing with Model Compression [42.89593283051397]
In this paper, we combine two such methods -- structured pruning and data multiplexing -- to compound the speedup gains obtained by either method.
Our approach, PruMUX, obtains up to 7.5-29.5X throughput improvement over BERT-base model with accuracy threshold from 80% to 74%.
We propose Auto-PruMUX, a meta-level model that can predict the high-performance parameters for pruning and multiplexing given a desired accuracy loss budget.
arXiv Detail & Related papers (2023-05-24T04:22:38Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- SelectAugment: Hierarchical Deterministic Sample Selection for Data Augmentation [72.58308581812149]
We propose an effective approach, dubbed SelectAugment, to select samples to be augmented in a deterministic and online manner.
Specifically, in each batch, we first determine the augmentation ratio, and then decide whether to augment each training sample under this ratio.
In this way, the negative effects of the randomness in selecting samples to augment can be effectively alleviated and the effectiveness of DA is improved.
arXiv Detail & Related papers (2021-12-06T08:38:38Z)
- Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification [59.698811329287174]
We leverage GPT-2 for generating artificial training instances in order to improve classification performance.
Our results show that fine-tuning GPT-2 on a handful of labeled instances leads to consistent classification improvements.
arXiv Detail & Related papers (2021-11-17T12:10:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.