Related papers: Benchmarking the Effectiveness of Classification Algorithms and SVM Kernels for Dry Beans

URL: http://arxiv.org/abs/2307.07863v1
Date: Sat, 15 Jul 2023 18:13:29 GMT
Title: Benchmarking the Effectiveness of Classification Algorithms and SVM Kernels for Dry Beans
Authors: Anant Mehta, Prajit Sengupta, Divisha Garg, Harpreet Singh, Yosi Shacham Diamand
Abstract summary: This study analyses different Support Vector Machine (SVM) classification algorithms, namely linear, polynomial, and radial basis function (RBF). The analysis is performed on the Dry Bean dataset, with PCA (Principal Component Analysis) conducted as a preprocessing step for dimensionality reduction. The RBF SVM kernel algorithm achieves the highest Accuracy of 93.34%, Precision of 92.61%, Recall of 92.35% and F1 Score of 91.40%.
Score: 0.6263481844384227
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Plant breeders and agricultural researchers can increase crop productivity by identifying desirable features, disease resistance, and nutritional content by analysing the Dry Bean dataset. This study analyses and compares different Support Vector Machine (SVM) classification algorithms, namely linear, polynomial, and radial basis function (RBF), along with other popular classification algorithms. The analysis is performed on the Dry Bean Dataset, with PCA (Principal Component Analysis) conducted as a preprocessing step for dimensionality reduction. The primary evaluation metric used is accuracy, and the RBF SVM kernel algorithm achieves the highest Accuracy of 93.34%, Precision of 92.61%, Recall of 92.35% and F1 Score of 91.40%. Along with adept visualization and empirical analysis, this study offers valuable guidance by emphasizing the importance of considering different SVM algorithms for complex and non-linear structured datasets.
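The three SVM kernels named in the abstract can be illustrated with their standard textbook definitions. The sketch below is only an illustration of those kernel functions in plain Python, not the authors' implementation; the parameter names `gamma`, `degree`, and `coef0` and their default values are assumptions chosen for the example, since the abstract does not report hyperparameters.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    """Inner product of two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x: Vector, y: Vector) -> float:
    """Linear kernel: k(x, y) = <x, y>."""
    return dot(x, y)


def polynomial_kernel(x: Vector, y: Vector, degree: int = 3,
                      gamma: float = 1.0, coef0: float = 0.0) -> float:
    """Polynomial kernel: k(x, y) = (gamma * <x, y> + coef0) ** degree.

    degree, gamma and coef0 are illustrative defaults (assumptions).
    """
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2).

    gamma is an illustrative default (assumption).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    # Two made-up 2-D points standing in for PCA-reduced bean features.
    a, b = [1.0, 2.0], [3.0, 4.0]
    print("linear:    ", linear_kernel(a, b))
    print("polynomial:", polynomial_kernel(a, b, degree=2, coef0=1.0))
    print("rbf:       ", rbf_kernel(a, b, gamma=0.5))
```

The linear kernel can only separate classes with a hyperplane in the original feature space, while the polynomial and RBF kernels implicitly map the inputs to a higher-dimensional space, which is why kernel choice matters for non-linearly structured data.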

Related papers

Prediction of Cellular Malignancy Using Electrical Impedance Signatures and Supervised Machine Learning [0.0]
This study systematically reviewed 33 scholarly articles to compile datasets of quantitative bioelectric parameters. Three supervised machine learning algorithms, Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN), were implemented and tuned. Results demonstrate that Random Forest achieved the highest predictive accuracy of 90% when configured with a maximum depth of 4 and 100 estimators.
arXiv Detail & Related papers (2026-01-08T01:30:52Z)
The role of data partitioning on the performance of EEG-based deep learning models in supervised cross-subject analysis: a preliminary study [37.69303106863453]
Deep learning is advancing the analysis of electroencephalography (EEG) data by effectively discovering highly nonlinear patterns. No comprehensive guidelines for proper data partitioning and cross-validation exist in the domain. This paper thoroughly investigates the role of data partitioning and cross-validation in evaluating EEG deep learning models.
arXiv Detail & Related papers (2025-05-19T12:05:28Z)
Dataset Properties Shape the Success of Neuroimaging-Based Patient Stratification: A Benchmarking Analysis Across Clustering Algorithms [38.321248253111776]
We evaluated 4 widely used stratification algorithms, HYDRA, SuStaIn, SmileGAN, and SurrealGAN, on a suite of synthetic brain-morphometry cohorts. Across 122 synthetic scenarios, data complexity consistently outweighed algorithm choice in predicting stratification success. Well-separated clusters yielded high accuracy for all methods, whereas overlapping, unequal-sized, or subtle effects reduced accuracy by up to 50%.
arXiv Detail & Related papers (2025-03-15T09:44:00Z)
Machine Learning and statistical classification of CRISPR-Cas12a diagnostic assays [0.0]
CRISPR-based diagnostics have gained increasing attention as biosensing tools able to address limitations in contemporary molecular diagnostic tests. We develop a long short-term memory recurrent neural network to classify CRISPR-biosensing data, achieving 100% specificity on our model data set.
arXiv Detail & Related papers (2025-01-08T10:59:36Z)
Electroencephalogram Emotion Recognition via AUC Maximization [0.0]
Imbalanced datasets pose significant challenges in areas including neuroscience, cognitive science, and medical diagnostics. This study addresses the issue of class imbalance, using the 'Liking' label in the DEAP dataset as an example.
arXiv Detail & Related papers (2024-08-16T19:08:27Z)
Centralized and Federated Heart Disease Classification Models Using UCI Dataset and their Shapley-value Based Interpretability [0.7234862895932991]
This study benchmarks machine learning algorithms for heart disease classification using the UCI dataset. Various binary classification algorithms are trained on pooled data, with a support vector machine (SVM) achieving the highest testing accuracy of 83.3%.
arXiv Detail & Related papers (2024-08-12T14:29:54Z)
Automated Classification of Dry Bean Varieties Using XGBoost and SVM Models [0.0]
This paper presents a comparative study on the automated classification of seven different varieties of dry beans using machine learning models. The XGBoost and SVM models achieved overall correct classification rates of 94.00% and 94.39%, respectively. This study contributes to the growing body of work on precision agriculture, demonstrating that automated systems can significantly support seed quality control and crop yield optimization.
arXiv Detail & Related papers (2024-08-02T13:05:33Z)
PULASki: Learning inter-rater variability using statistical distances to improve probabilistic segmentation [35.34932609930401]
This work proposes the PULASki method as a computationally efficient generative tool for biomedical image segmentation. It captures variability in expert annotations, even in small datasets. Our experiments are also the first to present a comparative study of the computationally feasible segmentation of complex geometries using 3D patches and the traditional use of 2D slices.
arXiv Detail & Related papers (2023-12-25T10:31:22Z)
A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data. We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
EKGNet: A 10.96 μW Fully Analog Neural Network for Intra-Patient Arrhythmia Classification [79.7946379395238]
We present an integrated approach by combining analog computing and deep learning for electrocardiogram (ECG) arrhythmia classification. We propose EKGNet, a hardware-efficient and fully analog arrhythmia classification architecture that achieves high accuracy with low power consumption.
arXiv Detail & Related papers (2023-10-24T02:37:49Z)
Machine Learning-Assisted Pattern Recognition Algorithms for Estimating Ultimate Tensile Strength in Fused Deposition Modeled Polylactic Acid Specimens [0.0]
We investigate the application of supervised machine learning algorithms for estimating the Ultimate Tensile Strength (UTS) of Polylactic Acid (PLA) specimens fabricated using the Fused Deposition Modeling (FDM) process. The primary objective was to assess the accuracy and effectiveness of four distinct supervised classification algorithms, namely Logistic Classification, Gradient Boosting Classification, Decision Tree, and K-Nearest Neighbor. The results revealed that while the Decision Tree and K-Nearest Neighbor algorithms both achieved an F1 score of 0.71, the KNN algorithm exhibited a higher Area Under the Curve (AUC) score of 0.79, outperforming the other algorithms.
arXiv Detail & Related papers (2023-07-13T11:10:22Z)
Making Machine Learning Datasets and Models FAIR for HPC: A Methodology and Case Study [0.0]
The FAIR Guiding Principles aim to improve the findability, accessibility, interoperability, and reusability of digital content by making them both human and machine actionable. These principles have not yet been broadly adopted in the domain of machine learning-based program analyses and optimizations for High-Performance Computing. We design a methodology to make HPC datasets and machine learning models FAIR after investigating existing FAIRness assessment and improvement techniques.
arXiv Detail & Related papers (2022-11-03T18:45:46Z)
Dataset Complexity Assessment Based on Cumulative Maximum Scaled Area Under Laplacian Spectrum [38.65823547986758]
It is meaningful to predict classification performance by assessing the complexity of datasets effectively before training DCNN models. This paper proposes a novel method called cumulative maximum scaled Area Under Laplacian Spectrum (cmsAULS).
arXiv Detail & Related papers (2022-09-29T13:02:04Z)
Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network [93.22587316229954]
We propose a graph-based sparse principal component analysis (GS-PCA) network for automated detection of cancerous lesions on histological lung slides stained by hematoxylin and eosin (H&E). We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operator characteristic (ROC).
arXiv Detail & Related papers (2021-10-27T19:28:36Z)
Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading. We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators. We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z)
Deep Representational Similarity Learning for analyzing neural signatures in task-based fMRI dataset [81.02949933048332]
This paper develops Deep Representational Similarity Learning (DRSL), a deep extension of Representational Similarity Analysis (RSA). DRSL is appropriate for analyzing similarities between various cognitive tasks in fMRI datasets with a large number of subjects.
arXiv Detail & Related papers (2020-09-28T18:30:14Z)
Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management. We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.