Unsupervised Clustering Approaches for Autism Screening: Achieving 95.31% Accuracy with a Gaussian Mixture Model
- URL: http://arxiv.org/abs/2503.05746v1
- Date: Thu, 20 Feb 2025 18:12:59 GMT
- Title: Unsupervised Clustering Approaches for Autism Screening: Achieving 95.31% Accuracy with a Gaussian Mixture Model
- Authors: Nora Fink
- Abstract summary: Autism spectrum disorder (ASD) remains a challenging condition to diagnose effectively and promptly. Traditional diagnostic methods presuppose the availability of labeled data, which can be both time-consuming and resource-intensive to obtain. This paper explores the use of four distinct unsupervised clustering algorithms to analyze a publicly available dataset of 704 adult individuals screened for ASD.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autism spectrum disorder (ASD) remains a challenging condition to diagnose effectively and promptly, despite global efforts in public health, clinical screening, and scientific research. Traditional diagnostic methods, primarily reliant on supervised learning approaches, presuppose the availability of labeled data, which can be both time-consuming and resource-intensive to obtain. Unsupervised learning, in contrast, offers a means of gaining insights from unlabeled datasets in a manner that can expedite or support the diagnostic process. This paper explores the use of four distinct unsupervised clustering algorithms, namely K-Means, Gaussian Mixture Model (GMM), Agglomerative Clustering, and DBSCAN, to analyze a publicly available dataset of 704 adult individuals screened for ASD. After extensive hyperparameter tuning via cross-validation, the study documents how the Gaussian Mixture Model achieved the highest clustering-to-label accuracy (95.31%) when its clusters were mapped to the original ASD/NO classification (4). Other key performance metrics included the Adjusted Rand Index (ARI) and silhouette scores, which further illustrated the internal coherence of each cluster. The dataset underwent preprocessing procedures including data cleaning, label encoding of categorical features, and standard scaling, followed by a thorough cross-validation approach to assess and compare the four clustering methods (5). These results highlight the significant potential of unsupervised methods in assisting ASD screening, especially in contexts where labeled data may be sparse, uncertain, or prohibitively expensive to obtain. With continued methodological refinements, unsupervised approaches hold promise for augmenting early detection initiatives and guiding resource allocation to individuals at high risk.
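The abstract reports a clustering-to-label accuracy and an Adjusted Rand Index but does not spell out how cluster ids were matched to the ASD/NO labels. The following is a minimal sketch of one common way to compute both metrics, assuming the best one-to-one mapping between clusters and classes is used. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import permutations
from math import comb


def cluster_to_label_accuracy(true_labels, cluster_ids):
    """Best accuracy over all one-to-one mappings from cluster ids to classes.

    Assumption: there are no more clusters than classes (e.g. 2 clusters
    compared against the two classes ASD and NO).
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best = 0.0
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(mapping[c] == t for c, t in zip(cluster_ids, true_labels))
        best = max(best, hits / len(true_labels))
    return best


def adjusted_rand_index(true_labels, cluster_ids):
    """Adjusted Rand Index computed from the class/cluster contingency counts."""
    n = len(true_labels)
    cell, row, col = {}, {}, {}
    for t, c in zip(true_labels, cluster_ids):
        cell[(t, c)] = cell.get((t, c), 0) + 1
        row[t] = row.get(t, 0) + 1
        col[c] = col.get(c, 0) + 1
    index = sum(comb(k, 2) for k in cell.values())
    sum_rows = sum(comb(k, 2) for k in row.values())
    sum_cols = sum(comb(k, 2) for k in col.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single class and cluster
        return 1.0
    return (index - expected) / (max_index - expected)


# Toy data (hypothetical): eight screened individuals and two clusters.
toy_true = ["ASD", "ASD", "ASD", "NO", "NO", "NO", "NO", "NO"]
toy_clusters = [1, 1, 0, 0, 0, 0, 0, 0]
toy_accuracy = cluster_to_label_accuracy(toy_true, toy_clusters)
toy_ari = adjusted_rand_index(toy_true, toy_clusters)
```

Note that the accuracy depends on the chosen mapping, whereas the ARI is invariant to how clusters are numbered, which is one reason to report both.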
Related papers
- Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge [55.252714550918824]
AortaSeg24 MICCAI Challenge introduced the first dataset of 100 CTA volumes annotated for 23 clinically relevant aortic branches and zones. This paper presents the challenge design, dataset details, evaluation metrics, and an in-depth analysis of the top-performing algorithms.
arXiv Detail & Related papers (2025-02-07T21:09:05Z) - Federated Anomaly Detection for Early-Stage Diagnosis of Autism Spectrum Disorders using Serious Game Data [0.0]
This study presents a novel semi-supervised approach for ASD detection using AutoEncoder-based Machine Learning (ML) methods.
Our approach utilizes data collected manually through a serious game specifically designed for this purpose.
Since the sensitive data collected by the gamified application are susceptible to privacy leakage, we developed a Federated Learning framework.
arXiv Detail & Related papers (2024-10-25T23:00:12Z) - An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder [0.0]
Autistic Spectrum Disorder (ASD) is a neurological disease characterized by difficulties with social interaction, communication, and repetitive activities.
This study employs diverse machine learning methods to identify crucial ASD traits, aiming to enhance and automate the diagnostic process.
arXiv Detail & Related papers (2023-09-20T21:23:37Z) - Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z) - Simple and Scalable Algorithms for Cluster-Aware Precision Medicine [0.0]
We propose a simple and scalable approach to joint clustering and embedding.
This novel, cluster-aware embedding approach overcomes the complexity and limitations of current joint embedding and clustering methods.
Our approach does not require the user to choose the desired number of clusters, but instead yields interpretable dendrograms of hierarchically clustered embeddings.
arXiv Detail & Related papers (2022-11-29T19:27:26Z) - Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection [81.07346419422605]
Anomaly detection aims at identifying deviant samples from the normal data distribution.
Contrastive learning has provided a successful way to sample representation that enables effective discrimination on anomalies.
We propose a novel hierarchical semi-supervised contrastive learning framework for contamination-resistant anomaly detection.
arXiv Detail & Related papers (2022-07-24T18:49:26Z) - Hybrid Dynamic Contrast and Probability Distillation for Unsupervised Person Re-Id [109.1730454118532]
Unsupervised person re-identification (Re-Id) has attracted increasing attention due to its practical application in real-world video surveillance systems.
We present the hybrid dynamic cluster contrast and probability distillation algorithm.
It formulates the unsupervised Re-Id problem as a unified local-to-global dynamic contrastive learning and self-supervised probability distillation framework.
arXiv Detail & Related papers (2021-09-29T02:56:45Z) - A multi-stage semi-supervised improved deep embedded clustering (MS-SSIDEC) method for bearing fault diagnosis under the situation of insufficient labeled samples [20.952315351460527]
Labeling data in actual industrial processes costs a lot of labor and time, which challenges the application of intelligent fault diagnosis methods.
To solve this problem, a multi-stage semi-supervised improved deep embedded clustering (MS-SSIDEC) method is proposed.
This method includes three stages: pre-training, deep clustering and enhanced supervised learning.
arXiv Detail & Related papers (2021-09-28T06:49:40Z) - Deep Semi-Supervised Embedded Clustering (DSEC) for Stratification of Heart Failure Patients [50.48904066814385]
In this work we apply deep semi-supervised embedded clustering to determine data-driven patient subgroups of heart failure.
We find clinically relevant clusters from an embedded space derived from heterogeneous data.
The proposed algorithm can potentially find new undiagnosed subgroups of patients that have different outcomes.
arXiv Detail & Related papers (2020-12-24T12:56:46Z) - Semi-supervised and Unsupervised Methods for Heart Sounds Classification in Restricted Data Environments [4.712158833534046]
This study uses various supervised, semi-supervised and unsupervised approaches on the PhysioNet/CinC 2016 Challenge dataset.
A GAN based semi-supervised method is proposed, which allows the usage of unlabelled data samples to boost the learning of data distribution.
In particular, the unsupervised feature extraction using 1D CNN Autoencoder coupled with one-class SVM obtains good performance without any data labelling.
arXiv Detail & Related papers (2020-06-04T02:07:35Z) - Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)