AutoML-Med: A Framework for Automated Machine Learning in Medical Tabular Data
- URL: http://arxiv.org/abs/2508.02625v1
- Date: Mon, 04 Aug 2025 17:13:45 GMT
- Title: AutoML-Med: A Framework for Automated Machine Learning in Medical Tabular Data
- Authors: Riccardo Francia, Maurizio Leone, Giorgio Leonardi, Stefania Montani, Marzio Pennisi, Manuel Striani, Sandra D'Alfonso,
- Abstract summary: This paper introduces AutoML-Med, an Automated Machine Learning tool specifically designed to address these challenges.<n>AutoML-Med's architecture incorporates Latin Hypercube Sampling (LHS) for exploring preprocessing methods, trains models using selected metrics, and utilizes Partial Rank Correlation Coefficient (PRCC) for fine-tuned optimization.<n> Experimental results demonstrate AutoML-Med's effectiveness in two different clinical settings, achieving higher balanced accuracy and sensitivity, which are crucial for identifying at-risk patients.
- Score: 1.1202352020347177
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Medical datasets are typically affected by issues such as missing values, class imbalance, a heterogeneous feature types, and a high number of features versus a relatively small number of samples, preventing machine learning models from obtaining proper results in classification and regression tasks. This paper introduces AutoML-Med, an Automated Machine Learning tool specifically designed to address these challenges, minimizing user intervention and identifying the optimal combination of preprocessing techniques and predictive models. AutoML-Med's architecture incorporates Latin Hypercube Sampling (LHS) for exploring preprocessing methods, trains models using selected metrics, and utilizes Partial Rank Correlation Coefficient (PRCC) for fine-tuned optimization of the most influential preprocessing steps. Experimental results demonstrate AutoML-Med's effectiveness in two different clinical settings, achieving higher balanced accuracy and sensitivity, which are crucial for identifying at-risk patients, compared to other state-of-the-art tools. AutoML-Med's ability to improve prediction results, especially in medical datasets with sparse data and class imbalance, highlights its potential to streamline Machine Learning applications in healthcare.
Related papers
- Clinical NLP with Attention-Based Deep Learning for Multi-Disease Prediction [44.0876796031468]
This paper addresses the challenges posed by the unstructured nature and high-dimensional semantic complexity of electronic health record texts.<n>A deep learning method based on attention mechanisms is proposed to achieve unified modeling for information extraction and multi-label disease prediction.
arXiv Detail & Related papers (2025-07-02T07:45:22Z) - AUTOCT: Automating Interpretable Clinical Trial Prediction with LLM Agents [47.640779069547534]
AutoCT is a novel framework that combines the reasoning capabilities of large language models with the explainability of classical machine learning.<n>We show that AutoCT performs on par with or better than SOTA methods on clinical trial prediction tasks within only a limited number of self-refinement iterations.
arXiv Detail & Related papers (2025-06-04T11:50:55Z) - Improving Clinical Decision Support through Interpretable Machine Learning and Error Handling in Electronic Health Records [6.594072648536156]
Trust-MAPS translates clinical domain knowledge into high-dimensional, mixed-integer programming models.<n>Trust-scores emerge as clinically meaningful features that not only boost predictive performance for clinical decision support tasks, but also lend interpretability to ML models.
arXiv Detail & Related papers (2023-08-21T15:14:49Z) - Study on the effectiveness of AutoML in detecting cardiovascular disease [0.0]
The article presents the relevance of the development and application of patient-oriented systems, in which machine learning (ML) is a promising technology that allows predicting cardiovascular diseases.
The structure of the AutoML model for detecting cardiovascular diseases depends not only on the efficiency and accuracy of the basic models used, but also on the scenarios for preprocessing the initial data.
arXiv Detail & Related papers (2023-08-19T08:46:27Z) - The Devil is in the Errors: Leveraging Large Language Models for
Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z) - Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review
and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z) - Time Associated Meta Learning for Clinical Prediction [78.99422473394029]
We propose a novel time associated meta learning (TAML) method to make effective predictions at multiple future time points.
To address the sparsity problem after task splitting, TAML employs a temporal information sharing strategy to augment the number of positive samples.
We demonstrate the effectiveness of TAML on multiple clinical datasets, where it consistently outperforms a range of strong baselines.
arXiv Detail & Related papers (2023-03-05T03:54:54Z) - Benchmarking AutoML Frameworks for Disease Prediction Using Medical
Claims [7.219529711278771]
We generated a large dataset using historical administrative claims including demographic information and flags for disease codes.
We trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performances on several metrics.
All models recorded low area under the precision-recall curve and failed to predict true positives while keeping the true negative rate high.
arXiv Detail & Related papers (2021-07-22T07:34:48Z) - Empirical Analysis of Machine Learning Configurations for Prediction of
Multiple Organ Failure in Trauma Patients [7.122236250657051]
Multiple organ failure (MOF) is a life-threatening condition.
We perform quantitative analysis on early MOF prediction with comprehensive machine learning (ML) configurations.
arXiv Detail & Related papers (2021-03-19T17:49:22Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - Self-Training with Improved Regularization for Sample-Efficient Chest
X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that using 85% lesser labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.