A Reproducible Framework for Bias-Resistant Machine Learning on Small-Sample Neuroimaging Data
- URL: http://arxiv.org/abs/2602.02920v1
- Date: Mon, 02 Feb 2026 23:47:57 GMT
- Title: A Reproducible Framework for Bias-Resistant Machine Learning on Small-Sample Neuroimaging Data
- Authors: Jagan Mohan Reddy Dwarampudi, Jennifer L Purks, Joshua Wong, Renjie Hu, Tania Banerjee,
- Abstract summary: We introduce a reproducible, bias-resistant machine learning framework that integrates domain-informed feature engineering, nested cross-validation, and calibrated decision-threshold optimization for small-sample neuroimaging data.<n>By combining interpretability and unbiased evaluation, this work provides a generalizable computational blueprint for reliable machine learning in data-limited biomedical domains.
- Score: 1.2287733479434337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a reproducible, bias-resistant machine learning framework that integrates domain-informed feature engineering, nested cross-validation, and calibrated decision-threshold optimization for small-sample neuroimaging data. Conventional cross-validation frameworks that reuse the same folds for both model selection and performance estimation yield optimistically biased results, limiting reproducibility and generalization. Demonstrated on a high-dimensional structural MRI dataset of deep brain stimulation cognitive outcomes, the framework achieved a nested-CV balanced accuracy of 0.660\,$\pm$\,0.068 using a compact, interpretable subset selected via importance-guided ranking. By combining interpretability and unbiased evaluation, this work provides a generalizable computational blueprint for reliable machine learning in data-limited biomedical domains.
Related papers
- Improving Cardiac Risk Prediction Using Data Generation Techniques [37.94487163156369]
This work proposes an architecture for the synthesis of realistic clinical records that are coherent with real-world observations.<n>The primary objective is to increase the size and diversity of the available datasets in order to enhance the performance of cardiac risk prediction models.
arXiv Detail & Related papers (2025-12-19T10:17:00Z) - Rethinking Convergence in Deep Learning: The Predictive-Corrective Paradigm for Anatomy-Informed Brain MRI Segmentation [30.94379425064039]
We introduce the Predictive-Corrective (PC) paradigm, a framework that decouples the modeling task to fundamentally accelerate learning.<n>PCambaNet is composed of two synergistic modules. First, the Predictive Prior Module (PPM) generates a coarse approximation at low computational cost.<n>Next, the Corrective Residual Network (CRN) learns to model the residual error, focusing the network's full capacity on refining these challenging regions.
arXiv Detail & Related papers (2025-10-17T08:51:33Z) - Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations [57.054499278843856]
Functional magnetic resonance imaging (fMRI) analysis faces significant challenges due to limited dataset sizes and domain variability between studies.<n>Traditional self-supervised learning methods inspired by computer vision often rely on positive and negative sample pairs.<n>We propose adapting a recently developed Hierarchical Functional Maximal Correlation Algorithm (HFMCA) to graph-structured fMRI data.
arXiv Detail & Related papers (2025-10-05T12:35:01Z) - Enhancing Orthopox Image Classification Using Hybrid Machine Learning and Deep Learning Models [40.325359811289445]
This paper uses Machine Learning models combined with pretrained Deep Learning models to extract deep feature representations without the need for augmented data.<n>The findings show that this feature extraction method, when paired with other methods in the state-of-the-art, produces excellent classification outcomes.
arXiv Detail & Related papers (2025-06-06T11:52:07Z) - Learning Massive-scale Partial Correlation Networks in Clinical Multi-omics Studies with HP-ACCORD [9.759451770552284]
We introduce a novel pseudolikelihood-based graphical model framework for multi-omics data analysis.<n>The proposed estimator maintains estimation and selection consistency in various metrics under high-dimensional assumptions.<n>A high-performance computing implementation of our framework was tested using simulated data with up to one million variables.
arXiv Detail & Related papers (2024-12-16T08:38:02Z) - A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy.<n>We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods.<n>By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z) - Neural Corrective Machine Unranking [3.2340528215722553]
We formalise corrective unranking and propose a novel teacher-student framework, Corrective unRanking Distillation (CuRD)<n>CuRD facilitates forgetting by adjusting the (trained) neural IR model such that its output relevance scores of to-be-forgotten samples mimic those of low-ranking, non-retrievable samples.<n>Experiments with forget set sizes from 1 % and 20 % of the training dataset demonstrate that CuRD outperforms seven state-of-the-art baselines in terms of forgetting and correction.
arXiv Detail & Related papers (2024-11-13T12:19:46Z) - Refining Tuberculosis Detection in CXR Imaging: Addressing Bias in Deep Neural Networks via Interpretability [1.9936075659851882]
We argue that the reliability of deep learning models is limited, even if they can be shown to obtain perfect classification accuracy on the test data.
We show that pre-training a deep neural network on a large-scale proxy task, as well as using mixed objective optimization network (MOON), can improve the alignment of decision foundations between models and experts.
arXiv Detail & Related papers (2024-07-19T06:41:31Z) - Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z) - Rethinking model prototyping through the MedMNIST+ dataset collection [0.11999555634662634]
This work introduces a comprehensive benchmark for the MedMNIST+ dataset collection.<n>We reassess commonly used Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) architectures across distinct medical datasets.<n>Our findings suggest that computationally efficient training schemes and modern foundation models offer viable alternatives to costly end-to-end training.
arXiv Detail & Related papers (2024-04-24T10:19:25Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information
Retrieval Models [41.45240621979654]
We introduce BEIR, a heterogeneous benchmark for information retrieval.
We study the effectiveness of nine state-of-the-art retrieval models in a zero-shot evaluation setup.
Dense-retrieval models are computationally more efficient but often underperform other approaches.
arXiv Detail & Related papers (2021-04-17T23:29:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.