Related papers: A Reproducible Framework for Bias-Resistant Machine Learning on Small-Sample Neuroimaging Data

URL: http://arxiv.org/abs/2602.02920v1
Date: Mon, 02 Feb 2026 23:47:57 GMT
Title: A Reproducible Framework for Bias-Resistant Machine Learning on Small-Sample Neuroimaging Data
Authors: Jagan Mohan Reddy Dwarampudi, Jennifer L Purks, Joshua Wong, Renjie Hu, Tania Banerjee,
Abstract summary: We introduce a reproducible, bias-resistant machine learning framework that integrates domain-informed feature engineering, nested cross-validation, and calibrated decision-threshold optimization for small-sample neuroimaging data. By combining interpretability and unbiased evaluation, this work provides a generalizable computational blueprint for reliable machine learning in data-limited biomedical domains.
Score: 1.2287733479434337
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce a reproducible, bias-resistant machine learning framework that integrates domain-informed feature engineering, nested cross-validation, and calibrated decision-threshold optimization for small-sample neuroimaging data. Conventional cross-validation frameworks that reuse the same folds for both model selection and performance estimation yield optimistically biased results, limiting reproducibility and generalization. Demonstrated on a high-dimensional structural MRI dataset of deep brain stimulation cognitive outcomes, the framework achieved a nested-CV balanced accuracy of 0.660 ± 0.068 using a compact, interpretable subset selected via importance-guided ranking. By combining interpretability and unbiased evaluation, this work provides a generalizable computational blueprint for reliable machine learning in data-limited biomedical domains.
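
The abstract's point about reusing folds can be illustrated with a minimal nested cross-validation sketch. This is not the authors' implementation: the data are synthetic, the classifier is a toy one-feature rule (midpoint between class means plus a tunable shift), and every function name here is hypothetical. The inner loop picks the decision-threshold shift using only the outer training split, and the outer loop reports balanced accuracy on data that played no role in that choice.

```python
import random
from statistics import mean, stdev

def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall for binary labels (0/1)."""
    recalls = []
    for cls in (0, 1):
        members = [p for t, p in zip(y_true, y_pred) if t == cls]
        if members:  # skip a class that is absent from this fold
            recalls.append(sum(p == cls for p in members) / len(members))
    return mean(recalls)

def stratified_folds(idx, y, k):
    """Sort indices by label, then deal them round-robin into k folds."""
    ordered = sorted(idx, key=lambda i: y[i])
    return [ordered[j::k] for j in range(k)]

def fit_midpoint(x, y, idx):
    """Toy 'training' step: midpoint between the two class means."""
    m0 = mean(x[i] for i in idx if y[i] == 0)
    m1 = mean(x[i] for i in idx if y[i] == 1)
    return (m0 + m1) / 2

def score(x, y, train, test, shift):
    """Fit on `train`, apply the shifted threshold, evaluate on `test`."""
    cut = fit_midpoint(x, y, train) + shift
    preds = [1 if x[i] >= cut else 0 for i in test]
    return balanced_accuracy([y[i] for i in test], preds)

def nested_cv(x, y, shifts, k_outer=5, k_inner=4):
    """Outer folds estimate performance; inner folds select the shift."""
    all_idx = list(range(len(y)))
    outer_scores = []
    for test in stratified_folds(all_idx, y, k_outer):
        held_out = set(test)
        train = [i for i in all_idx if i not in held_out]
        inner = stratified_folds(train, y, k_inner)

        def inner_score(shift):
            vals = []
            for val in inner:
                val_set = set(val)
                fit = [i for i in train if i not in val_set]
                vals.append(score(x, y, fit, val, shift))
            return mean(vals)

        best_shift = max(shifts, key=inner_score)  # chosen without the test fold
        outer_scores.append(score(x, y, train, test, best_shift))
    return outer_scores

# Synthetic small-sample data (an assumption for illustration only).
random.seed(0)
y = [0] * 20 + [1] * 20
x = [random.gauss(1.0 if label else 0.0, 1.0) for label in y]

scores = nested_cv(x, y, shifts=[-0.5, -0.25, 0.0, 0.25, 0.5])
print(f"nested-CV balanced accuracy: {mean(scores):.3f} +/- {stdev(scores):.3f}")
```

Selecting the shift on the same folds used for the final score would let the choice adapt to the test data, which is the optimistic bias the abstract describes; keeping the selection inside the outer training split avoids that.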

Related papers

Improving Cardiac Risk Prediction Using Data Generation Techniques [37.94487163156369]
This work proposes an architecture for the synthesis of realistic clinical records that are coherent with real-world observations. The primary objective is to increase the size and diversity of the available datasets in order to enhance the performance of cardiac risk prediction models.
arXiv Detail & Related papers (2025-12-19T10:17:00Z)
Rethinking Convergence in Deep Learning: The Predictive-Corrective Paradigm for Anatomy-Informed Brain MRI Segmentation [30.94379425064039]
We introduce the Predictive-Corrective (PC) paradigm, a framework that decouples the modeling task to fundamentally accelerate learning. PCambaNet is composed of two synergistic modules. First, the Predictive Prior Module (PPM) generates a coarse approximation at low computational cost. Next, the Corrective Residual Network (CRN) learns to model the residual error, focusing the network's full capacity on refining these challenging regions.
arXiv Detail & Related papers (2025-10-17T08:51:33Z)
Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations [57.054499278843856]
Functional magnetic resonance imaging (fMRI) analysis faces significant challenges due to limited dataset sizes and domain variability between studies. Traditional self-supervised learning methods inspired by computer vision often rely on positive and negative sample pairs. We propose adapting a recently developed Hierarchical Functional Maximal Correlation Algorithm (HFMCA) to graph-structured fMRI data.
arXiv Detail & Related papers (2025-10-05T12:35:01Z)
Enhancing Orthopox Image Classification Using Hybrid Machine Learning and Deep Learning Models [40.325359811289445]
This paper uses Machine Learning models combined with pretrained Deep Learning models to extract deep feature representations without the need for augmented data. The findings show that this feature extraction method, when paired with other methods in the state-of-the-art, produces excellent classification outcomes.
arXiv Detail & Related papers (2025-06-06T11:52:07Z)
Learning Massive-scale Partial Correlation Networks in Clinical Multi-omics Studies with HP-ACCORD [9.759451770552284]
We introduce a novel pseudolikelihood-based graphical model framework for multi-omics data analysis. The proposed estimator maintains estimation and selection consistency in various metrics under high-dimensional assumptions. A high-performance computing implementation of our framework was tested using simulated data with up to one million variables.
arXiv Detail & Related papers (2024-12-16T08:38:02Z)
A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy. We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods. By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
Neural Corrective Machine Unranking [3.2340528215722553]
We formalise corrective unranking and propose a novel teacher-student framework, Corrective unRanking Distillation (CuRD). CuRD facilitates forgetting by adjusting the (trained) neural IR model such that its output relevance scores of to-be-forgotten samples mimic those of low-ranking, non-retrievable samples. Experiments with forget set sizes from 1 % and 20 % of the training dataset demonstrate that CuRD outperforms seven state-of-the-art baselines in terms of forgetting and correction.
arXiv Detail & Related papers (2024-11-13T12:19:46Z)
Refining Tuberculosis Detection in CXR Imaging: Addressing Bias in Deep Neural Networks via Interpretability [1.9936075659851882]
We argue that the reliability of deep learning models is limited, even if they can be shown to obtain perfect classification accuracy on the test data. We show that pre-training a deep neural network on a large-scale proxy task, as well as using mixed objective optimization network (MOON), can improve the alignment of decision foundations between models and experts.
arXiv Detail & Related papers (2024-07-19T06:41:31Z)
Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options. The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z)
Rethinking model prototyping through the MedMNIST+ dataset collection [0.11999555634662634]
This work introduces a comprehensive benchmark for the MedMNIST+ dataset collection. We reassess commonly used Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) architectures across distinct medical datasets. Our findings suggest that computationally efficient training schemes and modern foundation models offer viable alternatives to costly end-to-end training.
arXiv Detail & Related papers (2024-04-24T10:19:25Z)
Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data. We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models [41.45240621979654]
We introduce BEIR, a heterogeneous benchmark for information retrieval. We study the effectiveness of nine state-of-the-art retrieval models in a zero-shot evaluation setup. Dense-retrieval models are computationally more efficient but often underperform other approaches.
arXiv Detail & Related papers (2021-04-17T23:29:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.