Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation
- URL: http://arxiv.org/abs/2504.20635v1
- Date: Tue, 29 Apr 2025 11:04:28 GMT
- Title: Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation
- Authors: Bradley Segal, Joshua Fieggen, David Clifton, Lei Clifton,
- Abstract summary: Existing model evaluation approaches often rely on real-world datasets, which are limited in availability, embed confounding biases, and lack flexibility needed for systematic experimentation.<n>We propose a novel structured synthetic data framework designed for the controlled robustness of benchmarking model, fairness, and generalisability.
- Score: 0.3362278589492841
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring the generalisability of clinical machine learning (ML) models across diverse healthcare settings remains a significant challenge due to variability in patient demographics, disease prevalence, and institutional practices. Existing model evaluation approaches often rely on real-world datasets, which are limited in availability, embed confounding biases, and lack the flexibility needed for systematic experimentation. Furthermore, while generative models aim for statistical realism, they often lack transparency and explicit control over factors driving distributional shifts. In this work, we propose a novel structured synthetic data framework designed for the controlled benchmarking of model robustness, fairness, and generalisability. Unlike approaches focused solely on mimicking observed data, our framework provides explicit control over the data generating process, including site-specific prevalence variations, hierarchical subgroup effects, and structured feature interactions. This enables targeted investigation into how models respond to specific distributional shifts and potential biases. Through controlled experiments, we demonstrate the framework's ability to isolate the impact of site variations, support fairness-aware audits, and reveal generalisation failures, particularly highlighting how model complexity interacts with site-specific effects. This work contributes a reproducible, interpretable, and configurable tool designed to advance the reliable deployment of ML in clinical settings.
Related papers
- Meta-Statistical Learning: Supervised Learning of Statistical Inference [59.463430294611626]
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks.<n>We propose meta-statistical learning, a framework inspired by multi-instance learning that reformulates statistical inference tasks as supervised learning problems.
arXiv Detail & Related papers (2025-02-17T18:04:39Z) - GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z) - Counterfactual Fairness through Transforming Data Orthogonal to Bias [7.109458605736819]
We propose a novel data pre-processing algorithm, Orthogonal to Bias (OB)
OB is designed to eliminate the influence of a group of continuous sensitive variables, thus promoting counterfactual fairness in machine learning applications.
OB is model-agnostic, making it applicable to a wide range of machine learning models and tasks.
arXiv Detail & Related papers (2024-03-26T16:40:08Z) - Principles from Clinical Research for NLP Model Generalization [10.985226652193543]
We explore the foundations of generalizability and study the factors that affect it.
We demonstrate how learning spurious correlations, such as the distance between entities in relation extraction tasks, can affect a model's internal validity.
arXiv Detail & Related papers (2023-11-07T02:17:25Z) - Out of Distribution Generalization via Interventional Style Transfer in
Single-Cell Microscopy [1.7778546320705952]
Real-world deployment of computer vision systems requires causal representations that are invariant to contextual nuisances.
We propose tests to assess the extent to which models learn causal representations across increasingly challenging levels of OOD-generalization.
We show that despite seemingly strong performance, as assessed by other established metrics, both naive and contemporary baselines designed to ward against confounding, collapse on these tests.
arXiv Detail & Related papers (2023-06-15T20:08:16Z) - Benchmarking Heterogeneous Treatment Effect Models through the Lens of
Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z) - A Differentially Private Probabilistic Framework for Modeling the
Variability Across Federated Datasets of Heterogeneous Multi-View
Observations [4.511923587827301]
We show that our framework can be effectively optimized through expectation (EM) over latent master's distribution and clients' parameters.
We tested our method on the analysis of multi-modal medical imaging data and clinical scores from distributed clinical datasets of patients affected by Alzheimer's disease.
arXiv Detail & Related papers (2022-04-15T07:20:47Z) - How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating
and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z) - Estimating the Effects of Continuous-valued Interventions using
Generative Adversarial Networks [103.14809802212535]
We build on the generative adversarial networks (GANs) framework to address the problem of estimating the effect of continuous-valued interventions.
Our model, SCIGAN, is flexible and capable of simultaneously estimating counterfactual outcomes for several different continuous interventions.
To address the challenges presented by shifting to continuous interventions, we propose a novel architecture for our discriminator.
arXiv Detail & Related papers (2020-02-27T18:46:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.