Bridging the Gap: Generalising State-of-the-Art U-Net Models to
Sub-Saharan African Populations
- URL: http://arxiv.org/abs/2312.11770v1
- Date: Tue, 19 Dec 2023 01:03:19 GMT
- Title: Bridging the Gap: Generalising State-of-the-Art U-Net Models to
Sub-Saharan African Populations
- Authors: Alyssa R. Amod, Alexandra Smith, Pearly Joubert, Confidence Raymond,
Dong Zhang, Udunna C. Anazodo, Dodzi Motchon, Tinashe E.M. Mutsvangwa, and
Sébastien Quetin
- Abstract summary: A critical challenge for tumour segmentation models is the ability to adapt to diverse clinical settings.
We replicated a framework that secured the 2nd position in the 2022 BraTS competition to investigate the impact of dataset composition on model performance.
- Score: 37.59488403618245
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A critical challenge for tumour segmentation models is the ability to adapt
to diverse clinical settings, particularly when applied to poor-quality
neuroimaging data. The uncertainty surrounding this adaptation stems from the
lack of representative datasets, leaving top-performing models without exposure
to common artifacts found in MRI data throughout Sub-Saharan Africa (SSA). We
replicated a framework that secured the 2nd position in the 2022 BraTS
competition to investigate the impact of dataset composition on model
performance, and pursued four distinct approaches: training a model with
1) BraTS-Africa data only (train_SSA, N=60), 2) BraTS-Adult Glioma data only
(train_GLI, N=1251), or 3) both datasets together (train_ALL, N=1311), and
4) further training the train_GLI model with BraTS-Africa data
(train_ftSSA). Notably, training on a smaller low-quality dataset alone
(train_SSA) yielded subpar results, and training on a larger high-quality
dataset alone (train_GLI) struggled to delineate oedematous tissue in the
low-quality validation set. The most promising approach (train_ftSSA) involved
pre-training a model on high-quality neuroimages and then fine-tuning it on the
smaller, low-quality dataset. This approach outperformed the others, ranking
second in the MICCAI BraTS Africa global challenge external testing phase.
These findings underscore the significance of larger sample sizes and broad
exposure to data in improving segmentation performance. Furthermore, we
demonstrated that there is potential for improving such models by fine-tuning
them with a wider range of data locally.
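For concreteness, here is a minimal sketch of the most promising recipe (pre-train on the large high-quality cohort, then fine-tune on the small SSA cohort). The model, data, and hyperparameters are illustrative stand-ins, not the authors' released pipeline:
```python
# Sketch of approach 4 (train_ftSSA): pre-train a segmentation network on the
# large BraTS-Adult Glioma set, then fine-tune the same weights on the small
# BraTS-Africa set at a lower learning rate. All names and numbers below are
# illustrative placeholders, not the authors' code.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def make_segmenter() -> nn.Module:
    # Tiny stand-in for a 3D U-Net; the paper uses a full encoder-decoder.
    return nn.Sequential(
        nn.Conv3d(4, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv3d(16, 4, kernel_size=3, padding=1),  # 4 tumour-region classes
    )

def train(model: nn.Module, loader: DataLoader, epochs: int, lr: float) -> nn.Module:
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()
    return model

def toy_loader(n: int) -> DataLoader:
    # Random tensors standing in for multi-modal MRI volumes and label maps.
    xs = torch.randn(n, 4, 16, 16, 16)
    ys = torch.randint(0, 4, (n, 16, 16, 16))
    return DataLoader(TensorDataset(xs, ys), batch_size=2)

model = make_segmenter()
model = train(model, toy_loader(20), epochs=2, lr=1e-3)  # "train_GLI" stage
model = train(model, toy_loader(6), epochs=2, lr=1e-4)   # "train_ftSSA" stage
```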
Related papers
- Less is More: Adaptive Coverage for Synthetic Training Data [20.136698279893857]
This study introduces a novel sampling algorithm, based on the maximum coverage problem, to select a representative subset from a synthetically generated dataset.
Our results demonstrate that training a classifier on this contextually sampled subset achieves superior performance compared to training on the entire dataset.
arXiv Detail & Related papers (2025-04-20T06:45:16Z)
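The maximum-coverage selection described in the entry above lends itself to the classic greedy approximation. The sketch below is a hedged illustration of that general idea, not the paper's exact algorithm; the feature-space "cells" each example covers are assumed given:
```python
# Greedy maximum-coverage subset selection: each candidate example "covers" a
# set of feature-space cells; repeatedly add the example covering the most
# still-uncovered cells. Classic greedy approximation, hedged: the paper's
# algorithm may differ in how coverage sets are built and selected.
def greedy_max_coverage(coverage_sets: list[set], budget: int) -> list[int]:
    covered: set = set()
    chosen: list[int] = []
    for _ in range(budget):
        best = max(range(len(coverage_sets)),
                   key=lambda i: len(coverage_sets[i] - covered))
        if not coverage_sets[best] - covered:
            break  # no candidate adds new coverage
        chosen.append(best)
        covered |= coverage_sets[best]
    return chosen

# Toy usage: five synthetic examples covering cells 0-4.
sets = [{0, 1}, {1, 2}, {2, 3}, {3, 4}, {0, 4}]
print(greedy_max_coverage(sets, budget=2))  # [0, 2] -> covers {0, 1, 2, 3}
```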
- Toward Generalizable Multiple Sclerosis Lesion Segmentation Models [0.0]
This study aims to develop models that generalize across diverse evaluation datasets.
We systematically trained a state-of-the-art UNet++ architecture on all high-quality, publicly available MS lesion segmentation datasets.
arXiv Detail & Related papers (2024-10-25T15:21:54Z)
- FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models [37.76576626976729]
One-Shot Federated Learning (OSFL), a special decentralized machine learning paradigm, has recently gained significant attention.
Current methods face challenges due to client data heterogeneity and limited data quantity when applied to real-world OSFL systems.
We propose Federated Bi-Level Personalization (FedBiP), which personalizes the pretrained LDM at both instance-level and concept-level.
arXiv Detail & Related papers (2024-10-07T07:45:18Z)
- Does Data-Efficient Generalization Exacerbate Bias in Foundation Models? [2.298227866545911]
Foundation models have emerged as robust, label-efficient models across diverse domains.
It is unclear whether using a large amount of unlabeled data, biased by the presence of sensitive attributes during pre-training, influences the fairness of the model.
This research examines bias in a foundation model when it is fine-tuned on the Brazilian Multilabel Ophthalmological dataset.
arXiv Detail & Related papers (2024-08-28T22:14:44Z)
- Ranking & Reweighting Improves Group Distributional Robustness [14.021069321266516]
We propose a ranking-based training method called Discounted Rank Upweighting (DRU) to learn models that exhibit strong OOD performance on the test data.
Results on several synthetic and real-world datasets highlight the superior ability of our group-ranking-based (akin to soft-minimax) approach in selecting and learning models that are robust to group distributional shifts.
arXiv Detail & Related papers (2023-05-09T20:37:16Z)
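A hedged sketch of the rank-then-discount idea described above: per-group losses are sorted from worst to best and downweighted geometrically by rank, giving a soft-minimax objective. The discount factor and exact weighting are illustrative, not necessarily the paper's DRU formulation:
```python
# Rank-then-discount group reweighting (soft-minimax flavour): sort per-group
# losses from worst to best and weight them geometrically by rank, so the
# worst-performing groups dominate the objective. Gamma is illustrative.
import torch

def discounted_rank_loss(group_losses: torch.Tensor, gamma: float = 0.5) -> torch.Tensor:
    order = torch.argsort(group_losses, descending=True)   # worst group first
    ranks = torch.arange(len(group_losses), dtype=group_losses.dtype)
    weights = gamma ** ranks
    weights = weights / weights.sum()                      # normalise to 1
    return (weights * group_losses[order]).sum()

group_losses = torch.tensor([0.2, 1.5, 0.7])   # mean loss per group
print(discounted_rank_loss(group_losses))      # dominated by the 1.5 group
```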
- Imputing Knowledge Tracing Data with Subject-Based Training via LSTM Variational Autoencoders Frameworks [6.24828623162058]
We adopt a subject-based training method that splits and imputes data by student IDs instead of by row number.
We leverage two existing deep generative frameworks, namely Variational Autoencoders (VAE) and Longitudinal Variational Autoencoders (LVAE).
We demonstrate that the generated data from LSTM-VAE and LSTM-LVAE can boost the original model performance by about 50%.
arXiv Detail & Related papers (2023-02-24T21:56:03Z)
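The subject-based split described above can be illustrated with scikit-learn's GroupShuffleSplit, which holds out whole students rather than individual rows; the column names here are hypothetical:
```python
# Subject-based splitting: hold out whole students (all of their rows) rather
# than splitting by row index, so no student's interactions leak across the
# train/test boundary. Column names are hypothetical.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

df = pd.DataFrame({
    "student_id":  [1, 1, 2, 2, 3, 3, 4, 4],
    "question_id": [10, 11, 10, 12, 11, 13, 12, 13],
    "correct":     [1, 0, 1, 1, 0, 1, 0, 1],
})

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["student_id"]))
train, test = df.iloc[train_idx], df.iloc[test_idx]
assert not set(train["student_id"]) & set(test["student_id"])  # no leakage
```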
- FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, however, it usually requires a larger pre-training dataset to boost performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span-selection task format, used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
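As background for the entry above: under a two-parameter logistic (2PL) IRT model, the probability that a "subject" (here, a model) answers an item correctly depends on subject ability and item difficulty/discrimination, and high-discrimination items separate strong from weak subjects most sharply. The parameter values below are illustrative, not fit to the paper's data:
```python
# Two-parameter logistic (2PL) IRT: P(correct) = sigmoid(a * (theta - b)),
# with subject ability theta, item difficulty b, and item discrimination a.
# All parameter values below are illustrative.
import math

def p_correct(theta: float, a: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

strong, weak = 2.0, -0.5                 # abilities of two hypothetical models
items = {"easy/low-disc": dict(a=0.5, b=-1.0),
         "hard/high-disc": dict(a=2.0, b=1.0)}
for name, item in items.items():
    gap = p_correct(strong, **item) - p_correct(weak, **item)
    print(f"{name}: separates the models by {gap:.2f}")
```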
- Regularizing Generative Adversarial Networks under Limited Data [88.57330330305535]
This work proposes a regularization approach for training robust GAN models on limited data.
We show a connection between the regularized loss and an f-divergence called LeCam-divergence, which we find is more robust under limited training data.
arXiv Detail & Related papers (2021-04-07T17:59:06Z)
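A hedged sketch of the LeCam-style regularizer the entry above refers to: the discriminator's outputs on real and generated images are pulled toward exponential moving averages of its outputs on the opposite class. The EMA decay and loss weighting here are illustrative:
```python
# LeCam-style regularisation: anchor the discriminator's outputs on real and
# generated images to exponential moving averages of its outputs on the
# opposite class, which the paper connects to the LeCam divergence.
import torch

class LeCamReg:
    def __init__(self, decay: float = 0.99):
        self.decay = decay
        self.ema_real = 0.0   # EMA of D's mean output on real images
        self.ema_fake = 0.0   # EMA of D's mean output on generated images

    def __call__(self, d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
        # Update anchors outside the graph (no gradients through the EMAs).
        self.ema_real = self.decay * self.ema_real + (1 - self.decay) * d_real.mean().item()
        self.ema_fake = self.decay * self.ema_fake + (1 - self.decay) * d_fake.mean().item()
        # Pull real outputs toward the fake anchor, and vice versa.
        return ((d_real - self.ema_fake) ** 2).mean() + ((d_fake - self.ema_real) ** 2).mean()

reg = LeCamReg()
extra = reg(torch.randn(8), torch.randn(8))  # add lambda * extra to the D loss
```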
- Deep F-measure Maximization for End-to-End Speech Understanding [52.36496114728355]
We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation.
We perform experiments on two standard fairness datasets, Adult and Communities and Crime, and also on speech-to-intent detection on the ATIS dataset and speech-to-image concept classification on the Speech-COCO dataset.
In all four tasks, the F-measure objective improves micro-F1 scores by up to 8% absolute compared to models trained with the cross-entropy loss.
arXiv Detail & Related papers (2020-08-08T03:02:27Z)
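A hedged sketch of a differentiable F-measure surrogate of the kind described in the last entry: replacing hard 0/1 decisions with probabilities makes the true-positive/false-positive counts, and hence F1, differentiable. This is a common construction, not necessarily the paper's exact formulation:
```python
# Differentiable F-measure surrogate: use probabilities instead of hard 0/1
# predictions so that soft TP/FP/FN counts, and hence F1, admit gradients.
import torch

def soft_f1_loss(probs: torch.Tensor, targets: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """probs: sigmoid outputs in [0, 1]; targets: binary labels, same shape."""
    tp = (probs * targets).sum()
    fp = (probs * (1 - targets)).sum()
    fn = ((1 - probs) * targets).sum()
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1 - f1  # minimising the loss maximises soft F1

logits = torch.randn(16, requires_grad=True)
targets = torch.randint(0, 2, (16,)).float()
loss = soft_f1_loss(torch.sigmoid(logits), targets)
loss.backward()  # gradients flow back to the logits
```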
This list is automatically generated from the titles and abstracts of the papers on this site.