Exploring Federated Deep Learning for Standardising Naming Conventions
in Radiotherapy Data
- URL: http://arxiv.org/abs/2402.08999v1
- Date: Wed, 14 Feb 2024 07:52:28 GMT
- Authors: Ali Haidar, Daniel Al Mouiee, Farhannah Aly, David Thwaites, Lois
Holloway
- Abstract summary: Standardising structure volume names in radiotherapy (RT) data is necessary to enable data mining and analyses.
No studies have considered that RT patient records are distributed across multiple data centres.
This paper introduces a method that emulates real-world environments to establish standardised nomenclature.
A multimodal deep artificial neural network was proposed to standardise RT data in federated settings.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standardising structure volume names in radiotherapy (RT) data is necessary
to enable data mining and analyses, especially across multi-institutional
centres. This process is time- and resource-intensive, which highlights the need
for new automated and efficient approaches to handle the task. Several machine
learning-based methods have been proposed and evaluated to standardise
nomenclature. However, no studies have considered that RT patient records are
distributed across multiple data centres. This paper introduces a method that
emulates real-world environments to establish standardised nomenclature. This
is achieved by integrating decentralised real-time data and federated learning
(FL). A multimodal deep artificial neural network was proposed to standardise
RT data in federated settings. Three types of possible attributes were
extracted from the structures to train the deep learning models: tabular,
visual, and volumetric. Simulated experiments were carried out to train the
models across several scenarios including multiple data centres, input
modalities, and aggregation strategies. The models were compared against models
developed with single modalities in federated settings, in addition to models
trained in centralised settings. Categorical classification accuracy was
calculated on hold-out samples to assess the models' performance. Our results
highlight the need for fusing multiple modalities when training such models,
with better performance reported with tabular-volumetric models. In addition,
we report accuracy comparable to that of models built in centralised settings.
This demonstrates the suitability of FL for handling the standardisation task.
Additional ablation analyses showed that the total number of samples in the
data centres and the number of data centres strongly affect the training process
and should be carefully considered when building standardisation models.
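The aggregation step the abstract refers to can be illustrated with a FedAvg-style sketch: each simulated data centre trains locally on its own records, and a server averages the resulting model weights in proportion to each centre's sample count. This is a minimal illustration only; the local linear model, learning rate, centre sizes, and NumPy usage are assumptions standing in for the paper's multimodal deep network and its specific aggregation strategies.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: gradient descent on a least-squares
    linear model (a stand-in for the paper's multimodal network)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(client_weights, client_sizes):
    """Server-side FedAvg: average client models weighted by the
    number of samples each data centre holds."""
    total = sum(client_sizes)
    return sum(n / total * w for w, n in zip(client_weights, client_sizes))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Three simulated data centres holding different numbers of records,
# mirroring the unequal centre sizes examined in the ablation analyses.
sizes = [40, 25, 10]
datasets = []
for n in sizes:
    X = rng.normal(size=(n, 2))
    datasets.append((X, X @ true_w))

global_w = np.zeros(2)
for _ in range(20):  # communication rounds
    updates = [local_update(global_w, X, y) for X, y in datasets]
    global_w = fedavg(updates, sizes)

print(np.round(global_w, 2))  # converges toward true_w = [2., -1.]
```

No raw data leaves a centre; only model weights are exchanged, which is the property that makes FL attractive for distributed RT patient records.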
Related papers
- A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data [9.57464542357693]
This paper demonstrates that model-centric evaluations are biased, as real-world modeling pipelines often require dataset-specific preprocessing and feature engineering.
We select 10 relevant datasets from Kaggle competitions and implement expert-level preprocessing pipelines for each dataset.
After dataset-specific feature engineering, model rankings change considerably, performance differences decrease, and the importance of model selection reduces.
arXiv Detail & Related papers (2024-07-02T09:54:39Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Exploring Data Redundancy in Real-world Image Classification through Data Selection [20.389636181891515]
Deep learning models often require large amounts of data for training, leading to increased costs.
We present two data valuation metrics based on Synaptic Intelligence and gradient norms, respectively, to study redundancy in real-world image data.
Online and offline data selection algorithms are then proposed via clustering and grouping based on the examined data values.
arXiv Detail & Related papers (2023-06-25T03:31:05Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Adaptive Personlization in Federated Learning for Highly Non-i.i.d. Data [37.667379000751325]
Federated learning (FL) is a distributed learning method that offers medical institutes the prospect of collaboration in a global model.
In this work, we investigate an adaptive hierarchical clustering method for FL to produce intermediate semi-global models.
Our experiments demonstrate a significant gain in classification accuracy under heterogeneous data distributions compared to standard FL methods.
arXiv Detail & Related papers (2022-07-07T17:25:04Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- A Personalized Federated Learning Algorithm: an Application in Anomaly Detection [0.6700873164609007]
Federated Learning (FL) has recently emerged as a promising method to overcome data privacy and transmission issues.
In FL, datasets collected from different devices or sensors are used to train local models (clients), each of which shares its learning with a centralized model (server).
This paper proposes a novel Personalized FedAvg (PC-FedAvg), which controls weight communication and aggregation, augmented with a tailored learning algorithm to personalize the resulting model at each client.
arXiv Detail & Related papers (2021-11-04T04:57:11Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model.
In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients.
arXiv Detail & Related papers (2020-06-12T14:49:47Z)
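The model-fusion idea in the last entry, training a central classifier on the clients' outputs over unlabeled data rather than averaging their parameters, can be sketched as follows. This is purely illustrative: the linear client models, the shared unlabeled set, and the plain gradient-descent distillation are assumptions, not the cited paper's actual recipe.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Illustrative setup: each "client model" is a linear classifier already
# trained locally; the server sees only their predictions on a shared
# unlabeled set, never the client data itself.
rng = np.random.default_rng(1)
d, k = 4, 3
client_models = [rng.normal(size=(d, k)) for _ in range(3)]
X_unlabeled = rng.normal(size=(200, d))

# Ensemble distillation step 1: average the clients' predicted
# distributions on the unlabeled data to form soft teacher labels.
teacher_probs = np.mean(
    [softmax(X_unlabeled @ W) for W in client_models], axis=0
)

# Step 2: train the central model to match the teacher's soft labels
# (cross-entropy gradient descent).
W_server = np.zeros((d, k))
for _ in range(300):
    grad = X_unlabeled.T @ (
        softmax(X_unlabeled @ W_server) - teacher_probs
    ) / len(X_unlabeled)
    W_server -= 0.5 * grad

student_probs = softmax(X_unlabeled @ W_server)
agreement = np.mean(
    student_probs.argmax(axis=1) == teacher_probs.argmax(axis=1)
)
print(f"student/teacher agreement: {agreement:.2f}")
```

Because fusion happens through predictions rather than parameters, the client models need not share an architecture, which is the key difference from the parameter-averaging schemes used elsewhere in this list.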
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.