A new paradigm for accelerating clinical data science at Stanford
Medicine
- URL: http://arxiv.org/abs/2003.10534v1
- Date: Tue, 17 Mar 2020 16:21:42 GMT
- Title: A new paradigm for accelerating clinical data science at Stanford
Medicine
- Authors: Somalee Datta, Jose Posada, Garrick Olson, Wencheng Li, Ciaran
O'Reilly, Deepa Balraj, Joseph Mesterhazy, Joseph Pallas, Priyamvada Desai,
Nigam Shah
- Abstract summary: Stanford Medicine is building a new data platform for our academic research community to do better clinical data science.
Hospitals hold large amounts of patient data, and researchers have demonstrated the ability to reuse that data, together with AI approaches, to derive novel insights, support patient care, and improve care quality.
We are establishing a new secure Big Data platform that aims to reduce time to access and analyze data.
- Score: 1.3814679165245243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stanford Medicine is building a new data platform for our academic research
community to do better clinical data science. Hospitals have a large amount of
patient data and researchers have demonstrated the ability to reuse that data
and AI approaches to derive novel insights, support patient care, and improve
care quality. However, the traditional data warehouse and Honest Broker
approaches currently in use are not scalable. We are establishing a new
secure Big Data platform that aims to reduce time to access and analyze data.
In this platform, data is anonymized to preserve patient data privacy and made
available preparatory to Institutional Review Board (IRB) submission.
Furthermore, the data is standardized such that analysis done at Stanford can
be replicated elsewhere using the same analytical code and clinical concepts.
Finally, the analytics data warehouse integrates with a secure data science
computational facility to support large scale data analytics. The ecosystem is
designed to bring the modern data science community to highly sensitive
clinical data in a secure and collaborative big data analytics environment with
a goal to enable bigger, better and faster science.
Related papers
- Pennsieve: A Collaborative Platform for Translational Neuroscience and Beyond [0.5130659559809153]
Pennsieve is an open-source, cloud-based scientific data management platform.
It supports complex multimodal datasets and provides tools for data visualization and analyses.
Pennsieve stores over 125 TB of scientific data, with 35 TB of data publicly available across more than 350 high-impact datasets.
arXiv Detail & Related papers (2024-09-16T17:55:58Z)
- iASiS: Towards Heterogeneous Big Data Analysis for Personalized Medicine [28.917691563659467]
The iASiS infrastructure is able to convert clinical notes into usable data.
Semantic integration of the data offers the opportunity to generate information that is rich, auditable, and reliable.
Data resources for two different disease categories are explored within the iASiS use cases, dementia and lung cancer.
arXiv Detail & Related papers (2024-07-09T10:52:19Z)
- TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [57.067409211231244]
This paper presents meticulously curated AI-ready datasets covering multi-modal data (e.g., drug molecule, disease code, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design.
We provide basic validation methods for each task to ensure the datasets' usability and reliability.
We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets [17.774341783844026]
This work proposes the Multimodal Integration of Oncology Data System (MINDS).
MINDS is a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources.
By harmonizing multimodal data, MINDS aims to potentially empower researchers with greater analytical ability.
arXiv Detail & Related papers (2023-09-30T15:44:39Z)
- Bringing the Algorithms to the Data -- Secure Distributed Medical Analytics using the Personal Health Train (PHT-meDIC) [1.451998131020241]
The Personal Health Train (PHT) implements an 'algorithm to the data' paradigm.
We present PHT-meDIC, a productively deployed open-source implementation of the PHT concept.
arXiv Detail & Related papers (2022-12-07T06:29:15Z)
- Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction [102.23901690661916]
Low-dimensional embedding spaces can be derived from the UK Biobank population dataset to enhance data-scarce prediction of health indicators, lifestyle and demographic characteristics.
Performance gains from semi-supervised approaches will probably become an important ingredient for various medical data science applications.
arXiv Detail & Related papers (2021-10-12T16:25:50Z)
- A highly scalable repository of waveform and vital signs data from bedside monitoring devices [0.0]
Machine learning is driving the appetite of the research community for various types of signal data such as patient vitals.
Health care systems are ill suited for massive processing of large volumes of data.
We have developed a solution that siphons off patient vital data on a nightly basis from on-premises bio-medical systems to a cloud storage location as a permanent archive.
arXiv Detail & Related papers (2021-06-07T20:59:58Z)
- FLOP: Federated Learning on Medical Datasets using Partial Networks [84.54663831520853]
COVID-19, the disease caused by the novel coronavirus, has led to a shortage of medical resources.
Different data-driven deep learning models have been developed to mitigate the diagnosis of COVID-19.
The data itself is still scarce due to patient privacy concerns.
We propose a simple yet effective algorithm, named Federated Learning on Medical Datasets using Partial Networks (FLOP).
arXiv Detail & Related papers (2021-02-10T01:56:58Z)
- Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z)
- Surgical Data Science -- from Concepts toward Clinical Translation [67.543698133416]
Surgical Data Science aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data.
We shed light on the underlying reasons and provide a roadmap for future advances in the field.
arXiv Detail & Related papers (2020-10-30T14:20:16Z)