Interactive exploration of population scale pharmacoepidemiology
datasets
- URL: http://arxiv.org/abs/2005.09890v1
- Date: Wed, 20 May 2020 07:34:50 GMT
- Title: Interactive exploration of population scale pharmacoepidemiology
datasets
- Authors: Tengel Ekrem Skar, Einar Holsb{\o}, Kristian Svendsen, Lars Ailo Bongo
- Abstract summary: Population-scale drug prescription data linked with adverse drug reaction (ADR) supports the fitting of models large enough to detect drug use and ADR patterns.
detecting ADR patterns in large datasets requires tools for scalable data processing, machine learning for data analysis, and interactive visualization.
We have created a tool for interactive exploration of patterns in prescription datasets with millions of samples.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Population-scale drug prescription data linked with adverse drug reaction
(ADR) data supports the fitting of models large enough to detect drug use and
ADR patterns that are not detectable using traditional methods on smaller
datasets. However, detecting ADR patterns in large datasets requires tools for
scalable data processing, machine learning for data analysis, and interactive
visualization. To our knowledge no existing pharmacoepidemiology tool supports
all three requirements. We have therefore created a tool for interactive
exploration of patterns in prescription datasets with millions of samples. We
use Spark to preprocess the data for machine learning and for analyses using
SQL queries. We have implemented models in Keras and the scikit-learn
framework. The model results are visualized and interpreted using live Python
coding in Jupyter. We apply our tool to explore a 384 million prescription data
set from the Norwegian Prescription Database combined with a 62 million
prescriptions for elders that were hospitalized. We preprocess the data in two
minutes, train models in seconds, and plot the results in milliseconds. Our
results show the power of combining computational power, short computation
times, and ease of use for analysis of population scale pharmacoepidemiology
datasets. The code is open source and available at:
https://github.com/uit-hdl/norpd_prescription_analyses
Related papers
- PharmacoMatch: Efficient 3D Pharmacophore Screening through Neural Subgraph Matching [0.5113447003407372]
We introduce PharmacoMatch, a novel contrastive learning approach based on neural subgraph matching.
Our findings demonstrate significantly shorter runtimes for pharmacophore matching, offering a promising speed-up for screening very large datasets.
arXiv Detail & Related papers (2024-09-10T08:17:06Z) - Drug Synergistic Combinations Predictions via Large-Scale Pre-Training
and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z) - On the data requirements of probing [20.965328323152608]
We present a novel method to estimate the required number of data samples for probing datasets.
Our framework helps to systematically construct probing datasets to diagnose neural NLP models.
arXiv Detail & Related papers (2022-02-25T16:27:06Z) - A Real Use Case of Semi-Supervised Learning for Mammogram Classification
in a Local Clinic of Costa Rica [0.5541644538483946]
Training a deep learning model requires a considerable amount of labeled images.
A number of publicly available datasets have been built with data from different hospitals and clinics.
The use of the semi-supervised deep learning approach known as MixMatch, to leverage the usage of unlabeled data is proposed and evaluated.
arXiv Detail & Related papers (2021-07-24T22:26:50Z) - PyHealth: A Python Library for Health Predictive Models [53.848478115284195]
PyHealth is an open-source Python toolbox for developing various predictive models on healthcare data.
The data preprocessing module enables the transformation of complex healthcare datasets into machine learning friendly formats.
The predictive modeling module provides more than 30 machine learning models, including established ensemble trees and deep neural network-based approaches.
arXiv Detail & Related papers (2021-01-11T22:02:08Z) - DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on V-COCO and HICODET dataset.
arXiv Detail & Related papers (2020-10-02T13:59:05Z) - Fast, Accurate, and Simple Models for Tabular Data via Augmented
Distillation [97.42894942391575]
We propose FAST-DAD to distill arbitrarily complex ensemble predictors into individual models like boosted trees, random forests, and deep networks.
Our individual distilled models are over 10x faster and more accurate than ensemble predictors produced by AutoML tools like H2O/AutoSklearn.
arXiv Detail & Related papers (2020-06-25T09:57:47Z) - Ensemble Transfer Learning for the Prediction of Anti-Cancer Drug
Response [49.86828302591469]
In this paper, we apply transfer learning to the prediction of anti-cancer drug response.
We apply the classic transfer learning framework that trains a prediction model on the source dataset and refines it on the target dataset.
The ensemble transfer learning pipeline is implemented using LightGBM and two deep neural network (DNN) models with different architectures.
arXiv Detail & Related papers (2020-05-13T20:29:48Z) - Unsupervised Pre-trained Models from Healthy ADLs Improve Parkinson's
Disease Classification of Gait Patterns [3.5939555573102857]
We show how to extract features relevant to accelerometer gait data for Parkinson's disease classification.
Our pre-trained source model consists of a convolutional autoencoder, and the target classification model is a simple multi-layer perceptron model.
We explore two different pre-trained source models, trained using different activity groups, and analyze the influence the choice of pre-trained model has over the task of Parkinson's disease classification.
arXiv Detail & Related papers (2020-05-06T04:08:19Z) - PyODDS: An End-to-end Outlier Detection System with Automated Machine
Learning [55.32009000204512]
We present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support.
Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space.
It also provides unified interfaces and visualizations for users with or without data science or machine learning background.
arXiv Detail & Related papers (2020-03-12T03:30:30Z) - Deep generative models in DataSHIELD [0.0]
In Germany, for example, it is not possible to pool routine data from different hospitals for research purposes without the consent of the patients.
The DataSHIELD software provides an infrastructure and a set of statistical methods for joint analyses of distributed data.
We present a methodology together with a software implementation that builds on DataSHIELD to create artificial data that preserve complex patterns from distributed individual patient data.
arXiv Detail & Related papers (2020-03-11T10:15:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.