Related papers: Interactive exploration of population scale pharmacoepidemiology datasets

URL: http://arxiv.org/abs/2005.09890v1
Date: Wed, 20 May 2020 07:34:50 GMT
Title: Interactive exploration of population scale pharmacoepidemiology datasets
Authors: Tengel Ekrem Skar, Einar Holsbø, Kristian Svendsen, Lars Ailo Bongo
Abstract summary: Population-scale drug prescription data linked with adverse drug reaction (ADR) data supports the fitting of models large enough to detect drug use and ADR patterns. Detecting ADR patterns in large datasets requires tools for scalable data processing, machine learning for data analysis, and interactive visualization. We have created a tool for interactive exploration of patterns in prescription datasets with millions of samples.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Population-scale drug prescription data linked with adverse drug reaction (ADR) data supports the fitting of models large enough to detect drug use and ADR patterns that are not detectable using traditional methods on smaller datasets. However, detecting ADR patterns in large datasets requires tools for scalable data processing, machine learning for data analysis, and interactive visualization. To our knowledge, no existing pharmacoepidemiology tool supports all three requirements. We have therefore created a tool for interactive exploration of patterns in prescription datasets with millions of samples. We use Spark to preprocess the data for machine learning and for analyses using SQL queries. We have implemented models in Keras and the scikit-learn framework. The model results are visualized and interpreted using live Python coding in Jupyter. We apply our tool to explore a data set of 384 million prescriptions from the Norwegian Prescription Database combined with 62 million prescriptions for elderly patients who were hospitalized. We preprocess the data in two minutes, train models in seconds, and plot the results in milliseconds. Our results show the power of combining computational power, short computation times, and ease of use for analysis of population-scale pharmacoepidemiology datasets. The code is open source and available at: https://github.com/uit-hdl/norpd_prescription_analyses
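Illustrative sketch (not from the paper): the abstract mentions analysing prescription data with SQL queries on Spark. The example below shows the general shape such a query might take, counting prescriptions and distinct patients per drug. It is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Spark SQL so that it runs without a cluster, and the table name, column names, and rows are hypothetical rather than the Norwegian Prescription Database schema or the authors' code.

```python
import sqlite3

# Hypothetical toy rows: (patient_id, atc_code, dispensed_year).
# Column names and values are illustrative only, not real NorPD data.
ROWS = [
    (1, "C10AA01", 2012),
    (1, "C10AA01", 2013),
    (2, "C10AA01", 2013),
    (2, "N02BE01", 2013),
    (3, "N02BE01", 2014),
]


def drug_usage_summary(rows):
    """Return (atc_code, n_prescriptions, n_patients) per drug, most prescribed first."""
    # sqlite3 stands in for Spark SQL here; the query itself is plain SQL.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE prescriptions ("
            "patient_id INTEGER, atc_code TEXT, dispensed_year INTEGER)"
        )
        conn.executemany("INSERT INTO prescriptions VALUES (?, ?, ?)", rows)
        query = """
            SELECT atc_code,
                   COUNT(*) AS n_prescriptions,
                   COUNT(DISTINCT patient_id) AS n_patients
            FROM prescriptions
            GROUP BY atc_code
            ORDER BY n_prescriptions DESC, atc_code
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


summary = drug_usage_summary(ROWS)
for atc_code, n_prescriptions, n_patients in summary:
    print(atc_code, n_prescriptions, n_patients)
```

On a real deployment the same SELECT statement could in principle be run against a Spark table instead; the per-drug counts would then serve as input features or sanity checks before model training.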

Related papers

PharmacoMatch: Efficient 3D Pharmacophore Screening through Neural Subgraph Matching [0.5113447003407372]
We introduce PharmacoMatch, a novel contrastive learning approach based on neural subgraph matching. Our findings demonstrate significantly shorter runtimes for pharmacophore matching, offering a promising speed-up for screening very large datasets.
arXiv Detail & Related papers (2024-09-10T08:17:06Z)
Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation. Deep learning models have emerged as an efficient way to discover synergistic combinations. Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
On the data requirements of probing [20.965328323152608]
We present a novel method to estimate the required number of data samples for probing datasets. Our framework helps to systematically construct probing datasets to diagnose neural NLP models.
arXiv Detail & Related papers (2022-02-25T16:27:06Z)
A Real Use Case of Semi-Supervised Learning for Mammogram Classification in a Local Clinic of Costa Rica [0.5541644538483946]
Training a deep learning model requires a considerable amount of labeled images. A number of publicly available datasets have been built with data from different hospitals and clinics. The use of the semi-supervised deep learning approach known as MixMatch to leverage unlabeled data is proposed and evaluated.
arXiv Detail & Related papers (2021-07-24T22:26:50Z)
PyHealth: A Python Library for Health Predictive Models [53.848478115284195]
PyHealth is an open-source Python toolbox for developing various predictive models on healthcare data. The data preprocessing module enables the transformation of complex healthcare datasets into machine learning friendly formats. The predictive modeling module provides more than 30 machine learning models, including established ensemble trees and deep neural network-based approaches.
arXiv Detail & Related papers (2021-01-11T22:02:08Z)
DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets. We propose an efficient and effective data augmentation method called DecAug for HOI detection. Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on the V-COCO and HICO-DET datasets.
arXiv Detail & Related papers (2020-10-02T13:59:05Z)
Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation [97.42894942391575]
We propose FAST-DAD to distill arbitrarily complex ensemble predictors into individual models like boosted trees, random forests, and deep networks. Our individual distilled models are over 10x faster and more accurate than ensemble predictors produced by AutoML tools like H2O/AutoSklearn.
arXiv Detail & Related papers (2020-06-25T09:57:47Z)
Ensemble Transfer Learning for the Prediction of Anti-Cancer Drug Response [49.86828302591469]
In this paper, we apply transfer learning to the prediction of anti-cancer drug response. We apply the classic transfer learning framework that trains a prediction model on the source dataset and refines it on the target dataset. The ensemble transfer learning pipeline is implemented using LightGBM and two deep neural network (DNN) models with different architectures.
arXiv Detail & Related papers (2020-05-13T20:29:48Z)
Unsupervised Pre-trained Models from Healthy ADLs Improve Parkinson's Disease Classification of Gait Patterns [3.5939555573102857]
We show how to extract features relevant to accelerometer gait data for Parkinson's disease classification. Our pre-trained source model consists of a convolutional autoencoder, and the target classification model is a simple multi-layer perceptron model. We explore two different pre-trained source models, trained using different activity groups, and analyze the influence the choice of pre-trained model has over the task of Parkinson's disease classification.
arXiv Detail & Related papers (2020-05-06T04:08:19Z)
PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning [55.32009000204512]
We present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support. Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space. It also provides unified interfaces and visualizations for users with or without data science or machine learning background.
arXiv Detail & Related papers (2020-03-12T03:30:30Z)
Deep generative models in DataSHIELD [0.0]
In Germany, for example, it is not possible to pool routine data from different hospitals for research purposes without the consent of the patients. The DataSHIELD software provides an infrastructure and a set of statistical methods for joint analyses of distributed data. We present a methodology together with a software implementation that builds on DataSHIELD to create artificial data that preserve complex patterns from distributed individual patient data.
arXiv Detail & Related papers (2020-03-11T10:15:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.