TELII: Temporal Event Level Inverted Indexing for Cohort Discovery on a Large Covid-19 EHR Dataset
- URL: http://arxiv.org/abs/2410.17134v1
- Date: Tue, 22 Oct 2024 16:06:33 GMT
- Authors: Yan Huang,
- Abstract summary: TELII is a temporal event level inverted indexing method designed for cohort discovery on large EHR datasets.
We implement TELII for the OPTUM de-identified COVID-19 EHR dataset, which contains data from 8.87 million patients.
Results show that the temporal query speed for TELII is up to 2000 times faster than that of existing non-temporal inverted indexes.
- Abstract: Cohort discovery is a crucial step in clinical research on Electronic Health Record (EHR) data. Temporal queries, which are common in cohort discovery, can be time-consuming and prone to errors when processed on large EHR datasets. In this work, we introduce TELII, a temporal event level inverted indexing method designed for cohort discovery on large EHR datasets. TELII is engineered to pre-compute and store the relations along with the time difference between events, thereby providing fast and accurate temporal query capabilities. We implemented TELII for the OPTUM de-identified COVID-19 EHR dataset, which contains data from 8.87 million patients. We demonstrate four common temporal query tasks and their implementation using TELII with a MongoDB backend. Our results show that the temporal query speed for TELII is up to 2000 times faster than that of existing non-temporal inverted indexes. TELII achieves millisecond-level response times, enabling users to quickly explore event relations and find preliminary evidence for their research questions. Not only is TELII practical and straightforward to implement, but it also offers easy adaptability to other EHR datasets. These advantages underscore TELII's potential to serve as the query engine for EHR-based applications, ensuring fast, accurate, and user-friendly query responses.
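The abstract describes TELII as precomputing the relations between events together with the time difference between them, so a temporal query becomes an index lookup rather than a scan over every patient timeline. The Python below is a minimal in-memory sketch of that idea under stated assumptions. It is not the authors' implementation and does not use MongoDB. The function names `build_temporal_index` and `query_followed_by`, the record layout, the event codes, and the choice to store every pairwise day difference per patient are all hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical record layout: patient_id -> list of (event_code, day_number).
Timeline = list[tuple[str, int]]
# Hypothetical index layout: (code_A, code_B) -> patient_id -> set of day differences.
PairIndex = dict[tuple[str, str], dict[str, set[int]]]


def build_temporal_index(patients: dict[str, Timeline]) -> PairIndex:
    """For every ordered pair of distinct event codes (A, B) in a patient's record,
    store the set of time differences day_B - day_A observed for that patient."""
    index: PairIndex = defaultdict(lambda: defaultdict(set))
    for pid, events in patients.items():
        for (code_a, day_a), (code_b, day_b) in product(events, repeat=2):
            if code_a == code_b:
                continue  # same-code pairs are skipped in this sketch
            index[(code_a, code_b)][pid].add(day_b - day_a)
    return index


def query_followed_by(index: PairIndex, code_a: str, code_b: str,
                      min_days: int, max_days: int) -> set[str]:
    """Return patients with some B occurring min_days..max_days (inclusive) after some A.
    Answered from the precomputed differences; no timeline is rescanned."""
    by_patient = index.get((code_a, code_b), {})
    return {pid for pid, diffs in by_patient.items()
            if any(min_days <= d <= max_days for d in diffs)}


# Toy data with made-up event codes (not from the OPTUM dataset).
patients = {
    "p1": [("COVID_POS", 0), ("HOSPITALIZED", 5)],
    "p2": [("HOSPITALIZED", 2), ("COVID_POS", 10)],
    "p3": [("COVID_POS", 0), ("HOSPITALIZED", 40)],
}
idx = build_temporal_index(patients)
print(query_followed_by(idx, "COVID_POS", "HOSPITALIZED", 0, 30))  # {'p1'}
```

Storing every ordered pair is quadratic in the number of events per patient, so this sketch trades index size for query speed; a deployment at the scale reported above would have to address that storage cost, which this toy example ignores.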
Related papers
- Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs (2024-10-09)
  Legal case retrieval aims to provide similar cases as references for a given fact description.
  Existing works mainly focus on case-to-case retrieval using lengthy queries.
  The available data is too small to satisfy the training requirements of existing data-hungry neural models.
- BernGraph: Probabilistic Graph Neural Networks for EHR-based Medication Recommendations (2024-08-18)
  The medical community believes that binary medical event outcomes in EHR data contain sufficient information for making sensible recommendations.
  Modeling the relationships among massive numbers of 0/1 event outcomes is difficult, even with expert knowledge.
  In practice, learning can stall on binary values because the equally important 0 entries propagate no learning signal.
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models (2024-07-01)
  We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
  Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
  Our benchmark thus illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
- Dynamic Data Pruning for Automatic Speech Recognition (2024-06-26)
  We introduce Dynamic Data Pruning for ASR (DDP-ASR), which offers fine-grained pruning granularities specifically tailored for speech-related datasets.
  Our experiments show that DDP-ASR can save up to 1.6x training time with negligible performance loss.
- REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking (2024-04-19)
  REXEL is a highly efficient and accurate model for the joint task of document-level closed information extraction (DocIE).
  It is on average 11 times faster than competitive existing approaches in a similar setting.
  The combination of speed and accuracy makes REXEL an accurate, cost-efficient system for extracting structured information at web scale.
- Few-shot Link Prediction on N-ary Facts (2023-05-10)
  Link Prediction on Hyper-relational Facts (LPHFs) aims to predict a missing element in a hyper-relational fact.
  Few-Shot Link Prediction on Hyper-relational Facts (FSLPHFs) aims to predict a missing entity in a hyper-relational fact with limited support instances.
- RETE: Retrieval-Enhanced Temporal Event Forecasting on Unified Query Product Evolutionary Graph (2022-02-12)
  Temporal event forecasting is a new user behavior prediction task in a unified query product evolutionary graph.
  We propose a novel Retrieval-Enhanced Event forecasting framework.
  Unlike existing methods, we propose methods that enhance user representations via roughly connected entities in the whole graph.
- Approximating Aggregated SQL Queries With LSTM Networks (2020-10-25)
  We present a method for query approximation, also known as approximate query processing (AQP).
  We use an LSTM network to learn the relationship between queries and their results, and to provide a rapid inference layer for predicting query results.
  Our method was able to predict up to 120,000 queries per second, with a single-query latency of no more than 2 ms.
- Sequential Recommender via Time-aware Attentive Memory Network (2020-05-18)
  We propose a temporal gating methodology to improve attention mechanisms and recurrent units.
  We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
  Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product-based Top-K recommendation.
- DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction (2020-01-22)
  Clinical trials are essential for drug development but often suffer from expensive, inaccurate, and insufficient patient recruitment.
  DeepEnroll is a cross-modal inference learning model to jointly encode enrollment criteria (tabular data) into a shared latent space for matching inference.