Real-Time Health Analytics Using Ontology-Driven Complex Event Processing and LLM Reasoning: A Tuberculosis Case Study
- URL: http://arxiv.org/abs/2510.09646v1
- Date: Sun, 05 Oct 2025 14:21:46 GMT
- Title: Real-Time Health Analytics Using Ontology-Driven Complex Event Processing and LLM Reasoning: A Tuberculosis Case Study
- Authors: Ritesh Chandra, Sonali Agarwal, Navjot Singh,
- Abstract summary: This study presents an ontology-enabled real-time analytics framework that integrates Complex Event Processing (CEP) and Large Language Models (LLMs)<n>Patient data is ingested and processed using Apache Kafka and Spark Streaming, where CEP engines detect clinically significant event patterns.<n>The framework is evaluated using a dataset of 1,000 Tuberculosis (TB) patients as a use case, demonstrating low-latency event detection, scalable reasoning, and high model performance.
- Score: 4.0954316720608634
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Timely detection of critical health conditions remains a major challenge in public health analytics, especially in Big Data environments characterized by high volume, rapid velocity, and diverse variety of clinical data. This study presents an ontology-enabled real-time analytics framework that integrates Complex Event Processing (CEP) and Large Language Models (LLMs) to enable intelligent health event detection and semantic reasoning over heterogeneous, high-velocity health data streams. The architecture leverages the Basic Formal Ontology (BFO) and Semantic Web Rule Language (SWRL) to model diagnostic rules and domain knowledge. Patient data is ingested and processed using Apache Kafka and Spark Streaming, where CEP engines detect clinically significant event patterns. LLMs support adaptive reasoning, event interpretation, and ontology refinement. Clinical information is semantically structured as Resource Description Framework (RDF) triples in Graph DB, enabling SPARQL-based querying and knowledge-driven decision support. The framework is evaluated using a dataset of 1,000 Tuberculosis (TB) patients as a use case, demonstrating low-latency event detection, scalable reasoning, and high model performance (in terms of precision, recall, and F1-score). These results validate the system's potential for generalizable, real-time health analytics in complex Big Data scenarios.
Related papers
- Harnessing Large Language Models for Precision Querying and Retrieval-Augmented Knowledge Extraction in Clinical Data Science [3.4325249294405555]
This study applies Large Language Models (LLMs) to two foundational Electronic Health Record (EHR) data science tasks.<n>We test the ability of LLMs to interact accurately with large structured datasets for analytics.<n>We present a flexible evaluation framework that automatically generates synthetic question and answer pairs tailored to the characteristics of each dataset or task.
arXiv Detail & Related papers (2026-01-28T14:57:36Z) - Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design [7.509731425152396]
We present a systematic investigation and analysis of three state of the art vision-language models (VLMs) for histopathology.<n>We develop a comprehensive prompt engineering framework that systematically varies domain specificity, anatomical precision, instructional framing, and output constraints.<n>Our findings demonstrate that prompt engineering significantly impacts model performance, with the CONCH model achieving the highest accuracy when provided with precise anatomical references.
arXiv Detail & Related papers (2025-04-30T19:01:06Z) - Towards Scalable and Cross-Lingual Specialist Language Models for Oncology [4.824906329042275]
General-purpose large models (LLMs) struggle with challenges such as clinical terminology, context-dependent interpretations, and multi-modal data integration.<n>We develop an oncology-specialized, efficient, and adaptable NLP framework that combines instruction tuning, retrieval-augmented generation (RAG), and graph-based knowledge integration.
arXiv Detail & Related papers (2025-03-11T11:34:57Z) - How Deep is your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation [6.547981908229007]
We show how architectural and framework biases combine to influence model performance.<n>Experiments show imputation performance variations of up to 20% based on preprocessing and implementation choices.<n>We identify critical gaps between current deep imputation methods and medical requirements.
arXiv Detail & Related papers (2024-07-11T12:33:28Z) - REST: Efficient and Accelerated EEG Seizure Analysis through Residual State Updates [54.96885726053036]
This paper introduces a novel graph-based residual state update mechanism (REST) for real-time EEG signal analysis.
By leveraging a combination of graph neural networks and recurrent structures, REST efficiently captures both non-Euclidean geometry and temporal dependencies within EEG data.
Our model demonstrates high accuracy in both seizure detection and classification tasks.
arXiv Detail & Related papers (2024-06-03T16:30:19Z) - Prompting Large Language Models for Zero-Shot Clinical Prediction with
Structured Longitudinal Electronic Health Record Data [7.815738943706123]
Large Language Models (LLMs) are traditionally tailored for natural language processing.
This research investigates the adaptability of LLMs, like GPT-4, to EHR data.
In response to the longitudinal, sparse, and knowledge-infused nature of EHR data, our prompting approach involves taking into account specific characteristics.
arXiv Detail & Related papers (2024-01-25T20:14:50Z) - Clairvoyance: A Pipeline Toolkit for Medical Time Series [95.22483029602921]
Time-series learning is the bread and butter of data-driven *clinical decision support*
Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a software toolkit.
Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.
arXiv Detail & Related papers (2023-10-28T12:08:03Z) - Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction.
We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z) - Health Status Prediction with Local-Global Heterogeneous Behavior Graph [69.99431339130105]
Estimation of health status can be achieved with various kinds of data streams continuously collected from wearable sensors.
We propose to model the behavior-related multi-source data streams with a local-global graph.
We take experiments on StudentLife dataset, and extensive results demonstrate the effectiveness of our proposed model.
arXiv Detail & Related papers (2021-03-23T11:10:04Z) - Uncovering the structure of clinical EEG signals with self-supervised
learning [64.4754948595556]
Supervised learning paradigms are often limited by the amount of labeled data that is available.
This phenomenon is particularly problematic in clinically-relevant data, such as electroencephalography (EEG)
By extracting information from unlabeled data, it might be possible to reach competitive performance with deep neural networks.
arXiv Detail & Related papers (2020-07-31T14:34:47Z) - Trajectories, bifurcations and pseudotime in large clinical datasets:
applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.