Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide
- URL: http://arxiv.org/abs/2504.04346v2
- Date: Tue, 08 Apr 2025 03:11:32 GMT
- Title: Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide
- Authors: Zhijie Duan, Kai Wei, Zhaoqian Xue, Jiayan Zhou, Shu Yang, Siyuan Ma, Jin Jin, Lingyao Li
- Abstract summary: Social media is a rich source of real-world data that captures valuable patient experience information for pharmacovigilance. We present a framework that leverages large language models (LLMs) to extract medication side effects from social media and organize them into a knowledge graph (KG). Using the constructed knowledge graph, we perform comprehensive analyses to investigate reported side effects across different semaglutide brands over time.
- Score: 8.621699521476371
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Social media is a rich source of real-world data that captures valuable patient experience information for pharmacovigilance. However, mining data from unstructured and noisy social media content remains a challenging task. We present a systematic framework that leverages large language models (LLMs) to extract medication side effects from social media and organize them into a knowledge graph (KG). We apply this framework to semaglutide for weight loss using data from Reddit. Using the constructed knowledge graph, we perform comprehensive analyses to investigate reported side effects across different semaglutide brands over time. These findings are further validated through comparison with adverse events reported in the FAERS database, providing important patient-centered insights into semaglutide's side effects that complement its safety profile and current knowledge base of semaglutide for both healthcare professionals and patients. Our work demonstrates the feasibility of using LLMs to transform social media data into structured KGs for pharmacovigilance.
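The extract-then-organize idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the LLM extraction step is replaced by a hypothetical keyword lookup so the example runs on its own, and the relation name `reported_side_effect`, the vocabularies, and the sample posts are all made up for illustration.

```python
from collections import Counter

# Hypothetical stand-in for the LLM extraction step: a keyword lookup.
# The framework described above uses an LLM here; this stub only keeps
# the sketch runnable without any external service.
SIDE_EFFECT_TERMS = {"nausea", "constipation", "fatigue"}  # illustrative vocabulary
BRANDS = {"ozempic", "wegovy"}  # two semaglutide brand names, for illustration


def extract_mentions(post: str) -> list[tuple[str, str]]:
    """Return (brand, side_effect) pairs co-mentioned in one post."""
    words = {w.strip(".,:;!?").lower() for w in post.split()}
    return [
        (brand, effect)
        for brand in sorted(BRANDS & words)
        for effect in sorted(SIDE_EFFECT_TERMS & words)
    ]


def build_graph(posts: list[str]) -> Counter:
    """Aggregate pairs into weighted edges keyed by (head, relation, tail)."""
    edges: Counter = Counter()
    for post in posts:
        for brand, effect in extract_mentions(post):
            edges[(brand, "reported_side_effect", effect)] += 1
    return edges


# Made-up posts, not real Reddit data.
posts = [
    "Started Ozempic last month, the nausea is rough.",
    "Wegovy gave me nausea and constipation.",
    "Week 3 on Ozempic: nausea again, plus fatigue.",
]
graph = build_graph(posts)
for (head, relation, tail), count in sorted(graph.items()):
    print(head, relation, tail, count)
```

Storing edges as counted triples keeps per-brand comparisons simple; a time dimension could be added by including a date bucket in the key, though how the paper models time is not stated here.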
Related papers
- Self-Supervised Learning for Graph-Structured Data in Healthcare Applications: A Comprehensive Review [1.5778541636044452]
This review aims to be a valuable resource for both researchers and practitioners looking to utilize self-supervised learning (SSL) for graph-structured data in healthcare. To the best of our knowledge, this work represents the first comprehensive review of the literature on SSL applied to graph data in healthcare.
arXiv Detail & Related papers (2024-11-28T10:51:12Z)
- Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development [12.258245804049114]
The mining of adverse drug events (ADEs) is pivotal in pharmacovigilance, enhancing patient safety.
Traditional ADE detection methods are reliable but slow, not easily adaptable to large-scale operations.
Previous ADE mining studies have focused on text-based methodologies, overlooking visual cues.
We present a MultiModal Adverse Drug Event (MMADE) detection dataset, merging ADE-related textual information with visual aids.
arXiv Detail & Related papers (2024-05-24T17:58:42Z)
- Reddit-Impacts: A Named Entity Recognition Dataset for Analyzing Clinical and Social Effects of Substance Use Derived from Social Media [6.138126219622993]
Substance use disorders (SUDs) are a growing concern globally, necessitating enhanced understanding of the problem and its trends through data-driven research.
Social media are unique and important sources of information about SUDs, particularly since the data in such sources are often generated by people with lived experiences.
In this paper, we introduce Reddit-Impacts, a challenging Named Entity Recognition (NER) dataset curated from subreddits dedicated to discussions on prescription and illicit opioids, as well as medications for opioid use disorder.
The dataset specifically concentrates on the lesser-studied, yet critically important, aspects of substance use--its
arXiv Detail & Related papers (2024-05-09T23:43:57Z)
- myAURA: Personalized health library for epilepsy management via knowledge graph sparsification and visualization [4.25313339005458]
myAURA is an application designed to aid epilepsy patients, caregivers, and researchers in making decisions about care and self-management.
myAURA rests on the federation of heterogeneous data resources relevant to epilepsy, such as biomedical databases, social media, and electronic health records.
arXiv Detail & Related papers (2024-05-08T17:24:24Z)
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- Don't Ignore Dual Logic Ability of LLMs while Privatizing: A Data-Intensive Analysis in Medical Domain [19.46334739319516]
We study how the dual logic ability of LLMs is affected during the privatization process in the medical domain.
Our results indicate that incorporating general domain dual logic data into LLMs not only enhances LLMs' dual logic ability but also improves their accuracy.
arXiv Detail & Related papers (2023-09-08T08:20:46Z)
- Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [92.73584302508907]
We propose a knowledge graph with Dynamic structure and nodes to facilitate medical report generation with Contrastive Learning.
In detail, the fundamental structure of our graph is pre-constructed from general knowledge.
Each image feature is integrated with its very own updated graph before being fed into the decoder module for report generation.
arXiv Detail & Related papers (2023-03-18T03:53:43Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Is More Data All You Need? A Causal Exploration [4.756600446882457]
Causal analysis is often used in medicine and economics to gain insights about the effects of actions and policies.
In this paper we explore the effect of dataset interventions on the output of image classification models.
arXiv Detail & Related papers (2022-06-06T08:02:54Z)
- HINT: Hierarchical Interaction Network for Trial Outcome Prediction Leveraging Web Data [56.53715632642495]
Clinical trials face uncertain outcomes due to issues with efficacy, safety, or problems with patient recruitment.
In this paper, we propose Hierarchical INteraction Network (HINT) for more general, clinical trial outcome predictions.
arXiv Detail & Related papers (2021-02-08T15:09:07Z)
- MET: Multimodal Perception of Engagement for Telehealth [52.54282887530756]
We present MET, a learning-based algorithm for perceiving a human's level of engagement from videos.
We release a new dataset, MEDICA, for mental health patient engagement detection.
arXiv Detail & Related papers (2020-11-17T15:18:38Z)
- Assessing the Severity of Health States based on Social Media Posts [62.52087340582502]
We propose a multiview learning framework that models both the textual content and contextual information to assess the severity of the user's health state.
The diverse NLU views demonstrate their effectiveness on both tasks, as well as on individual diseases, in assessing a user's health.
arXiv Detail & Related papers (2020-09-21T03:45:14Z)