Pulse of the Pandemic: Iterative Topic Filtering for Clinical
Information Extraction from Social Media
- URL: http://arxiv.org/abs/2102.06836v2
- Date: Mon, 28 Jun 2021 15:50:35 GMT
- Title: Pulse of the Pandemic: Iterative Topic Filtering for Clinical
Information Extraction from Social Media
- Authors: Julia Wu, Venkatesh Sivaraman, Dheekshita Kumar, Juan M. Banda and
David Sontag
- Abstract summary: The rapid evolution of the COVID-19 pandemic has underscored the need to quickly disseminate the latest clinical knowledge during a public-health emergency.
We present an unsupervised, iterative approach to mine clinically relevant information from social media data.
This approach identifies granular topics and tweets with high clinical relevance from a set of about 52 million COVID-19-related tweets.
- Score: 1.5938324336156293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid evolution of the COVID-19 pandemic has underscored the need to
quickly disseminate the latest clinical knowledge during a public-health
emergency. One surprisingly effective platform for healthcare professionals
(HCPs) to share knowledge and experiences from the front lines has been social
media (for example, the "#medtwitter" community on Twitter). However,
identifying clinically-relevant content in social media without manual labeling
is a challenge because of the sheer volume of irrelevant data. We present an
unsupervised, iterative approach to mine clinically relevant information from
social media data, which begins by heuristically filtering for HCP-authored
texts and incorporates topic modeling and concept extraction with MetaMap. This
approach identifies granular topics and tweets with high clinical relevance
from a set of about 52 million COVID-19-related tweets from January to mid-June
2020. We also show that because the technique does not require manual labeling,
it can be used to identify emerging topics on a week-to-week basis. Our method
can aid in future public-health emergencies by facilitating knowledge transfer
among healthcare workers in a rapidly-changing information environment, and by
providing an efficient and unsupervised way of highlighting potential areas for
clinical research.
Related papers
- Reddit-Impacts: A Named Entity Recognition Dataset for Analyzing Clinical and Social Effects of Substance Use Derived from Social Media [6.138126219622993]
Substance use disorders (SUDs) are a growing concern globally, necessitating enhanced understanding of the problem and its trends through data-driven research.
Social media are unique and important sources of information about SUDs, particularly since the data in such sources are often generated by people with lived experiences.
In this paper, we introduce Reddit-Impacts, a challenging Named Entity Recognition (NER) dataset curated from subreddits dedicated to discussions on prescription and illicit opioids, as well as medications for opioid use disorder.
The dataset specifically concentrates on the lesser-studied, yet critically important, aspects of substance use--its
arXiv Detail & Related papers (2024-05-09T23:43:57Z)
- Empowering machine learning models with contextual knowledge for enhancing the detection of eating disorders in social media posts [1.0423569489053137]
We introduce a novel hybrid approach combining knowledge graphs with deep learning to enhance the categorization of social media posts.
We focus on the health domain, particularly in identifying posts related to eating disorders.
We tested our approach on a dataset of 2,000 tweets about eating disorders, finding that merging word embeddings with knowledge graph information enhances the predictive models' reliability.
arXiv Detail & Related papers (2024-02-08T10:15:41Z)
- Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re$^3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv Detail & Related papers (2022-10-23T16:34:39Z)
- Recovering Patient Journeys: A Corpus of Biomedical Entities and Relations on Twitter (BEAR) [12.447379545167642]
This paper contributes a corpus with a rich set of annotation layers following the motivation to uncover and model patients' journeys and experiences.
We label 14 entity classes (incl. environmental factors, diagnostics, biochemical processes, patients' quality-of-life descriptions, pathogens, medical conditions, and treatments) and 20 relation classes (e.g., prevents, influences, interactions, causes).
The publicly available dataset consists of 2,100 tweets with approx. 6,000 entity and 3,000 relation annotations.
arXiv Detail & Related papers (2022-04-21T08:18:44Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- A Dynamic Topic Identification and Labeling Approach of COVID-19 Tweets [3.097385298197292]
The COVID-19 epidemic has affected the use of social media by many people across the globe.
This paper formulates the problem of dynamically identifying key topics with proper labels from COVID-19 Tweets to provide an overview of wider public opinion.
arXiv Detail & Related papers (2021-08-13T16:51:04Z)
- Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature [67.4680600632232]
Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck.
We propose a general approach for vertical search based on domain-specific pretraining.
Our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search.
arXiv Detail & Related papers (2021-06-25T01:02:55Z)
- CREATe: Clinical Report Extraction and Annotation Technology [53.731999072534876]
Clinical case reports are written descriptions of the unique aspects of a particular clinical case.
There has been no attempt to develop an end-to-end system to annotate, index, or otherwise curate these reports.
We propose a novel computational resource platform, CREATe, for extracting, indexing, and querying the contents of clinical case reports.
arXiv Detail & Related papers (2021-02-28T16:50:14Z)
- Addressing machine learning concept drift reveals declining vaccine sentiment during the COVID-19 pandemic [0.0]
We show that machine learning algorithms trained on annotated data in the past may underperform when applied to contemporary data.
We show that while vaccine sentiment has declined considerably during the COVID-19 pandemic in 2020, algorithms trained on pre-pandemic data would have largely missed this decline due to concept drift.
arXiv Detail & Related papers (2020-12-03T18:53:57Z)
- BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z)
- COVI White Paper [67.04578448931741]
Contact tracing is an essential tool to change the course of the Covid-19 pandemic.
We present an overview of the rationale, design, ethical considerations and privacy strategy of COVI, a Covid-19 public peer-to-peer contact tracing and risk awareness mobile application developed in Canada.
arXiv Detail & Related papers (2020-05-18T07:40:49Z)