Recovering Patient Journeys: A Corpus of Biomedical Entities and Relations on Twitter (BEAR)
- URL: http://arxiv.org/abs/2204.09952v1
- Date: Thu, 21 Apr 2022 08:18:44 GMT
- Title: Recovering Patient Journeys: A Corpus of Biomedical Entities and Relations on Twitter (BEAR)
- Authors: Amelie Wührl and Roman Klinger
- Abstract summary: This paper contributes a corpus with a rich set of annotation layers following the motivation to uncover and model patients' journeys and experiences.
We label 14 entity classes (incl. environmental factors, diagnostics, biochemical processes, patients' quality-of-life descriptions, pathogens, medical conditions, and treatments) and 20 relation classes (e.g., prevents, influences, interactions, causes).
The publicly available dataset consists of 2,100 tweets with approx. 6,000 entity and 3,000 relation annotations.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Text mining and information extraction for the medical domain have focused on
scientific text generated by researchers. However, researchers' direct access to
individual patient experiences or patient-doctor interactions can be limited.
Information provided on social media, e.g., by patients and their relatives,
complements the knowledge in scientific text. It reflects the patient's journey
and their subjective perspective on the process of developing symptoms, being
diagnosed and offered a treatment, being cured or learning to live with a
medical condition. The value of this type of data is therefore twofold:
Firstly, it offers direct access to people's perspectives. Secondly, it might
cover information that is not available elsewhere, including self-treatment or
self-diagnoses. Named entity recognition and relation extraction are methods to
structure information that is available in unstructured text. However, existing
medical social media corpora have focused on a comparatively small set of entities
and relations and on particular domains, rather than putting the patient into the
center of analyses. With this paper we contribute a corpus with a rich set of
annotation layers following the motivation to uncover and model patients'
journeys and experiences in more detail. We label 14 entity classes (incl.
environmental factors, diagnostics, biochemical processes, patients'
quality-of-life descriptions, pathogens, medical conditions, and treatments)
and 20 relation classes (e.g., prevents, influences, interactions, causes) most
of which have not been considered before for social media data. The publicly
available dataset consists of 2,100 tweets with approx. 6,000 entity and 3,000
relation annotations. In a corpus analysis we find that over 80% of documents
contain relevant entities. Over 50% of tweets express relations which we
consider essential for uncovering patients' narratives about their journeys.
Related papers
- A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis
Deep networks have achieved broad success in analyzing natural images, but when applied to medical scans, they often fail in unexpected situations.
We investigate this challenge and focus on model sensitivity to domain shifts, such as data sampled from different hospitals or data confounded by demographic variables such as sex and race, in the context of chest X-rays and skin lesion images.
Taking inspiration from medical training, we propose giving deep networks a prior grounded in explicit medical knowledge communicated in natural language.
arXiv (2024-05-23)
- Revealing Patient-Reported Experiences in Healthcare from Social Media using the DAPMAV Framework
We introduce the Design-Acquire-Process-Model-Analyse-Visualise (DAPMAV) framework to provide an overview of techniques and an approach to capture patient-reported experiences from social media data.
We apply this framework in a case study on prostate cancer data from /r/ProstateCancer.
arXiv (2022-10-09)
- METS-CoV: A Dataset of Medical Entity and Targeted Sentiment on COVID-19 Related Tweets
This paper releases METS-CoV, a dataset containing medical entities and targeted sentiments from COVID-19-related tweets.
To the best of our knowledge, METS-CoV is the first dataset to collect medical entities and corresponding sentiments of COVID-19-related tweets.
arXiv (2022-09-28)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and the associations between them, and (ii) the evidence supporting these associations.
Test data drawn from a subset of DISNET, together with associations automatically extracted from texts, have been transformed into a Knowledge Graph that can be used in real scenarios.
arXiv (2022-08-01)
- CoVERT: A Corpus of Fact-checked Biomedical COVID-19 Tweets
CoVERT is a fact-checked corpus of tweets with a focus on biomedicine and COVID-19-related (mis)information.
We employ a novel crowdsourcing methodology to annotate all tweets with fact-checking labels and supporting evidence, which crowdworkers search for online.
We use the retrieved evidence extracts as part of a fact-checking pipeline, finding that the real-world evidence is more useful than the knowledge indirectly available in pretrained language models.
arXiv (2022-04-26)
- Towards more patient friendly clinical notes through language models and ontologies
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv (2021-12-23)
- MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv (2021-07-20)
- MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation
We build and release MedDG, a large-scale, high-quality medical dialogue dataset covering 12 types of common gastrointestinal diseases.
We propose two medical dialogue tasks based on the MedDG dataset: next entity prediction and doctor response generation.
Experimental results show that pre-trained language models and other baselines perform poorly on both tasks on our dataset.
arXiv (2020-10-15)
- BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv (2020-09-24)
- Extracting Structured Data from Physician-Patient Conversations By Predicting Noteworthy Utterances
We describe a new dataset consisting of conversation transcripts, post-visit summaries, corresponding supporting evidence (in the transcript), and structured labels.
One methodological challenge is that the conversations are long (around 1500 words), making it difficult for modern deep-learning models to use them as input.
We find that by first filtering for (predicted) noteworthy utterances, we can significantly boost predictive performance for recognizing both diagnoses and RoS abnormalities.
arXiv (2020-07-14)
- A Corpus for Detecting High-Context Medical Conditions in Intensive Care Patient Notes Focusing on Frequently Readmitted Patients
This dataset contains 1102 Discharge Summaries and 1000 Nursing Progress Notes.
Annotated phenotypes include treatment non-adherence, chronic pain, advanced/metastatic cancer, as well as 10 other phenotypes.
This dataset can be utilized for academic and industrial research in medicine and computer science, particularly within the field of medical natural language processing.
arXiv (2020-03-06)