Plague Dot Text: Text mining and annotation of outbreak reports of the
Third Plague Pandemic (1894-1952)
- URL: http://arxiv.org/abs/2002.01415v3
- Date: Mon, 11 Jan 2021 11:08:08 GMT
- Title: Plague Dot Text: Text mining and annotation of outbreak reports of the
Third Plague Pandemic (1894-1952)
- Authors: Arlene Casey, Mike Bennett, Richard Tobin, Claire Grover, Iona Walker,
Lukas Engelmann, Beatrice Alex
- Abstract summary: Interdisciplinary research investigates more than 100 reports from the third plague pandemic (1894-1952).
Our goal is to develop structured accounts of some of the most significant concepts that were used to understand the epidemiology of the third plague pandemic around the globe.
- Score: 0.8114550931351494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The design of models that govern diseases in populations is commonly built on
information and data gathered from past outbreaks. However, epidemic outbreaks
are never captured in statistical data alone but are communicated by
narratives, supported by empirical observations. Outbreak reports discuss
correlations between populations, locations and the disease to infer insights
into causes, vectors and potential interventions. The problem with these
narratives is usually the lack of consistent structure or strong conventions,
which prohibits their formal analysis in larger corpora. Our interdisciplinary
research investigates more than 100 reports from the third plague pandemic
(1894-1952) evaluating ways of building a corpus to extract and structure this
narrative information through text mining and manual annotation. In this paper
we discuss the progress of our ongoing exploratory project, how we enhance
optical character recognition (OCR) methods to improve text capture, our
approach to structure the narratives and identify relevant entities in the
reports. The structured corpus is made available via Solr enabling search and
analysis across the whole collection for future research dedicated, for
example, to the identification of concepts. We show preliminary visualisations
of the characteristics of causation and differences with respect to gender as a
result of syntactic-category-dependent corpus statistics. Our goal is to
develop structured accounts of some of the most significant concepts that were
used to understand the epidemiology of the third plague pandemic around the
globe. The corpus enables researchers to analyse the reports collectively
allowing for deep insights into the global epidemiological consideration of
plague in the early twentieth century.
Related papers
- SPEED++: A Multilingual Event Extraction Framework for Epidemic Prediction and Preparedness [73.73883111570458]
We introduce the first multilingual Event Extraction framework for extracting epidemic event information for a wide range of diseases and languages.
Annotating data in every language is infeasible; thus we develop zero-shot cross-lingual cross-disease models.
Our framework can provide epidemic warnings for COVID-19 in its earliest stages in Dec 2019 from Chinese Weibo posts without any training in Chinese.
arXiv Detail & Related papers (2024-10-24T03:03:54Z)
- Causal Micro-Narratives [62.47217054314046]
We present a novel approach to classify causal micro-narratives from text.
These narratives are sentence-level explanations of the cause(s) and/or effect(s) of a target subject.
arXiv Detail & Related papers (2024-10-07T17:55:10Z)
- Event Detection from Social Media for Epidemic Prediction [76.90779562626541]
We develop a framework to extract and analyze epidemic-related events from social media posts.
Experimentation reveals how ED models trained on COVID-based SPEED can effectively detect epidemic events for three unseen epidemics.
We show that reporting sharp increases in the extracted events by our framework can provide warnings 4-9 weeks earlier than the WHO epidemic declaration for Monkeypox.
arXiv Detail & Related papers (2024-04-02T06:31:17Z)
- A Comparative Analysis of the COVID-19 Infodemic in English and Chinese:
Insights from Social Media Textual Data [2.641576480886427]
The COVID-19 infodemic, characterized by the rapid spread of misinformation and unverified claims related to the pandemic, presents a significant challenge.
This paper presents a comparative analysis of the COVID-19 infodemic in the English and Chinese languages, utilizing textual data extracted from social media platforms.
arXiv Detail & Related papers (2023-11-14T08:55:11Z)
- Agent-Based Model: Simulating a Virus Expansion Based on the Acceptance
of Containment Measures [65.62256987706128]
Compartmental epidemiological models categorize individuals based on their disease status.
We propose an ABM architecture that combines an adapted SEIRD model with a decision-making model for citizens.
We illustrate the designed model by examining the progression of SARS-CoV-2 infections in A Coruna, Spain.
arXiv Detail & Related papers (2023-07-28T08:01:05Z)
- Informing clinical assessment by contextualizing post-hoc explanations
of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patient's clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z)
- Coronavirus statistics causes emotional bias: a social media text mining
perspective [4.042350304426975]
This paper proposes a deep learning model which classifies texts related to the pandemic from text data with place labels.
Next, it conducts a sentiment analysis based on multi-task learning.
Finally, it carries out a fixed-effect panel regression with outputs of the sentiment analysis.
arXiv Detail & Related papers (2022-11-16T03:36:13Z)
- When Infodemic Meets Epidemic: a Systematic Literature Review [3.3454373538792543]
Social media offer significant amounts of data that can be leveraged for bio-surveillance.
This systematic literature review provides a methodical overview of the integration of social media in different epidemic-related contexts.
arXiv Detail & Related papers (2022-10-03T21:04:30Z)
- Sentiment and Emotion Classification of Epidemic Related Bilingual data
from Social Media [1.7109522466982476]
The study exploits the bilingual (Urdu and English) data from Twitter and NEWS websites related to the dengue epidemic in Pakistan.
arXiv Detail & Related papers (2021-05-04T12:51:18Z)
- STOPPAGE: Spatio-temporal Data Driven Cloud-Fog-Edge Computing Framework
for Pandemic Monitoring and Management [28.205715426050105]
It is absolutely necessary to develop an analytics framework to deliver insights in improving administrative policy and enhance the preparedness to combat the pandemic.
This paper proposes a spatio-temporal knowledge mining framework, named STOPPAGE, to model the impact of human mobility and contextual information over a large geographic area at different temporal scales.
The framework has two modules: (i) a spatio-temporal data and computing infrastructure using a fog/edge-based architecture; and (ii) a spatio-temporal data analytics module to efficiently extract knowledge from heterogeneous data sources.
arXiv Detail & Related papers (2021-04-04T12:29:31Z)
- Unifying Relational Sentence Generation and Retrieval for Medical Image
Report Composition [142.42920413017163]
Current methods often generate the most common sentences for individual cases due to dataset bias.
We propose a novel framework that unifies template retrieval and sentence generation to handle both common and rare abnormalities.
arXiv Detail & Related papers (2021-01-09T04:33:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.