TweetDIS: A Large Twitter Dataset for Natural Disasters Built using Weak
Supervision
- URL: http://arxiv.org/abs/2207.04947v1
- Date: Mon, 11 Jul 2022 15:30:09 GMT
- Authors: Ramya Tekumalla and Juan M. Banda
- Abstract summary: Social media is often utilized as a lifeline for communication during natural disasters.
In this work, we curate a silver standard dataset using weak supervision.
In order to validate its utility, we train machine learning models on the weakly supervised data to identify three different types of natural disasters.
- Score: 1.2400116527089997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social media is often utilized as a lifeline for communication during natural
disasters. Traditionally, natural disaster tweets are filtered from the Twitter
stream using the name of the natural disaster and the filtered tweets are sent
for human annotation. The process of human annotation to create labeled sets
for machine learning models is laborious, time-consuming, at times inaccurate,
and, more importantly, not scalable in terms of size and real-time use. In this
work, we curate a silver standard dataset using weak supervision. In order to
validate its utility, we train machine learning models on the weakly supervised
data to identify three different types of natural disasters, i.e., earthquakes,
hurricanes and floods. Our results demonstrate that models trained on the
silver standard dataset achieved performance greater than 90% when classifying
a manually curated, gold-standard dataset. To enable reproducible research and
additional downstream utility, we release the silver standard dataset for the
scientific community.
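
As a rough illustration of what labeling tweets by weak supervision can look like, the sketch below uses keyword-based labeling functions combined by a simple vote. It is not the authors' pipeline: the keyword lists, the function names (`make_labeling_function`, `weak_label`), and the tie-breaking rule are all assumptions made for this example, since the abstract does not describe the heuristics that were actually used.

```python
from collections import Counter

# Hypothetical keyword lists, chosen only for illustration.
KEYWORDS = {
    "earthquake": ("earthquake", "aftershock", "magnitude"),
    "hurricane": ("hurricane", "storm surge", "landfall"),
    "flood": ("flood", "inundated"),
}

ABSTAIN = None  # returned when a labeling function has no opinion


def make_labeling_function(label, terms):
    """Build a labeling function that votes `label` if any term occurs in the text."""
    def labeling_function(text):
        lowered = text.lower()
        return label if any(term in lowered for term in terms) else ABSTAIN
    return labeling_function


LABELING_FUNCTIONS = [
    make_labeling_function(label, terms) for label, terms in KEYWORDS.items()
]


def weak_label(text):
    """Combine labeling-function votes; abstain on no votes or on a tie."""
    votes = [
        vote
        for vote in (lf(text) for lf in LABELING_FUNCTIONS)
        if vote is not ABSTAIN
    ]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence, leave the tweet unlabeled
    return ranked[0][0]
```

Tweets that receive a label would form a silver-standard training set, while tweets with conflicting or missing evidence are left out. A real system would likely use more labeling functions and a learned label model rather than a plain vote.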
Related papers
- CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-label Social Media Text Classification in Disaster Informatics [49.2719253711215]
This study introduces a novel approach to disaster text classification by enhancing a pre-trained Large Language Model (LLM).
Our methodology involves creating a comprehensive instruction dataset from disaster-related tweets, which is then used to fine-tune an open-source LLM.
This fine-tuned model can classify multiple aspects of disaster-related information simultaneously, such as the type of event, informativeness, and involvement of human aid.
arXiv Detail & Related papers (2024-06-16T23:01:10Z)
- CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
arXiv Detail & Related papers (2023-10-23T07:01:09Z)
- Sarcasm Detection in a Disaster Context [103.93691731605163]
We introduce HurricaneSARC, a dataset of 15,000 tweets annotated for intended sarcasm.
Our best model is able to obtain as much as 0.70 F1 on our dataset.
arXiv Detail & Related papers (2023-08-16T05:58:12Z)
- Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows superior performance in detecting non-matched image-text pairs when the training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z)
- SurvivalGAN: Generating Time-to-Event Data for Survival Analysis [121.84429525403694]
Imbalances in censoring and time horizons cause generative models to experience three new failure modes specific to survival analysis.
We propose SurvivalGAN, a generative model that handles survival data by addressing the imbalance in the censoring and event horizons.
We evaluate this method via extensive experiments on medical datasets.
arXiv Detail & Related papers (2023-02-24T17:03:51Z)
- Spatio-Temporal Graph Contrastive Learning [49.132528449909316]
We propose a Spatio-Temporal Graph Contrastive Learning framework (STGCL) to tackle these issues.
We elaborate on four types of data augmentations which disturb data in terms of graph structure, time domain, and frequency domain.
Our framework is evaluated across three real-world datasets and four state-of-the-art models.
arXiv Detail & Related papers (2021-08-26T16:05:32Z)
- HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks [5.937482215664902]
Social media content is often too noisy for direct use in any application.
It is important to filter, categorize, and concisely summarize the available content to facilitate effective consumption and decision-making.
We present a new large-scale dataset with 77K human-labeled tweets, sampled from a pool of 24 million tweets across 19 disaster events.
arXiv Detail & Related papers (2021-04-07T12:29:36Z)
- A multi-modal approach towards mining social media data during natural disasters -- a case study of Hurricane Irma [1.9259288012724252]
We use 54,383 Twitter messages (out of 784K geolocated messages) from 16,598 users to develop 4 independent models to filter data for relevance.
All four models are independently tested, and can be combined to quickly filter and visualize tweets.
arXiv Detail & Related papers (2021-01-02T17:08:53Z)
- Semantic-based End-to-End Learning for Typhoon Intensity Prediction [0.2580765958706853]
Existing technologies employ different machine learning approaches to predict incoming disasters from historical environmental data.
Social media posts (e.g., tweets) are very informal and contain only limited content.
We propose an end-to-end based framework that learns from disaster-related tweets and environmental data to improve typhoon intensity prediction.
arXiv Detail & Related papers (2020-03-22T01:13:20Z)
- Localized Flood Detection With Minimal Labeled Social Media Data Using Transfer Learning [3.964047152162558]
We investigate the problem of localized flood detection using the social sensing model (Twitter).
This study can greatly help provide flood-related updates and notifications to city officials for emergency decision making, rescue operations, and early warnings.
arXiv Detail & Related papers (2020-02-10T20:17:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.