CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster
Tweet Classification
- URL: http://arxiv.org/abs/2310.14627v1
- Date: Mon, 23 Oct 2023 07:01:09 GMT
- Title: CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster
Tweet Classification
- Authors: Henry Peng Zou, Yue Zhou, Cornelia Caragea, and Doina Caragea
- Abstract summary: We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
- Score: 51.58605842457186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The shared real-time information about natural disasters on social media
platforms like Twitter and Facebook plays a critical role in informing
volunteers, emergency managers, and response organizations. However, supervised
learning models for monitoring disaster events require large amounts of
annotated data, making them unrealistic for real-time use in disaster events.
To address this challenge, we present a fine-grained disaster tweet
classification model under the semi-supervised, few-shot learning setting where
only a small amount of annotated data is required. Our model, CrisisMatch,
effectively classifies tweets into fine-grained classes of interest using few
labeled data and large amounts of unlabeled data, mimicking the early stage of
a disaster. By integrating effective semi-supervised learning ideas and
incorporating TextMixUp, CrisisMatch achieves an average performance
improvement of 11.2% on two disaster datasets. Further analyses of the
influence of the amount of labeled data and of out-of-domain performance are
also provided.
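The abstract credits part of CrisisMatch's gains to TextMixUp. As a rough illustration only (not the paper's exact recipe), MixUp for text is commonly applied to sentence embeddings rather than raw tokens: two examples and their one-hot labels are linearly interpolated with a Beta-distributed mixing coefficient. The function and parameter names below are hypothetical:

```python
import numpy as np

def text_mixup(emb_a, emb_b, label_a, label_b, alpha=0.75):
    """Interpolate two tweet embeddings and their one-hot labels.

    A minimal sketch of MixUp on text representations; the alpha value
    and the max(lam, 1-lam) trick follow common MixUp practice and are
    assumptions, not CrisisMatch's published settings.
    """
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # keep the mix closer to the first example
    mixed_emb = lam * emb_a + (1.0 - lam) * emb_b
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed_emb, mixed_label
```

In semi-supervised variants, `emb_b` and `label_b` may come from unlabeled tweets paired with their pseudo-labels, so labeled and unlabeled data are mixed in one training signal.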
Related papers
- CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-label Social Media Text Classification in Disaster Informatics [49.2719253711215]
This study introduces a novel approach to disaster text classification by enhancing a pre-trained Large Language Model (LLM).
Our methodology involves creating a comprehensive instruction dataset from disaster-related tweets, which is then used to fine-tune an open-source LLM.
This fine-tuned model can classify multiple aspects of disaster-related information simultaneously, such as the type of event, informativeness, and involvement of human aid.
arXiv Detail & Related papers (2024-06-16T23:01:10Z)
- ADSumm: Annotated Ground-truth Summary Datasets for Disaster Tweet Summarization [8.371475703337106]
Existing disaster tweet summarization approaches provide summaries of such events to aid government agencies, humanitarian organizations, and others.
In this paper, we present ADSumm, which adds annotated ground-truth summaries for eight disaster events.
Our experimental analysis shows that the newly added datasets improve the performance of the supervised summarization approaches by 8-28% in terms of ROUGE-N F1-score.
arXiv Detail & Related papers (2024-05-10T15:49:01Z)
- DeCrisisMB: Debiased Semi-Supervised Learning for Crisis Tweet Classification via Memory Bank [52.20298962359658]
In crisis events, people often use social media platforms such as Twitter to disseminate information about the situation, warnings, advice, and support.
Fully-supervised approaches require annotating vast amounts of data and are impractical given the limited response time.
Semi-supervised models can be biased, performing moderately well for certain classes while performing extremely poorly for others.
We propose a simple but effective debiasing method, DeCrisisMB, that utilizes a Memory Bank to store generated pseudo-labels and perform equal sampling from each class at each training iteration.
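The memory-bank idea described above can be sketched as a class-indexed buffer that is sampled uniformly across classes. This is a hedged illustration of the general technique, not DeCrisisMB's actual implementation; the class name, capacity, and sampling parameters are assumptions:

```python
import random
from collections import defaultdict

class PseudoLabelMemoryBank:
    """Class-balanced buffer for pseudo-labeled examples.

    Stores recent examples per predicted class and samples the same
    number from every class, so majority classes cannot dominate the
    pseudo-label training signal. Capacity and sampling sizes here are
    illustrative choices, not the paper's settings.
    """

    def __init__(self, capacity_per_class=64):
        self.capacity = capacity_per_class
        self.bank = defaultdict(list)  # class id -> recent examples

    def add(self, example, pseudo_label):
        slot = self.bank[pseudo_label]
        slot.append(example)
        if len(slot) > self.capacity:
            slot.pop(0)  # drop the oldest example for this class

    def sample_balanced(self, per_class=2):
        """Draw up to per_class examples from every non-empty class."""
        batch = []
        for cls, slot in self.bank.items():
            k = min(per_class, len(slot))
            batch.extend((ex, cls) for ex in random.sample(slot, k))
        return batch
```

Equal per-class sampling is what counteracts the bias noted above, where a semi-supervised model performs well on frequent classes but poorly on rare ones.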
arXiv Detail & Related papers (2023-10-23T05:25:51Z)
- Sarcasm Detection in a Disaster Context [103.93691731605163]
We introduce HurricaneSARC, a dataset of 15,000 tweets annotated for intended sarcasm.
Our best model is able to obtain as much as 0.70 F1 on our dataset.
arXiv Detail & Related papers (2023-08-16T05:58:12Z)
- CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction and Summarization [62.77066949111921]
This paper presents CrisisLTLSum, the largest dataset of local crisis event timelines available to date.
CrisisLTLSum contains 1,000 crisis event timelines across four domains: wildfires, local fires, traffic, and storms.
Our initial experiments indicate a significant gap between the performance of strong baselines and human performance on both tasks.
arXiv Detail & Related papers (2022-10-25T17:32:40Z)
- HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks [5.937482215664902]
Social media content is often too noisy for direct use in any application.
It is important to filter, categorize, and concisely summarize the available content to facilitate effective consumption and decision-making.
We present a new large-scale dataset with 77K human-labeled tweets, sampled from a pool of 24 million tweets across 19 disaster events.
arXiv Detail & Related papers (2021-04-07T12:29:36Z)
- Event-Related Bias Removal for Real-time Disaster Events [67.2965372987723]
Social media has become an important tool to share information about crisis events such as natural disasters and mass attacks.
Detecting actionable posts that contain useful information requires rapid analysis of huge volume of data in real-time.
We train an adversarial neural model to remove latent event-specific biases and improve the performance on tweet importance classification.
arXiv Detail & Related papers (2020-11-02T02:03:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.