Bridging the gap between supervised classification and unsupervised
topic modelling for social-media assisted crisis management
- URL: http://arxiv.org/abs/2103.11835v1
- Date: Mon, 22 Mar 2021 13:30:39 GMT
- Title: Bridging the gap between supervised classification and unsupervised
topic modelling for social-media assisted crisis management
- Authors: Mikael Brunila, Rosie Zhao, Andrei Mircea, Sam Lumley, Renee Sieber
- Abstract summary: Social media such as Twitter provide valuable information to crisis managers and affected people during natural disasters.
Machine learning can help structure and extract information from the large volume of messages shared during a crisis.
We show that BERT embeddings finetuned on crisis-related tweet classification can effectively be used to adapt to a new crisis.
- Score: 0.5249805590164902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social media such as Twitter provide valuable information to crisis managers
and affected people during natural disasters. Machine learning can help
structure and extract information from the large volume of messages shared
during a crisis; however, the constantly evolving nature of crises makes
effective domain adaptation essential. Supervised classification is limited by
unchangeable class labels that may not be relevant to new events, and
unsupervised topic modelling by insufficient prior knowledge. In this paper, we
bridge the gap between the two and show that BERT embeddings finetuned on
crisis-related tweet classification can effectively be used to adapt to a new
crisis, discovering novel topics while preserving relevant classes from
supervised training, and leveraging bidirectional self-attention to extract
topic keywords. We create a dataset of tweets from a snowstorm to evaluate our
method's transferability to new crises, and find that it outperforms
traditional topic models in both automatic and human evaluations grounded in
the needs of crisis managers. More broadly, our method can be used for textual
domain adaptation where the latent classes are unknown but overlap with known
classes from other domains.
Related papers
- CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-label Social Media Text Classification in Disaster Informatics [49.2719253711215]
This study introduces a novel approach to disaster text classification by enhancing a pre-trained Large Language Model (LLM).
Our methodology involves creating a comprehensive instruction dataset from disaster-related tweets, which is then used to fine-tune an open-source LLM.
This fine-tuned model can classify multiple aspects of disaster-related information simultaneously, such as the type of event, informativeness, and involvement of human aid.
arXiv Detail & Related papers (2024-06-16T23:01:10Z)
- CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
arXiv Detail & Related papers (2023-10-23T07:01:09Z)
- DeCrisisMB: Debiased Semi-Supervised Learning for Crisis Tweet Classification via Memory Bank [52.20298962359658]
In crisis events, people often use social media platforms such as Twitter to disseminate information about the situation, warnings, advice, and support.
Fully-supervised approaches require annotating vast amounts of data and are impractical due to limited response time.
Semi-supervised models can be biased, performing moderately well for certain classes while performing extremely poorly for others.
We propose a simple but effective debiasing method, DeCrisisMB, that utilizes a Memory Bank to store and perform equal sampling for generated pseudo-labels from each class at each training.
arXiv Detail & Related papers (2023-10-23T05:25:51Z)
- CrisisTransformers: Pre-trained language models and sentence encoders for crisis-related social media texts [3.690904966341072]
Social media platforms play an essential role in crisis communication, but analyzing crisis-related social media texts is challenging due to their informal nature.
This study introduces CrisisTransformers, an ensemble of pre-trained language models and sentence encoders trained on an extensive corpus of over 15 billion word tokens from tweets.
arXiv Detail & Related papers (2023-09-11T14:36:16Z)
- Coping with low data availability for social media crisis message categorisation [3.0255457622022495]
This thesis focuses on addressing the challenge of low data availability when categorising crisis messages for emergency response.
It first presents domain adaptation as a solution for this problem, which involves learning a categorisation model from annotated data from past crisis events.
In many-to-many adaptation, where the model is trained on multiple past events and adapted to multiple ongoing events, a multi-task learning approach is proposed.
arXiv Detail & Related papers (2023-05-26T19:08:24Z)
- CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction and Summarization [62.77066949111921]
This paper presents CrisisLTLSum, the largest dataset of local crisis event timelines available to date.
CrisisLTLSum contains 1,000 crisis event timelines across four domains: wildfires, local fires, traffic, and storms.
Our initial experiments indicate a significant gap between the performance of strong baselines and human performance on both tasks.
arXiv Detail & Related papers (2022-10-25T17:32:40Z)
- Event-Related Bias Removal for Real-time Disaster Events [67.2965372987723]
Social media has become an important tool to share information about crisis events such as natural disasters and mass attacks.
Detecting actionable posts that contain useful information requires rapid analysis of a huge volume of data in real time.
We train an adversarial neural model to remove latent event-specific biases and improve the performance on tweet importance classification.
arXiv Detail & Related papers (2020-11-02T02:03:07Z)
- Clustering of Social Media Messages for Humanitarian Aid Response during Crisis [47.187609203210705]
We show that recent advances in Deep Learning and Natural Language Processing outperform prior approaches for the task of classifying informativeness.
We extend these methods to two sub-tasks of informativeness and find that the Deep Learning methods are effective here as well.
arXiv Detail & Related papers (2020-07-23T02:18:05Z)
- CrisisBERT: a Robust Transformer for Crisis Classification and Contextual Crisis Embedding [2.7718973516070684]
We propose an end-to-end transformer-based model for two crisis classification tasks, namely crisis detection and crisis recognition.
We also propose Crisis2Vec, an attention-based, document-level contextual embedding architecture for crisis embedding.
arXiv Detail & Related papers (2020-05-11T09:57:24Z)
- Unsupervised and Interpretable Domain Adaptation to Rapidly Filter Tweets for Emergency Services [18.57009530004948]
We present a novel method to classify relevant tweets during an ongoing crisis using the publicly available dataset of TREC incident streams.
We use dedicated attention layers for each task to provide model interpretability, which is critical for real-world applications.
We show a practical implication of our work by providing a use-case for the COVID-19 pandemic.
arXiv Detail & Related papers (2020-03-04T06:40:14Z)