Coping with low data availability for social media crisis message
categorisation
- URL: http://arxiv.org/abs/2305.17211v1
- Date: Fri, 26 May 2023 19:08:24 GMT
- Authors: Congcong Wang
- Abstract summary: This thesis focuses on addressing the challenge of low data availability when categorising crisis messages for emergency response.
It first presents domain adaptation as a solution for this problem, which involves learning a categorisation model from annotated data from past crisis events.
In many-to-many adaptation, where the model is trained on multiple past events and adapted to multiple ongoing events, a multi-task learning approach is proposed.
- Score: 3.0255457622022495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: During crisis situations, social media allows people to quickly share
information, including messages requesting help. This can be valuable to
emergency responders, who need to categorise and prioritise these messages
based on the type of assistance being requested. However, the high volume of
messages makes it difficult to filter and prioritise them without the use of
computational techniques. Fully supervised filtering techniques for crisis
message categorisation typically require a large amount of annotated training
data, but this can be difficult to obtain during an ongoing crisis and is
expensive in terms of time and labour to create.
This thesis focuses on addressing the challenge of low data availability when
categorising crisis messages for emergency response. It first presents domain
adaptation as a solution for this problem, which involves learning a
categorisation model from annotated data from past crisis events (source
domain) and adapting it to categorise messages from an ongoing crisis event
(target domain). In many-to-many adaptation, where the model is trained on
multiple past events and adapted to multiple ongoing events, a multi-task
learning approach is proposed using pre-trained language models. This approach
outperforms baselines, and an ensemble approach further improves performance...
Related papers
- CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-label Social Media Text Classification in Disaster Informatics [49.2719253711215]
This study introduces a novel approach to disaster text classification by enhancing a pre-trained Large Language Model (LLM).
Our methodology involves creating a comprehensive instruction dataset from disaster-related tweets, which is then used to fine-tune an open-source LLM.
This fine-tuned model can classify multiple aspects of disaster-related information simultaneously, such as the type of event, informativeness, and involvement of human aid.
arXiv Detail & Related papers (2024-06-16T23:01:10Z)
- CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
arXiv Detail & Related papers (2023-10-23T07:01:09Z)
- CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction and Summarization [62.77066949111921]
This paper presents CrisisLTLSum, the largest dataset of local crisis event timelines available to date.
CrisisLTLSum contains 1,000 crisis event timelines across four domains: wildfires, local fires, traffic, and storms.
Our initial experiments indicate a significant gap between the performance of strong baselines and human performance on both tasks.
arXiv Detail & Related papers (2022-10-25T17:32:40Z)
- Cross-Lingual and Cross-Domain Crisis Classification for Low-Resource Scenarios [4.147346416230273]
We study the task of automatically classifying messages related to crisis events by leveraging cross-language and cross-domain labeled data.
Our goal is to make use of labeled data from high-resource languages to classify messages from other (low-resource) languages and/or of new (previously unseen) types of crisis situations.
Our empirical findings show that it is indeed possible to leverage data from crisis events in English to classify the same type of event in other languages, such as Spanish and Italian.
arXiv Detail & Related papers (2022-09-05T20:57:23Z)
- Combating Temporal Drift in Crisis with Adapted Embeddings [58.4558720264897]
Language usage changes over time, and this can impact the effectiveness of NLP systems.
This work investigates methods for adapting to changing discourse during crisis events.
arXiv Detail & Related papers (2021-04-17T13:11:41Z)
- Bridging the gap between supervised classification and unsupervised topic modelling for social-media assisted crisis management [0.5249805590164902]
Social media such as Twitter provide valuable information to crisis managers and affected people during natural disasters.
Machine learning can help structure and extract information from the large volume of messages shared during a crisis.
We show that BERT embeddings finetuned on crisis-related tweet classification can effectively be used to adapt to a new crisis.
arXiv Detail & Related papers (2021-03-22T13:30:39Z)
- Event-Related Bias Removal for Real-time Disaster Events [67.2965372987723]
Social media has become an important tool to share information about crisis events such as natural disasters and mass attacks.
Detecting actionable posts that contain useful information requires rapid analysis of a huge volume of data in real time.
We train an adversarial neural model to remove latent event-specific biases and improve the performance on tweet importance classification.
arXiv Detail & Related papers (2020-11-02T02:03:07Z)
- Multimodal Categorization of Crisis Events in Social Media [81.07061295887172]
We present a new multimodal fusion method that leverages both images and texts as input.
In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities.
We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
arXiv Detail & Related papers (2020-04-10T06:31:30Z)
- Unsupervised and Interpretable Domain Adaptation to Rapidly Filter Tweets for Emergency Services [18.57009530004948]
We present a novel method to classify relevant tweets during an ongoing crisis using the publicly available dataset of TREC incident streams.
We use dedicated attention layers for each task to provide model interpretability, which is critical for real-world applications.
We show a practical implication of our work by providing a use-case for the COVID-19 pandemic.
arXiv Detail & Related papers (2020-03-04T06:40:14Z)