Cross-Lingual and Cross-Domain Crisis Classification for Low-Resource Scenarios
- URL: http://arxiv.org/abs/2209.02139v1
- Date: Mon, 5 Sep 2022 20:57:23 GMT
- Title: Cross-Lingual and Cross-Domain Crisis Classification for Low-Resource Scenarios
- Authors: Cinthia Sánchez, Hernan Sarmiento, Jorge Pérez, Andres Abeliuk, Barbara Poblete
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social media data has emerged as a useful source of timely information about
real-world crisis events. One of the main tasks related to the use of social
media for disaster management is the automatic identification of crisis-related
messages. Most of the studies on this topic have focused on the analysis of
data for a particular type of event in a specific language. This limits the
possibility of generalizing existing approaches because models cannot be
directly applied to new types of events or other languages. In this work, we
study the task of automatically classifying messages that are related to crisis
events by leveraging cross-language and cross-domain labeled data. Our goal is
to make use of labeled data from high-resource languages to classify messages
from other (low-resource) languages and/or of new (previously unseen) types of
crisis situations. For our study, we consolidated a large, unified dataset from
the literature that covers multiple crisis events and languages. Our empirical
findings show that it is indeed possible to leverage data from crisis events in
English to classify the same type of event in other languages, such as Spanish
and Italian (80.0% F1-score). Furthermore, we achieve good performance for the
cross-domain task (80.0% F1-score) in a cross-lingual setting. Overall, our
work helps alleviate the data scarcity problem that is so important for
multilingual crisis classification, in particular by mitigating cold-start
situations in emergency events, when time is of the essence.
Related papers
- CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-label Social Media Text Classification in Disaster Informatics
This study introduces a novel approach to disaster text classification by enhancing a pre-trained Large Language Model (LLM).
Our methodology involves creating a comprehensive instruction dataset from disaster-related tweets, which is then used to fine-tune an open-source LLM.
This fine-tuned model can classify multiple aspects of disaster-related information simultaneously, such as the type of event, informativeness, and involvement of human aid.
Date: 2024-06-16T23:01:10Z
- CReMa: Crisis Response through Computational Identification and Matching of Cross-Lingual Requests and Offers Shared on Social Media
In times of crisis, social media platforms play a crucial role in facilitating communication and coordinating resources.
We propose CReMa (Crisis Response Matcher), a systematic approach that integrates textual, temporal, and spatial features.
We introduce a novel multi-lingual dataset simulating help-seeking and offering assistance on social media in 16 languages.
Date: 2024-05-20T09:30:03Z
- CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
Date: 2023-10-23T07:01:09Z
- Coping with low data availability for social media crisis message categorisation
This thesis focuses on addressing the challenge of low data availability when categorising crisis messages for emergency response.
It first presents domain adaptation as a solution to this problem, which involves learning a categorisation model from annotated data from past crisis events.
In many-to-many adaptation, where the model is trained on multiple past events and adapted to multiple ongoing events, a multi-task learning approach is proposed.
Date: 2023-05-26T19:08:24Z
- Enhancing Crisis-Related Tweet Classification with Entity-Masked Language Modeling and Multi-Task Learning
We propose a combination of entity-masked language modeling and hierarchical multi-label classification as a multi-task learning problem.
We evaluate our method on tweets from the TREC-IS dataset and show an absolute F1-score gain of up to 10% for actionable information types.
Date: 2022-11-21T13:54:10Z
- CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction and Summarization
This paper presents CrisisLTLSum, the largest dataset of local crisis event timelines available to date.
CrisisLTLSum contains 1,000 crisis event timelines across four domains: wildfires, local fires, traffic, and storms.
Our initial experiments indicate a significant gap between the performance of strong baselines and human performance on both tasks.
Date: 2022-10-25T17:32:40Z
- CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with cross-lingual retrieval.
We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT).
Date: 2022-09-05T17:36:14Z
- Combating Temporal Drift in Crisis with Adapted Embeddings
Language usage changes over time, and this can impact the effectiveness of NLP systems.
This work investigates methods for adapting to changing discourse during crisis events.
Date: 2021-04-17T13:11:41Z
- Event-Related Bias Removal for Real-time Disaster Events
Social media has become an important tool to share information about crisis events such as natural disasters and mass attacks.
Detecting actionable posts that contain useful information requires rapid, real-time analysis of huge volumes of data.
We train an adversarial neural model to remove latent event-specific biases and improve the performance on tweet importance classification.
Date: 2020-11-02T02:03:07Z