Related papers: DeCrisisMB: Debiased Semi-Supervised Learning for Crisis Tweet Classification via Memory Bank

URL: http://arxiv.org/abs/2310.14577v1
Date: Mon, 23 Oct 2023 05:25:51 GMT
Title: DeCrisisMB: Debiased Semi-Supervised Learning for Crisis Tweet Classification via Memory Bank
Authors: Henry Peng Zou, Yue Zhou, Weizhi Zhang, Cornelia Caragea
Abstract summary: In crisis events, people often use social media platforms such as Twitter to disseminate information about the situation, warnings, advice, and support. Fully-supervised approaches require annotating vast amounts of data and are impractical due to limited response time. Semi-supervised models can be biased, performing moderately well for certain classes while performing extremely poorly for others. We propose a simple but effective debiasing method, DeCrisisMB, that utilizes a Memory Bank to store and perform equal sampling for generated pseudo-labels from each class at each training iteration.
Score: 52.20298962359658
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: During crisis events, people often use social media platforms such as Twitter to disseminate information about the situation, warnings, advice, and support. Emergency relief organizations leverage such information to acquire timely crisis circumstances and expedite rescue operations. While existing works utilize such information to build models for crisis event analysis, fully-supervised approaches require annotating vast amounts of data and are impractical due to limited response time. On the other hand, semi-supervised models can be biased, performing moderately well for certain classes while performing extremely poorly for others, resulting in substantially negative effects on disaster monitoring and rescue. In this paper, we first study two recent debiasing methods on semi-supervised crisis tweet classification. Then we propose a simple but effective debiasing method, DeCrisisMB, that utilizes a Memory Bank to store and perform equal sampling for generated pseudo-labels from each class at each training iteration. Extensive experiments are conducted to compare different debiasing methods' performance and generalization ability in both in-distribution and out-of-distribution settings. The results demonstrate the superior performance of our proposed method. Our code is available at https://github.com/HenryPengZou/DeCrisisMB.
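The memory-bank idea in the abstract (store generated pseudo-labels per class, then sample equally from each class at every training iteration) can be illustrated with a minimal sketch. This is not the authors' implementation, which is in the repository linked above. The class name `PseudoLabelMemoryBank`, its methods, the fixed per-class capacity, the FIFO eviction, and sampling with replacement are all assumptions made here for illustration.

```python
import random
from collections import deque


class PseudoLabelMemoryBank:
    """Hypothetical per-class store of pseudo-labeled examples (illustration only)."""

    def __init__(self, num_classes, capacity_per_class):
        # One bounded FIFO queue per class; the oldest entries are evicted first.
        # The capacity and eviction policy are assumptions, not taken from the paper.
        self.queues = [deque(maxlen=capacity_per_class) for _ in range(num_classes)]

    def add(self, examples, pseudo_labels):
        # Store each example under the class its pseudo-label assigns it to.
        for example, label in zip(examples, pseudo_labels):
            self.queues[label].append(example)

    def sample_equal(self, per_class, rng=random):
        # Draw the same number of examples from every non-empty class, so that
        # classes with many pseudo-labels do not dominate the training batch.
        batch = []
        for label, queue in enumerate(self.queues):
            if not queue:
                continue  # no pseudo-labels stored for this class yet
            # Sampling with replacement lets rare classes still supply per_class items.
            picks = rng.choices(list(queue), k=per_class)
            batch.extend((example, label) for example in picks)
        return batch


# Usage: class 0 dominates the pseudo-labels, class 2 has none yet.
bank = PseudoLabelMemoryBank(num_classes=3, capacity_per_class=100)
bank.add(["t1", "t2", "t3", "t4", "t5"], [0, 0, 0, 0, 1])
balanced = bank.sample_equal(per_class=2, rng=random.Random(0))
```

In this toy run, `balanced` holds two examples for class 0 and two for class 1 even though class 0 received four times as many pseudo-labels. Class 2 is skipped until it has at least one stored example.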

Related papers

CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-label Social Media Text Classification in Disaster Informatics [49.2719253711215]
This study introduces a novel approach to disaster text classification by enhancing a pre-trained Large Language Model (LLM). Our methodology involves creating a comprehensive instruction dataset from disaster-related tweets, which is then used to fine-tune an open-source LLM. This fine-tuned model can classify multiple aspects of disaster-related information simultaneously, such as the type of event, informativeness, and involvement of human aid.
arXiv Detail & Related papers (2024-06-16T23:01:10Z)
TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time (Extended Version) [18.146377453918724]
Malware detectors often experience performance decay due to constantly evolving operating systems and attack methods. This paper argues that commonly reported results are inflated due to two pervasive sources of experimental bias in the detection task.
arXiv Detail & Related papers (2024-02-02T12:27:32Z)
Segue: Side-information Guided Generative Unlearnable Examples for Facial Privacy Protection in Real World [64.4289385463226]
We propose Segue: Side-information guided generative unlearnable examples. To improve transferability, we introduce side information such as true labels and pseudo labels. It can resist JPEG compression, adversarial training, and some standard data augmentations.
arXiv Detail & Related papers (2023-10-24T06:22:37Z)
CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting. Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
arXiv Detail & Related papers (2023-10-23T07:01:09Z)
CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction and Summarization [62.77066949111921]
This paper presents CrisisLTLSum, the largest dataset of local crisis event timelines available to date. CrisisLTLSum contains 1,000 crisis event timelines across four domains: wildfires, local fires, traffic, and storms. Our initial experiments indicate a significant gap between the performance of strong baselines compared to the human performance on both tasks.
arXiv Detail & Related papers (2022-10-25T17:32:40Z)
Event-Related Bias Removal for Real-time Disaster Events [67.2965372987723]
Social media has become an important tool to share information about crisis events such as natural disasters and mass attacks. Detecting actionable posts that contain useful information requires rapid analysis of a huge volume of data in real time. We train an adversarial neural model to remove latent event-specific biases and improve the performance on tweet importance classification.
arXiv Detail & Related papers (2020-11-02T02:03:07Z)
CrisisBench: Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing [13.11283003017537]
We consolidate eight human-annotated datasets and provide 166.1k and 141.5k tweets for informativeness and humanitarian classification tasks. We provide benchmarks for both binary and multiclass classification tasks using several deep learning architectures, including CNN, fastText, and transformers.
arXiv Detail & Related papers (2020-04-14T19:51:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
