Multimodal Learning with Augmentation Techniques for Natural Disaster Assessment
- URL: http://arxiv.org/abs/2511.00004v1
- Date: Sat, 04 Oct 2025 18:51:54 GMT
- Title: Multimodal Learning with Augmentation Techniques for Natural Disaster Assessment
- Authors: Adrian-Dinu Urse, Dumitru-Clementin Cercel, Florin Pop
- Abstract summary: Disaster assessment relies on accurate and rapid access to information, with social media emerging as a valuable real-time source. This paper explores augmentation techniques to address these issues on the CrisisMMD multimodal dataset. For visual data, we apply diffusion-based methods, namely Real Guidance and DiffuseMix. For text data, we explore back-translation, paraphrasing with transformers, and image caption-based augmentation.
- Score: 3.0911537708814483
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Natural disaster assessment relies on accurate and rapid access to information, with social media emerging as a valuable real-time source. However, existing datasets suffer from class imbalance and limited samples, making effective model development a challenging task. This paper explores augmentation techniques to address these issues on the CrisisMMD multimodal dataset. For visual data, we apply diffusion-based methods, namely Real Guidance and DiffuseMix. For text data, we explore back-translation, paraphrasing with transformers, and image caption-based augmentation. We evaluated these across unimodal, multimodal, and multi-view learning setups. Results show that selected augmentations improve classification performance, particularly for underrepresented classes, while multi-view learning introduces potential but requires further refinement. This study highlights effective augmentation strategies for building more robust disaster assessment systems.
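The abstract's augmentation pipeline is not shown here, but the general pattern it describes, balancing underrepresented classes by appending augmented copies of minority-class samples, can be sketched in plain Python. The `paraphrase` function below is a hypothetical stand-in for the paper's back-translation or transformer-based paraphrasing models; a real pipeline would call a translation or paraphrase model at that point.

```python
import random
from collections import Counter


def paraphrase(text: str, seed: int) -> str:
    # Hypothetical stand-in for back-translation or transformer
    # paraphrasing: here we only shuffle words deterministically.
    words = text.split()
    rng = random.Random(seed)
    rng.shuffle(words)
    return " ".join(words)


def balance_with_augmentation(samples):
    """Oversample minority classes up to the majority-class count
    by appending augmented copies of existing minority samples."""
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    augmented = list(samples)
    for label, count in counts.items():
        pool = [text for text, lab in samples if lab == label]
        for i in range(target - count):
            source = pool[i % len(pool)]
            augmented.append((paraphrase(source, seed=i), label))
    return augmented


# Illustrative toy data; labels are hypothetical crisis categories.
data = [
    ("bridge collapsed after the flood", "infrastructure_damage"),
    ("road blocked by debris near the river", "infrastructure_damage"),
    ("volunteers distributing water and food", "rescue_effort"),
]
balanced = balance_with_augmentation(data)
print(Counter(label for _, label in balanced))  # both labels now appear twice
```

The same oversampling skeleton applies to the visual side: swapping `paraphrase` for a diffusion-based generator (Real Guidance, DiffuseMix) changes only the augmentation call, not the balancing logic.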
Related papers
- Differential Attention for Multimodal Crisis Event Analysis [1.5030693386126894]
Social networks can be a valuable source of information during crisis events. We explore vision language models (VLMs) and advanced fusion strategies to enhance the classification of crisis data. Our results show that the combination of pretrained VLMs, enriched textual descriptions, and adaptive fusion strategies consistently outperforms state-of-the-art models in classification accuracy.
arXiv Detail & Related papers (2025-07-07T16:20:35Z)
- A Self-Learning Multimodal Approach for Fake News Detection [35.98977478616019]
We introduce a self-learning multimodal model for fake news classification. The model leverages contrastive learning, a robust method for feature extraction that operates without requiring labeled data. Our experimental results on a public dataset demonstrate that the proposed model outperforms several state-of-the-art classification approaches.
arXiv Detail & Related papers (2024-12-08T07:41:44Z)
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- A Review of Image Retrieval Techniques: Data Augmentation and Adversarial Learning Approaches [0.0]
This review focuses on the roles of data augmentation and adversarial learning techniques in enhancing retrieval performance.
Data augmentation enhances the model's generalization ability and robustness by generating more diverse training samples, simulating real-world variations, and reducing overfitting.
Adversarial attacks and defenses introduce perturbations during training to improve the model's robustness against potential attacks.
arXiv Detail & Related papers (2024-09-02T12:55:17Z)
- A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
- Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition [52.522244807811894]
We propose a novel multimodal Transformer framework using prompt learning to address the issue of missing modalities.
Our method introduces three types of prompts: generative prompts, missing-signal prompts, and missing-type prompts.
Through prompt learning, we achieve a substantial reduction in the number of trainable parameters.
arXiv Detail & Related papers (2024-07-07T13:55:56Z)
- CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-label Social Media Text Classification in Disaster Informatics [50.122541222825156]
This study introduces a novel approach to disaster text classification by enhancing a pre-trained Large Language Model (LLM). Our methodology involves creating a comprehensive instruction dataset from disaster-related tweets, which is then used to fine-tune an open-source LLM. This fine-tuned model can classify multiple aspects of disaster-related information simultaneously, such as the type of event, informativeness, and involvement of human aid.
arXiv Detail & Related papers (2024-06-16T23:01:10Z)
- Diffusion Deepfake [41.59597965760673]
Recent progress in generative AI, primarily through diffusion models, presents significant challenges for real-world deepfake detection.
The increased realism in image details, diverse content, and widespread accessibility to the general public complicates the identification of these sophisticated deepfakes.
This paper introduces two extensive deepfake datasets generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2024-04-02T02:17:50Z)
- Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity [9.811378971225727]
This paper extends the current research into missing modalities to the low-data regime.
It is often expensive to get full-modality data and sufficient annotated training samples.
We propose to use retrieval-augmented in-context learning to address these two crucial issues.
arXiv Detail & Related papers (2024-03-14T14:19:48Z)
- Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification [59.99976102069976]
Fine-grained ship classification in remote sensing (RS-FGSC) poses a significant challenge due to the high similarity between classes and the limited availability of labeled data. Recent advancements in large pre-trained Vision-Language Models (VLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning. This study delves into harnessing the potential of VLMs to enhance classification accuracy for unseen ship categories.
arXiv Detail & Related papers (2024-03-13T05:48:58Z)
- Multimodal Categorization of Crisis Events in Social Media [81.07061295887172]
We present a new multimodal fusion method that leverages both images and texts as input.
In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities.
We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
arXiv Detail & Related papers (2020-04-10T06:31:30Z)
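The cross-attention filtering described in the last entry can be sketched minimally in NumPy. Plain dot-product attention, the token counts, and the feature dimension here are illustrative assumptions, not the paper's actual module: text tokens act as queries over image tokens, and the learned weights downweight uninformative components of the weaker modality.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats):
    """Dot-product cross-attention: each query token attends over all
    context tokens, so uninformative context gets low weight."""
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)  # (Tq, Tk)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    return weights @ context_feats                       # (Tq, d)


rng = np.random.default_rng(0)
text_tokens = rng.normal(size=(4, 8))   # 4 text tokens, feature dim 8
image_tokens = rng.normal(size=(6, 8))  # 6 image patches, feature dim 8
fused = cross_attention(text_tokens, image_tokens)
print(fused.shape)  # (4, 8)
```

In a full fusion model the query and context projections would be learned and the fused output concatenated with the unimodal features before classification; this sketch shows only the attention step.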
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.