Flood Detection via Twitter Streams using Textual and Visual Features
- URL: http://arxiv.org/abs/2011.14944v1
- Date: Mon, 30 Nov 2020 16:09:11 GMT
- Title: Flood Detection via Twitter Streams using Textual and Visual Features
- Authors: Firoj Alam, Zohaib Hassan, Kashif Ahmad, Asma Gul, Michael Riegler,
Nicola Conci, Ala Al-Fuqaha
- Abstract summary: The paper presents our proposed solutions for the MediaEval 2020 Flood-Related Multimedia Task.
The task aims to analyze and detect flooding events in multimedia content shared over Twitter.
- Score: 5.615972945389011
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The paper presents our proposed solutions for the MediaEval 2020
Flood-Related Multimedia Task, which aims to analyze and detect flooding events
in multimedia content shared over Twitter. In total, we proposed four different
solutions, including a multi-modal solution combining textual and visual
information for the mandatory run, and three single-modal, image- and text-based
solutions as optional runs. In the multimodal method, we rely on a supervised
multimodal bitransformer model that combines textual and visual features in an
early fusion, achieving a micro F1-score of .859 on the development data set.
For text-based flood event detection, we use a transformer network (i.e., a
pretrained Italian BERT model), achieving an F1-score of .853. For the image-based
solutions, we employed multiple deep models pre-trained on both the ImageNet
and Places data sets, individually and combined in an early fusion, achieving
F1-scores of .816 and .805 on the development set, respectively.
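To make the mandatory multimodal run concrete, below is a minimal PyTorch sketch of an MMBT-style early fusion in which pooled image features are projected into the text encoder's embedding space as pseudo-tokens and encoded jointly with the tweet tokens. The checkpoint names, the number of image pseudo-tokens, and the pooling choice are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models
from transformers import BertModel


class MultimodalBitransformer(nn.Module):
    """Early fusion: image features become pseudo-tokens in BERT's embedding
    space and are encoded jointly with the text tokens."""

    def __init__(self, bert_name="bert-base-multilingual-cased",
                 num_image_tokens=3, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # ResNet-50 without its classification head -> pooled 2048-d features.
        resnet = models.resnet50(weights="IMAGENET1K_V1")
        self.image_encoder = nn.Sequential(*list(resnet.children())[:-1])
        self.image_proj = nn.Linear(2048, num_image_tokens * hidden)
        self.num_image_tokens = num_image_tokens
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask, pixel_values):
        # Token ids -> word embeddings (position/type embeddings are added
        # inside BertModel when inputs_embeds is passed).
        text_embeds = self.bert.get_input_embeddings()(input_ids)
        # Image -> pooled CNN features -> pseudo-token embeddings.
        feats = self.image_encoder(pixel_values).flatten(1)            # (B, 2048)
        img_embeds = self.image_proj(feats).view(
            feats.size(0), self.num_image_tokens, -1)                  # (B, T_img, H)
        # Early fusion: concatenate modalities and encode the joint sequence.
        fused = torch.cat([img_embeds, text_embeds], dim=1)
        img_mask = torch.ones(fused.size(0), self.num_image_tokens,
                              dtype=attention_mask.dtype,
                              device=attention_mask.device)
        mask = torch.cat([img_mask, attention_mask], dim=1)
        out = self.bert(inputs_embeds=fused, attention_mask=mask)
        return self.classifier(out.last_hidden_state[:, 0])            # class logits
```

Training then reduces to minimizing cross-entropy over the two classes (flood-related vs. not) on paired tweet text and images.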
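For the text-only run, the sketch below fine-tunes a pretrained Italian BERT for binary flood relevance with the HuggingFace Trainer. The checkpoint name, the hyperparameters, and the two toy tweets are assumptions for illustration, not the task data or the authors' settings.

```python
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_NAME = "dbmdz/bert-base-italian-uncased"   # assumed Italian BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy examples standing in for the labelled Italian tweets of the task.
texts = ["Strade allagate dopo il temporale", "Bella giornata di sole in centro"]
labels = [1, 0]   # 1 = flood-related, 0 = not flood-related
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")


class TweetDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item


trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="flood-bert-it",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=TweetDataset(enc, labels),
)
trainer.train()
```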
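For the image-only runs, early fusion can be illustrated by concatenating pooled features from an ImageNet-pretrained backbone and a Places365-pretrained backbone before a small classifier. torchvision ships the ImageNet weights, while the Places365 checkpoint (path and format below) is an assumption and must be obtained separately; the whole block is a sketch, not the authors' exact pipeline.

```python
import torch
import torch.nn as nn
from torchvision import models


def headless(resnet):
    """Drop the final fc layer so the backbone returns pooled 2048-d features."""
    return nn.Sequential(*list(resnet.children())[:-1])


imagenet_backbone = headless(models.resnet50(weights="IMAGENET1K_V1"))

places_resnet = models.resnet50(weights=None)
# Places365 weights are not bundled with torchvision; the path below is a
# placeholder for a checkpoint obtained from the Places project release.
# state = torch.load("resnet50_places365.pth.tar", map_location="cpu")
# places_resnet.load_state_dict(state, strict=False)
places_backbone = headless(places_resnet)

# Early fusion: concatenate the two 2048-d feature vectors, then classify.
classifier = nn.Sequential(nn.Linear(2048 * 2, 256), nn.ReLU(),
                           nn.Linear(256, 2))   # flood-related vs. not


def predict(pixel_values):
    with torch.no_grad():   # backbones kept frozen in this sketch
        f_obj = imagenet_backbone(pixel_values).flatten(1)
        f_scene = places_backbone(pixel_values).flatten(1)
    fused = torch.cat([f_obj, f_scene], dim=1)
    return classifier(fused)


logits = predict(torch.randn(4, 3, 224, 224))   # dummy batch of images
```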
Related papers
- Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation [61.91492500828508]
Few-shot 3D point cloud segmentation (FS-PCS) aims at generalizing models to segment novel categories with minimal support samples.
We introduce a cost-free multimodal FS-PCS setup, utilizing textual labels and the potentially available 2D image modality.
We propose a simple yet effective Test-time Adaptive Cross-modal Calibration (TACC) technique to mitigate training bias.
arXiv Detail & Related papers (2024-10-29T19:28:41Z)
- Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks [62.758680527838436]
Leopard is a vision-language model for handling vision-language tasks involving multiple text-rich images.
First, we curated about one million high-quality multimodal instruction-tuning instances, tailored to text-rich, multi-image scenarios.
Second, we developed an adaptive high-resolution multi-image encoding module to dynamically optimize the allocation of visual sequence length.
arXiv Detail & Related papers (2024-10-02T16:55:01Z)
- Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles [8.233126457964834]
Event coreference resolution (ECR) is the task of determining whether distinct mentions of events are actually linked to the same underlying occurrence.
Here, we propose a multimodal cross-document event coreference resolution method that integrates visual and textual cues with a simple linear map between vision and language models.
Our results demonstrate the utility of multimodal information in ECR for certain challenging coreference problems.
arXiv Detail & Related papers (2024-04-13T10:01:58Z)
- Multi-modal Stance Detection: New Datasets and Model [56.97470987479277]
We study multi-modal stance detection for tweets consisting of texts and images.
We propose a simple yet effective Targeted Multi-modal Prompt Tuning framework (TMPT).
TMPT achieves state-of-the-art performance in multi-modal stance detection.
arXiv Detail & Related papers (2024-02-22T05:24:19Z)
- EDIS: Entity-Driven Image Search over Multimodal Web Content [95.40238328527931]
We introduce Entity-Driven Image Search (EDIS), a dataset for cross-modal image search in the news domain.
EDIS consists of 1 million web images from actual search engine results and curated datasets, with each image paired with a textual description.
arXiv Detail & Related papers (2023-05-23T02:59:19Z)
- Iterative Adversarial Attack on Image-guided Story Ending Generation [37.42908817585858]
Multimodal learning involves developing models that can integrate information from various sources like images and texts.
Deep neural networks, which are the backbone of recent IgSEG models, are vulnerable to adversarial samples.
We propose an iterative adversarial attack method (Iterative-attack) that fuses image and text modality attacks.
arXiv Detail & Related papers (2023-05-16T06:19:03Z)
- M2HF: Multi-level Multi-modal Hybrid Fusion for Text-Video Retrieval [34.343617836027725]
We propose a multi-level multi-modal hybrid fusion network to explore comprehensive interactions between text queries and each modality content in videos.
Our framework provides two kinds of training strategies, including an ensemble manner and an end-to-end manner.
arXiv Detail & Related papers (2022-08-16T10:51:37Z)
- MHMS: Multimodal Hierarchical Multimedia Summarization [80.18786847090522]
We propose a multimodal hierarchical multimedia summarization (MHMS) framework by interacting visual and language domains.
Our method contains separate segmentation and summarization modules for the video and textual domains.
It formulates a cross-domain alignment objective with optimal transport distance to generate the representative keyframe and textual summary.
arXiv Detail & Related papers (2022-04-07T21:00:40Z)
- FiLMing Multimodal Sarcasm Detection with Attention [0.7340017786387767]
Sarcasm detection identifies natural language expressions whose intended meaning differs from their surface meaning.
We propose a novel architecture that uses the RoBERTa model with a co-attention layer on top to incorporate context incongruity between input text and image attributes.
Our results demonstrate that our proposed model outperforms the existing state-of-the-art method by 6.14% F1 score on the public Twitter multimodal sarcasm detection dataset.
arXiv Detail & Related papers (2021-08-09T06:33:29Z)
- Floods Detection in Twitter Text and Images [4.5848302154106815]
This paper aims to analyze and combine textual and visual content from social media for the detection of real-world flooding events.
For text-based flood event detection, we use three different methods, relying on Bag of Words (BoW) and an Italian version of BERT.
For the visual analysis, we rely on features extracted via multiple state-of-the-art deep models pre-trained on ImageNet.
arXiv Detail & Related papers (2020-11-30T16:08:19Z)
- Multimodal Categorization of Crisis Events in Social Media [81.07061295887172]
We present a new multimodal fusion method that leverages both images and texts as input.
In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities.
We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
arXiv Detail & Related papers (2020-04-10T06:31:30Z)