Floods Detection in Twitter Text and Images
- URL: http://arxiv.org/abs/2011.14943v1
- Date: Mon, 30 Nov 2020 16:08:19 GMT
- Title: Floods Detection in Twitter Text and Images
- Authors: Naina Said, Kashif Ahmad, Asma Gul, Nasir Ahmad, Ala Al-Fuqaha
- Abstract summary: This paper aims to analyze and combine textual and visual content from social media for the detection of real-world flooding events.
For text-based flood event detection, we use three different methods, relying on Bag of Words (BoW) and an Italian version of BERT.
For the visual analysis, we rely on features extracted via multiple state-of-the-art deep models pre-trained on ImageNet.
- Score: 4.5848302154106815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present our methods for the MediaEval 2020 Flood Related
Multimedia task, which aims to analyze and combine textual and visual content
from social media for the detection of real-world flooding events. The task
mainly focuses on identifying flood-related tweets relevant to a specific
area. We propose several schemes to address the challenge. For text-based flood
event detection, we use three different methods, relying on Bag of Words (BoW)
and an Italian version of BERT individually and in combination, achieving
F1-scores of 0.77, 0.68, and 0.70 on the development set, respectively. For
the visual analysis, we rely on features extracted via multiple
state-of-the-art deep models pre-trained on ImageNet. The extracted features
are then used to train multiple individual classifiers whose scores are then
combined in a late fusion manner, achieving an F1-score of 0.75. For our
mandatory multi-modal run, we combine the classification scores obtained with
the best textual and visual schemes in a late fusion manner. Overall, better
results are obtained with the multimodal scheme, achieving an F1-score of 0.80
on the development set.
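To make the text-based runs concrete, below is a minimal sketch of a Bag-of-Words tweet classifier. The vectorizer settings, the logistic-regression classifier, and the toy Italian tweets are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal Bag-of-Words sketch for flood-related tweet classification.
# The classifier choice (logistic regression) and vectorizer settings are
# assumptions for illustration; the paper does not specify them here.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

# Hypothetical development split: Italian tweets with binary relevance labels.
train_texts = ["allerta meteo, fiume esondato", "bella giornata di sole"]
train_labels = [1, 0]
dev_texts = ["strade allagate dopo il temporale", "oggi niente pioggia"]
dev_labels = [1, 0]

bow_clf = make_pipeline(
    CountVectorizer(lowercase=True, ngram_range=(1, 2)),  # BoW / n-gram counts
    LogisticRegression(max_iter=1000),
)
bow_clf.fit(train_texts, train_labels)

dev_pred = bow_clf.predict(dev_texts)
print("BoW F1 on dev:", f1_score(dev_labels, dev_pred))
```

The BERT-based run would replace the BoW features with scores from an Italian BERT model fine-tuned on the same tweets; the combination run late-fuses the two sets of scores, as in the multimodal sketch further below.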
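For the visual runs, the abstract describes extracting features with ImageNet-pretrained deep models, training individual classifiers, and late-fusing their scores. The sketch below illustrates that pattern under stated assumptions: the specific backbones (ResNet-50, VGG-16) and the SVM classifier are placeholders, not necessarily the models used in the paper.

```python
# Sketch: extract features from ImageNet-pretrained backbones and late-fuse
# per-backbone classifier scores by averaging. Backbone and classifier choices
# are assumptions for illustration.
import numpy as np
import torch
from torchvision import models, transforms
from PIL import Image
from sklearn.svm import SVC

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(backbone, image_paths):
    """Run images through a frozen backbone and return flattened features."""
    backbone.eval()
    feats = []
    with torch.no_grad():
        for path in image_paths:
            x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
            feats.append(backbone(x).flatten().numpy())
    return np.stack(feats)

# Two ImageNet-pretrained backbones used as fixed feature extractors.
backbones = [
    torch.nn.Sequential(*list(models.resnet50(weights="IMAGENET1K_V1").children())[:-1]),
    models.vgg16(weights="IMAGENET1K_V1").features,
]

def late_fusion_scores(train_paths, train_labels, test_paths):
    """Train one classifier per backbone and average their flood scores."""
    fused = None
    for backbone in backbones:
        X_train = extract_features(backbone, train_paths)
        X_test = extract_features(backbone, test_paths)
        clf = SVC(probability=True).fit(X_train, train_labels)
        scores = clf.predict_proba(X_test)[:, 1]
        fused = scores if fused is None else fused + scores
    return fused / len(backbones)  # averaged (late-fused) flood scores
```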
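Finally, the mandatory multimodal run combines the scores of the best textual and visual schemes in late fusion. A simple weighted average is one way to do this; the weighting below is an assumption, since the abstract only states that the two score sets are combined.

```python
# Sketch: multimodal late fusion of textual and visual classification scores.
# The 50/50 weighting and the 0.5 decision threshold are assumptions.
import numpy as np

def fuse_scores(text_scores, visual_scores, w_text=0.5):
    """Weighted average of per-tweet scores from the two unimodal runs."""
    text_scores = np.asarray(text_scores, dtype=float)
    visual_scores = np.asarray(visual_scores, dtype=float)
    return w_text * text_scores + (1.0 - w_text) * visual_scores

# Example with hypothetical per-tweet flood probabilities.
fused = fuse_scores([0.9, 0.2, 0.6], [0.8, 0.4, 0.3])
predictions = (fused >= 0.5).astype(int)  # threshold to binary decisions
print(fused, predictions)
```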
Related papers
- ChartEye: A Deep Learning Framework for Chart Information Extraction [2.4936576553283287]
In this study, we propose a deep learning-based framework that provides a solution for key steps in the chart information extraction pipeline.
The proposed framework utilizes hierarchical vision transformers for chart-type and text-role classification, and YOLOv7 for text detection.
Our proposed framework achieves excellent performance at every stage with F1-scores of 0.97 for chart-type classification, 0.91 for text-role classification, and a mean Average Precision of 0.95 for text detection.
arXiv Detail & Related papers (2024-08-28T20:22:39Z) - Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized
Visual Class Discovery [69.91441987063307]
Generalized Category Discovery (GCD) aims to cluster unlabeled data from both known and unknown categories.
Current GCD methods rely on only visual cues, which neglect the multi-modality perceptive nature of human cognitive processes in discovering novel visual categories.
We propose a two-phase TextGCD framework to accomplish multi-modality GCD by exploiting powerful Visual-Language Models.
arXiv Detail & Related papers (2024-03-12T07:06:50Z) - Align before Attend: Aligning Visual and Textual Features for Multimodal
Hateful Content Detection [4.997673761305336]
This paper proposes a context-aware attention framework for multimodal hateful content detection.
We evaluate the proposed approach on two benchmark hateful meme datasets, viz. MUTE (Bengali code-mixed) and MultiOFF (English).
arXiv Detail & Related papers (2024-02-15T06:34:15Z) - Unified Coarse-to-Fine Alignment for Video-Text Retrieval [71.85966033484597]
We propose a Unified Coarse-to-fine Alignment model, dubbed UCoFiA.
Our model captures the cross-modal similarity information at different granularity levels.
We apply the Sinkhorn-Knopp algorithm to normalize the similarities of each level before summing them.
arXiv Detail & Related papers (2023-09-18T19:04:37Z) - Generating EDU Extracts for Plan-Guided Summary Re-Ranking [77.7752504102925]
Two-step approaches, in which summary candidates are generated-then-reranked to return a single summary, can improve ROUGE scores over the standard single-step approach.
We design a novel method to generate candidates for re-ranking that addresses these issues.
We show large relevance improvements over previously published methods on widely used single document news article corpora.
arXiv Detail & Related papers (2023-05-28T17:22:04Z) - Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z) - End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge
Distillation [86.41437210485932]
We aim at advancing zero-shot HOI detection to detect both seen and unseen HOIs simultaneously.
We propose a novel end-to-end zero-shot HOI Detection framework via vision-language knowledge distillation.
Our method outperforms the previous SOTA by 8.92% on unseen mAP and 10.18% on overall mAP.
arXiv Detail & Related papers (2022-04-01T07:27:19Z) - Deep Models for Visual Sentiment Analysis of Disaster-related Multimedia
Content [4.284841324544116]
This paper presents solutions for the MediaEval 2021 task "Visual Sentiment Analysis: A Natural Disaster Use-case".
The task aims to extract and classify sentiments perceived by viewers and the emotional message conveyed by natural disaster-related images shared on social media.
In our proposed solutions, we rely mainly on two state-of-the-art models, namely Inception-v3 and VggNet-19, pre-trained on ImageNet.
arXiv Detail & Related papers (2021-11-30T10:22:41Z) - MARMOT: A Deep Learning Framework for Constructing Multimodal
Representations for Vision-and-Language Tasks [0.0]
This paper proposes a novel vision-and-language framework called multimodal representations using modality translation (MARMOT).
MARMOT outperforms an ensemble text-only classifier in 19 of 20 categories in multilabel classifications of tweets reporting election incidents during the 2016 U.S. general election.
arXiv Detail & Related papers (2021-09-23T17:48:48Z) - LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document
Understanding [49.941806975280045]
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks.
We present LayoutLMv2 by pre-training text, layout and image in a multi-modal framework.
arXiv Detail & Related papers (2020-12-29T13:01:52Z) - Flood Detection via Twitter Streams using Textual and Visual Features [5.615972945389011]
The paper presents our proposed solutions for the MediaEval 2020 Flood-Related Multimedia Task.
The task aims to analyze and detect flooding events in multimedia content shared over Twitter.
arXiv Detail & Related papers (2020-11-30T16:09:11Z)