MEDIC: A Multi-Task Learning Dataset for Disaster Image Classification
- URL: http://arxiv.org/abs/2108.12828v1
- Date: Sun, 29 Aug 2021 11:55:50 GMT
- Title: MEDIC: A Multi-Task Learning Dataset for Disaster Image Classification
- Authors: Firoj Alam, Tanvirul Alam, Md. Arid Hasan, Abul Hasnat, Muhammad
Imran, Ferda Ofli
- Abstract summary: We propose MEDIC, the largest social media image classification dataset for humanitarian response.
MEDIC consists of 71,198 images to address four different tasks in a multi-task learning setup.
This is the first dataset of its kind, combining social media images, disaster response, and multi-task learning research.
- Score: 6.167082944123002
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent research in disaster informatics demonstrates a practical and
important use case of artificial intelligence: saving human lives and reducing
suffering in the aftermath of natural disasters based on social media content (text
and images). While notable progress has been made using text, research on
exploiting images remains relatively under-explored. To advance the
image-based approach, we propose MEDIC (available at:
https://crisisnlp.qcri.org/medic/index.html), the largest social media
image classification dataset for humanitarian response, consisting of 71,198
images that address four different tasks in a multi-task learning setup. This is
the first dataset of its kind, combining social media images, disaster response, and
multi-task learning research. An important property of this dataset is its high
potential to contribute to research on multi-task learning, which has recently
received much interest from the machine learning community and has shown
remarkable results in terms of memory, inference speed, performance, and
generalization capability. The proposed dataset is therefore an important
resource for advancing image-based disaster management and multi-task machine
learning research.
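To make the multi-task setup concrete, here is a minimal sketch of a shared-backbone classifier with one head per task, in the spirit of MEDIC's four tasks. The task names follow the paper; the class counts, feature dimension, and the random linear "backbone" are illustrative placeholders, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Task name -> number of classes (assumed counts, for illustration only).
TASKS = {
    "damage_severity": 3,
    "informativeness": 2,
    "humanitarian": 4,
    "disaster_types": 7,
}

FEAT_DIM = 16  # stand-in for a CNN image-feature dimension

# One shared feature extractor (here just a random linear map) ...
W_shared = rng.normal(size=(FEAT_DIM, FEAT_DIM))
# ... and one lightweight linear head per task.
heads = {t: rng.normal(size=(FEAT_DIM, c)) for t, c in TASKS.items()}

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    """Return per-task class probabilities for a batch of feature vectors."""
    h = np.tanh(x @ W_shared)  # shared representation used by every head
    return {t: softmax(h @ W) for t, W in heads.items()}

def multitask_loss(probs, labels):
    """Sum of per-task cross-entropies, the usual multi-task objective."""
    total = 0.0
    for t, p in probs.items():
        total += -np.log(p[np.arange(len(labels[t])), labels[t]]).mean()
    return total

# Toy batch: 5 "images", each with one label per task.
x = rng.normal(size=(5, FEAT_DIM))
labels = {t: rng.integers(0, c, size=5) for t, c in TASKS.items()}
probs = forward(x)
loss = multitask_loss(probs, labels)
```

Because all heads share one representation, a single forward pass serves all four tasks, which is the source of the memory and inference-speed benefits the abstract mentions.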
Related papers
- Self-Supervised Learning for Medical Image Data with Anatomy-Oriented Imaging Planes [28.57933404578436]
We propose two complementary pretext tasks for medical image data.
The first is to learn the relative orientation between the imaging planes, implemented as regressing their intersecting lines.
The second exploits parallel imaging planes to regress their relative slice locations within a stack.
arXiv Detail & Related papers (2024-03-25T07:34:06Z)
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that input a single image of an individual and ground the generation process along with text describing the desired visual context.
We introduce a standardized dataset (Stellar) that contains personalized prompts coupled with images of individuals; it is an order of magnitude larger than existing relevant datasets, and rich semantic ground-truth annotations are readily available.
We derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and that sets a new state of the art, both quantitatively and in human trials.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on the LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- Where Does the Performance Improvement Come From? - A Reproducibility Concern about Image-Text Retrieval [85.03655458677295]
Image-text retrieval has gradually become a major research direction in the field of information retrieval.
We first examine the related concerns and why the focus is on image-text retrieval tasks.
We analyze various aspects of the reproduction of pretrained and non-pretrained retrieval models.
arXiv Detail & Related papers (2022-03-08T05:01:43Z)
- Incidents1M: a large-scale dataset of images with natural disasters, damage, and incidents [28.16346818821349]
Natural disasters, such as floods, tornadoes, or wildfires, are increasingly pervasive as the Earth undergoes global warming.
It is difficult to predict when and where an incident will occur, so timely emergency response is critical to saving the lives of those endangered by destructive events.
Social media posts can be used as a low-latency data source to understand the progression and aftermath of a disaster, yet parsing this data is tedious without automated methods.
In this work, we present the Incidents1M dataset, a large-scale multi-label dataset which contains 977,088 images, with 43 incident and 49 place categories.
arXiv Detail & Related papers (2022-01-11T23:03:57Z)
- Enhancing Social Relation Inference with Concise Interaction Graph and Discriminative Scene Representation [56.25878966006678]
We propose an approach of PRactical Inference in Social rElation (PRISE).
It concisely learns interactive features of persons and discriminative features of holistic scenes.
PRISE achieves a 6.8% improvement for domain classification on the PIPA dataset.
arXiv Detail & Related papers (2021-07-30T04:20:13Z)
- Social Media Images Classification Models for Real-time Disaster Response [5.937482215664902]
Images shared on social media help crisis managers in terms of gaining situational awareness and assessing incurred damages.
Real-time image classification has become an urgent need in order to enable a faster response.
Recent advances in computer vision and deep neural networks have enabled the development of models for real-time image classification.
arXiv Detail & Related papers (2021-04-09T04:30:04Z)
- Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types [50.1843146606122]
A simple form of transfer learning is common in current state-of-the-art computer vision models.
Previous systematic studies of transfer learning have been limited and the circumstances in which it is expected to work are not fully understood.
In this paper we carry out an extensive experimental exploration of transfer learning across vastly different image domains.
arXiv Detail & Related papers (2021-03-24T16:24:20Z)
- Deep Learning Benchmarks and Datasets for Social Media Image Classification for Disaster Response [5.610924570214424]
We propose new datasets for disaster type detection, informativeness classification, and damage severity assessment.
We benchmark several state-of-the-art deep learning models and achieve promising results.
We release our datasets and models publicly, aiming to provide proper baselines as well as to spur further research in the crisis informatics community.
arXiv Detail & Related papers (2020-11-17T20:15:49Z)
- Analysis of Social Media Data using Multimodal Deep Learning for Disaster Response [6.8889797054846795]
We propose to use both text and image modalities of social media data to learn a joint representation using state-of-the-art deep learning techniques.
Experiments on real-world disaster datasets show that the proposed multimodal architecture yields better performance than models trained using a single modality.
arXiv Detail & Related papers (2020-04-14T19:36:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.