CrisisKAN: Knowledge-infused and Explainable Multimodal Attention
Network for Crisis Event Classification
- URL: http://arxiv.org/abs/2401.06194v1
- Date: Thu, 11 Jan 2024 13:22:38 GMT
- Title: CrisisKAN: Knowledge-infused and Explainable Multimodal Attention
Network for Crisis Event Classification
- Authors: Shubham Gupta, Nandini Saini, Suman Kundu, Debasis Das
- Abstract summary: CrisisKAN is a Knowledge-infused and Explainable Multimodal Attention Network that combines images and text with external knowledge from Wikipedia to classify crisis events.
To enrich the context-specific understanding of textual information, Wikipedia knowledge is integrated using the proposed wiki extraction algorithm.
To ensure reliability, a model-specific approach called Grad-CAM is employed, providing a robust explanation of the model's predictions.
- Score: 25.93602006155562
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The pervasive use of social media has made it an emerging source of real-time information (such as images, text, or both) for identifying various events. Despite the rapid growth of image- and text-based event classification, state-of-the-art (SOTA) models find it challenging to bridge the semantic gap between image and text features due to inconsistent encoding. Moreover, the black-box nature of these models makes it difficult to explain their outcomes and build trust in high-stakes situations such as disasters and pandemics. Additionally, the word limit imposed on social media posts can introduce bias toward specific events. To address these issues, we propose CrisisKAN, a novel Knowledge-infused and Explainable Multimodal Attention Network that combines images and text with external knowledge from Wikipedia to classify crisis events. To enrich the context-specific understanding of textual information, we integrate Wikipedia knowledge using the proposed wiki extraction algorithm. In addition, a guided cross-attention module is implemented to bridge the semantic gap when integrating visual and textual data. To ensure reliability, we employ a model-specific approach called Gradient-weighted Class Activation Mapping (Grad-CAM) that provides robust explanations of the proposed model's predictions. Comprehensive experiments conducted on the CrisisMMD dataset yield in-depth analysis across various crisis-specific tasks and settings. As a result, CrisisKAN outperforms existing SOTA methodologies and offers a novel perspective on explainable multimodal event classification.
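The abstract does not spell out the wiki extraction algorithm. As a rough illustration of the general idea (expanding a word-limited post with encyclopedic context), a hypothetical sketch using the third-party `wikipedia` package might look like the following; the function name, page and sentence limits, and selection strategy are assumptions, not the authors' implementation.

```python
# Illustrative sketch only: enrich a short post with Wikipedia context.
# The actual wiki extraction algorithm in CrisisKAN is not specified in
# the abstract; entity selection and truncation rules here are assumptions.
import wikipedia  # pip install wikipedia

def enrich_with_wikipedia(post: str, max_pages: int = 2, sentences: int = 2) -> str:
    """Append short Wikipedia summaries for the top search hits of the post."""
    context = []
    for title in wikipedia.search(post)[:max_pages]:
        try:
            # Pull a brief summary to extend the word-limited post.
            context.append(wikipedia.summary(title, sentences=sentences))
        except (wikipedia.DisambiguationError, wikipedia.PageError):
            continue  # skip ambiguous or missing pages
    return post + " " + " ".join(context)

print(enrich_with_wikipedia("Flooding in Houston after Hurricane Harvey"))
```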
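Similarly, the guided cross-attention module is only named here. A minimal PyTorch sketch of one plausible reading, text tokens querying image features with a learned gate that can suppress a weak or misleading modality, is given below; the dimensions and the gating formulation are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class GuidedCrossAttention(nn.Module):
    """Sketch: fuse text and image features via gated cross-attention.

    One plausible reading of a 'guided' module: standard cross-attention
    whose output is gated by the query modality, so uninformative image
    evidence can be damped. Not the authors' exact formulation.
    """
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # Text tokens query image patches: (B, T, D) x (B, P, D) -> (B, T, D)
        attended, _ = self.attn(query=text, key=image, value=image)
        g = self.gate(text)  # per-token gate in [0, 1]
        return self.norm(text + g * attended)

fused = GuidedCrossAttention()(torch.randn(2, 16, 512), torch.randn(2, 49, 512))
```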
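Grad-CAM, by contrast, is a standard published technique: gradients of a class score with respect to a convolutional layer's activations are global-average-pooled into channel weights, and the weighted activation map (after a ReLU) highlights the image regions driving the prediction. A compact, generic sketch follows; the choice of layer and the model wiring are placeholders, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, layer, image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Standard Grad-CAM over a chosen conv layer (layer choice is ours)."""
    acts, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    try:
        logits = model(image)            # forward pass records activations
        logits[0, class_idx].backward()  # backward pass records gradients
    finally:
        h1.remove(); h2.remove()
    a, g = acts[0], grads[0]                    # each (1, C, H, W)
    weights = g.mean(dim=(2, 3), keepdim=True)  # channel importance
    cam = F.relu((weights * a).sum(dim=1))      # (1, H, W)
    return cam / (cam.max() + 1e-8)             # normalize to [0, 1]
```

In a model like CrisisKAN, this would typically target the last convolutional block of the image encoder to visualize which regions support a given crisis-class prediction.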
Related papers
- Enhancing Argument Structure Extraction with Efficient Leverage of Contextual Information [79.06082391992545]
We propose an Efficient Context-aware model (ECASE) that fully exploits contextual information.
We introduce a sequence-attention module and distance-weighted similarity loss to aggregate contextual information and argumentative information.
Our experiments on five datasets from various domains demonstrate that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-08T08:47:10Z)
- Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges: internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting.
arXiv Detail & Related papers (2023-05-19T14:56:57Z)
- Iterative Adversarial Attack on Image-guided Story Ending Generation [37.42908817585858]
Multimodal learning involves developing models that can integrate information from multiple sources, such as images and text.
Deep neural networks, the backbone of recent image-guided story ending generation (IgSEG) models, are vulnerable to adversarial samples.
We propose an iterative adversarial attack method (Iterative-attack) that fuses image and text modality attacks.
arXiv Detail & Related papers (2023-05-16T06:19:03Z)
- Interpretable Detection of Out-of-Context Misinformation with Neural-Symbolic-Enhanced Large Multimodal Model [16.348950072491697]
Misinformation creators increasingly tend to use out-of-context multimedia content to deceive the public and fake news detection systems.
This new type of misinformation makes both detection and clarification more difficult, because each individual modality is close enough to true information.
In this paper we explore how to achieve interpretable cross-modal de-contextualization detection that simultaneously identifies mismatched pairs and cross-modal contradictions.
arXiv Detail & Related papers (2023-04-15T21:11:55Z)
- Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models [103.61066310897928]
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.
While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt.
We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt.
We introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness of the generated images.
arXiv Detail & Related papers (2023-01-31T18:10:38Z)
- Learning to Model Multimodal Semantic Alignment for Story Visualization [58.16484259508973]
Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story.
Current works face the problem of semantic misalignment because of their fixed architecture and diversity of input modalities.
We explore the semantic alignment between text and image representations by learning to match their semantic levels in the GAN-based generative model.
arXiv Detail & Related papers (2022-11-14T11:41:44Z)
- CLIP-Event: Connecting Text and Images with Event Structures [123.31452120399827]
We propose a contrastive learning framework that encourages vision-language pretraining models to comprehend event structures.
We take advantage of text information extraction technologies to obtain event structural knowledge.
Experiments show that our zero-shot CLIP-Event outperforms the state-of-the-art supervised model in argument extraction.
arXiv Detail & Related papers (2022-01-13T17:03:57Z)
- STEEX: Steering Counterfactual Explanations with Semantics [28.771471624014065]
Deep learning models are increasingly used in safety-critical applications.
For simple images, such as low-resolution face portraits, visual counterfactual explanations have recently been proposed.
We propose a new generative counterfactual explanation framework that produces plausible and sparse modifications.
arXiv Detail & Related papers (2021-11-17T13:20:29Z)
- Multimodal Categorization of Crisis Events in Social Media [81.07061295887172]
We present a new multimodal fusion method that leverages both images and texts as input.
In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities.
We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
arXiv Detail & Related papers (2020-04-10T06:31:30Z)
- Unsupervised and Interpretable Domain Adaptation to Rapidly Filter Tweets for Emergency Services [18.57009530004948]
We present a novel method to classify relevant tweets during an ongoing crisis using the publicly available dataset of TREC incident streams.
We use dedicated attention layers for each task to provide model interpretability, which is critical for real-world applications.
We show a practical implication of our work by providing a use-case for the COVID-19 pandemic.
arXiv Detail & Related papers (2020-03-04T06:40:14Z)