A Survey on Interpretable Cross-modal Reasoning
- URL: http://arxiv.org/abs/2309.01955v2
- Date: Thu, 14 Sep 2023 13:10:34 GMT
- Title: A Survey on Interpretable Cross-modal Reasoning
- Authors: Dizhan Xue, Shengsheng Qian, Zuyi Zhou, Changsheng Xu
- Abstract summary: Cross-modal reasoning (CMR) has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics.
This survey delves into the realm of interpretable cross-modal reasoning (I-CMR).
This survey presents a comprehensive overview of the typical methods with a three-level taxonomy for I-CMR.
- Score: 64.37362731950843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, cross-modal reasoning (CMR), the process of understanding
and reasoning across different modalities, has emerged as a pivotal area with
applications spanning from multimedia analysis to healthcare diagnostics. As
the deployment of AI systems becomes more ubiquitous, the demand for
transparency and comprehensibility in these systems' decision-making processes
has intensified. This survey delves into the realm of interpretable cross-modal
reasoning (I-CMR), where the objective is not only to achieve high predictive
performance but also to provide human-understandable explanations for the
results. This survey presents a comprehensive overview of the typical methods
with a three-level taxonomy for I-CMR. Furthermore, this survey reviews the
existing CMR datasets with annotations for explanations. Finally, this survey
summarizes the challenges for I-CMR and discusses potential future directions.
In conclusion, this survey aims to catalyze the progress of this emerging
research area by providing researchers with a panoramic and comprehensive
perspective, illuminating the state of the art and discerning the
opportunities. The summarized methods, datasets, and other resources are
available at
https://github.com/ZuyiZhou/Awesome-Interpretable-Cross-modal-Reasoning.
Related papers
- From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems.
The recent trend of deploying Large Language Models (LLMs) to tackle CMR tasks has marked a new mainstream approach for enhancing CMR effectiveness.
This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z) - A Survey on Retrieval-Augmented Text Generation for Large Language Models [1.4579344926652844]
Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements.
This paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation.
It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies.
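To make the four categories concrete, below is a minimal, hypothetical sketch of such a pipeline; the stage functions (rewrite_query, retrieve, rerank, generate) are illustrative placeholders under assumed signatures, not APIs described in the surveyed paper.

```python
from typing import Callable, List

def rag_answer(
    query: str,
    rewrite_query: Callable[[str], str],            # pre-retrieval: query expansion/rewriting
    retrieve: Callable[[str, int], List[str]],      # retrieval: fetch candidate passages
    rerank: Callable[[str, List[str]], List[str]],  # post-retrieval: rerank/filter/compress
    generate: Callable[[str], str],                 # generation: LLM conditioned on context
    top_k: int = 5,
) -> str:
    """Sketch of the four-stage RAG paradigm: pre-retrieval, retrieval,
    post-retrieval, and generation."""
    refined = rewrite_query(query)          # 1. pre-retrieval
    candidates = retrieve(refined, top_k)   # 2. retrieval
    context = rerank(refined, candidates)   # 3. post-retrieval
    prompt = "\n".join(context) + "\n\nQuestion: " + query
    return generate(prompt)                 # 4. generation
```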
arXiv Detail & Related papers (2024-04-17T01:27:42Z) - LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models [75.89014602596673]
Strategic reasoning requires understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly.
This survey explores the scope, applications, methodologies, and evaluation metrics of strategic reasoning with Large Language Models.
It underscores the importance of strategic reasoning as a critical cognitive capability and offers insights into future research directions and potential improvements.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Advancing Explainable Autonomous Vehicle Systems: A Comprehensive Review and Research Roadmap [4.2330023661329355]
This study presents a review of the complexities associated with explanation generation and presentation.
Our roadmap is underpinned by principles of responsible research and innovation.
By exploring these research directions, the study aims to guide the development and deployment of explainable AVs.
arXiv Detail & Related papers (2024-03-19T11:43:41Z) - Multi-agent Reinforcement Learning: A Comprehensive Survey [10.186029242664931]
Multi-agent systems (MAS) are widely prevalent and crucially important in numerous real-world applications.
Despite their ubiquity, the development of intelligent decision-making agents in MAS poses several open challenges to their effective implementation.
This survey examines these challenges, placing an emphasis on studying seminal concepts from game theory (GT) and machine learning (ML).
arXiv Detail & Related papers (2023-12-15T23:16:54Z) - Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions [2.35574869517894]
This study analyzes recent advances in the area of Multimodal XAI (MXAI).
MXAI comprises methods that involve multiple modalities in the primary prediction and explanation tasks.
arXiv Detail & Related papers (2023-06-09T07:51:50Z) - Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z) - Image-text Retrieval: A Survey on Recent Research and Development [58.060687870247996]
Cross-modal image-text retrieval (ITR) has experienced increased interest in the research community due to its excellent research value and broad real-world application.
This paper presents a comprehensive and up-to-date survey on the ITR approaches from four perspectives.
arXiv Detail & Related papers (2022-03-28T13:00:01Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
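The co-attention and cross reconstruction idea described in the last entry can be sketched roughly as follows; this is an illustrative PyTorch sketch assuming token-level features from two views, not the authors' implementation, and the module and layer names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttentionFusion(nn.Module):
    """Minimal two-view co-attention block with a cross reconstruction loss."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj_a = nn.Linear(dim, dim)
        self.proj_b = nn.Linear(dim, dim)
        self.decode_a = nn.Linear(dim, dim)  # reconstruct view A from B-attended context
        self.decode_b = nn.Linear(dim, dim)  # reconstruct view B from A-attended context

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor):
        # feats_a: (batch, n_a, dim), feats_b: (batch, n_b, dim)
        # Affinity between every token pair across the two views
        affinity = torch.bmm(self.proj_a(feats_a), self.proj_b(feats_b).transpose(1, 2))
        attn_a = F.softmax(affinity, dim=2)                   # A tokens attend over B
        attn_b = F.softmax(affinity.transpose(1, 2), dim=2)   # B tokens attend over A
        a_ctx = torch.bmm(attn_a, feats_b)  # B-context for each A token: (batch, n_a, dim)
        b_ctx = torch.bmm(attn_b, feats_a)  # A-context for each B token: (batch, n_b, dim)
        # Cross reconstruction: rebuild each view from the other's attended context,
        # encouraging the shared (common) information to be captured
        loss_rec = (F.mse_loss(self.decode_a(a_ctx), feats_a)
                    + F.mse_loss(self.decode_b(b_ctx), feats_b))
        return a_ctx, b_ctx, loss_rec
```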