A Survey on Interpretable Cross-modal Reasoning
- URL: http://arxiv.org/abs/2309.01955v2
- Date: Thu, 14 Sep 2023 13:10:34 GMT
- Title: A Survey on Interpretable Cross-modal Reasoning
- Authors: Dizhan Xue, Shengsheng Qian, Zuyi Zhou, Changsheng Xu
- Abstract summary: Cross-modal reasoning (CMR) has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics.
This survey delves into the realm of interpretable cross-modal reasoning (I-CMR).
This survey presents a comprehensive overview of the typical methods with a three-level taxonomy for I-CMR.
- Score: 64.37362731950843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, cross-modal reasoning (CMR), the process of understanding
and reasoning across different modalities, has emerged as a pivotal area with
applications spanning from multimedia analysis to healthcare diagnostics. As
the deployment of AI systems becomes more ubiquitous, the demand for
transparency and comprehensibility in these systems' decision-making processes
has intensified. This survey delves into the realm of interpretable cross-modal
reasoning (I-CMR), where the objective is not only to achieve high predictive
performance but also to provide human-understandable explanations for the
results. This survey presents a comprehensive overview of the typical methods
with a three-level taxonomy for I-CMR. Furthermore, this survey reviews the
existing CMR datasets with annotations for explanations. Finally, this survey
summarizes the challenges for I-CMR and discusses potential future directions.
In conclusion, this survey aims to catalyze the progress of this emerging
research area by providing researchers with a panoramic and comprehensive
perspective, illuminating the state of the art and discerning the
opportunities. The summarized methods, datasets, and other resources are
available at
https://github.com/ZuyiZhou/Awesome-Interpretable-Cross-modal-Reasoning.
Related papers
- From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems.
The recent trend of deploying Large Language Models (LLMs) to tackle CMR tasks has marked a new mainstream approach for enhancing CMR effectiveness.
This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z) - A Survey on Retrieval-Augmented Text Generation for Large Language Models [1.4579344926652844]
Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements.
This paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation.
It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies.
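To make the four categories concrete, below is a minimal, hypothetical sketch of such a pipeline; the stage functions (rewrite_query, retrieve, rerank, generate) are illustrative placeholders under assumed signatures, not APIs described in the surveyed paper.

```python
from typing import Callable, List

def rag_answer(
    query: str,
    rewrite_query: Callable[[str], str],            # pre-retrieval: query expansion/rewriting
    retrieve: Callable[[str, int], List[str]],      # retrieval: fetch candidate passages
    rerank: Callable[[str, List[str]], List[str]],  # post-retrieval: rerank/filter/compress
    generate: Callable[[str], str],                 # generation: LLM conditioned on context
    top_k: int = 5,
) -> str:
    """Sketch of the four-stage RAG paradigm: pre-retrieval, retrieval,
    post-retrieval, and generation."""
    refined = rewrite_query(query)          # 1. pre-retrieval
    candidates = retrieve(refined, top_k)   # 2. retrieval
    context = rerank(refined, candidates)   # 3. post-retrieval
    prompt = "\n".join(context) + "\n\nQuestion: " + query
    return generate(prompt)                 # 4. generation
```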
arXiv Detail & Related papers (2024-04-17T01:27:42Z) - LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models [75.89014602596673]
Strategic reasoning requires understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly.
This survey explores the scope, applications, methodologies, and evaluation metrics of strategic reasoning with Large Language Models.
It underscores the importance of strategic reasoning as a critical cognitive capability and offers insights into future research directions and potential improvements.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Advancing Explainable Autonomous Vehicle Systems: A Comprehensive Review and Research Roadmap [4.2330023661329355]
This study presents a review of the complexities associated with explanation generation and presentation.
Our roadmap is underpinned by principles of responsible research and innovation.
By exploring these research directions, the study aims to guide the development and deployment of explainable AVs.
arXiv Detail & Related papers (2024-03-19T11:43:41Z) - Multi-agent Reinforcement Learning: A Comprehensive Survey [10.186029242664931]
Multi-agent systems (MAS) are widely prevalent and crucially important in numerous real-world applications.
Despite their ubiquity, the development of intelligent decision-making agents in MAS poses several open challenges to their effective implementation.
This survey examines these challenges, placing an emphasis on studying seminal concepts from game theory (GT) and machine learning (ML).
arXiv Detail & Related papers (2023-12-15T23:16:54Z) - Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions [2.35574869517894]
This study analyzes recent advances in the area of Multimodal XAI (MXAI).
MXAI comprises methods that involve multiple modalities in the primary prediction and explanation tasks.
arXiv Detail & Related papers (2023-06-09T07:51:50Z) - Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z) - Image-text Retrieval: A Survey on Recent Research and Development [58.060687870247996]
Cross-modal image-text retrieval (ITR) has experienced increased interest in the research community due to its excellent research value and broad real-world application.
This paper presents a comprehensive and up-to-date survey on the ITR approaches from four perspectives.
arXiv Detail & Related papers (2022-03-28T13:00:01Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
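The co-attention and cross reconstruction idea described in the last entry can be sketched roughly as follows; this is an illustrative PyTorch sketch assuming token-level features from two views, not the authors' implementation, and the module and layer names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttentionFusion(nn.Module):
    """Minimal two-view co-attention block with a cross reconstruction loss."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj_a = nn.Linear(dim, dim)
        self.proj_b = nn.Linear(dim, dim)
        self.decode_a = nn.Linear(dim, dim)  # reconstruct view A from B-attended context
        self.decode_b = nn.Linear(dim, dim)  # reconstruct view B from A-attended context

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor):
        # feats_a: (batch, n_a, dim), feats_b: (batch, n_b, dim)
        # Affinity between every token pair across the two views
        affinity = torch.bmm(self.proj_a(feats_a), self.proj_b(feats_b).transpose(1, 2))
        attn_a = F.softmax(affinity, dim=2)                   # A tokens attend over B
        attn_b = F.softmax(affinity.transpose(1, 2), dim=2)   # B tokens attend over A
        a_ctx = torch.bmm(attn_a, feats_b)  # B-context for each A token: (batch, n_a, dim)
        b_ctx = torch.bmm(attn_b, feats_a)  # A-context for each B token: (batch, n_b, dim)
        # Cross reconstruction: rebuild each view from the other's attended context,
        # encouraging the shared (common) information to be captured
        loss_rec = (F.mse_loss(self.decode_a(a_ctx), feats_a)
                    + F.mse_loss(self.decode_b(b_ctx), feats_b))
        return a_ctx, b_ctx, loss_rec
```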