MAEA: Multimodal Attribution for Embodied AI
- URL: http://arxiv.org/abs/2307.13850v1
- Date: Tue, 25 Jul 2023 22:51:36 GMT
- Title: MAEA: Multimodal Attribution for Embodied AI
- Authors: Vidhi Jain, Jayant Sravan Tamarapalli, Sahiti Yerramilli, Yonatan Bisk
- Abstract summary: We present MAEA, a framework to compute global attributions per modality of any differentiable policy.
We show how language and visual attributions enable lower-level behavior analysis in EAI policies.
- Score: 18.515215371833186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding multimodal perception for embodied AI is an open question
because such inputs may contain highly complementary as well as redundant
information for the task. A relevant direction for multimodal policies is
understanding the global trends of each modality at the fusion layer. To this
end, we disentangle the attributions for visual, language, and previous action
inputs across different policies trained on the ALFRED dataset. Attribution
analysis can be utilized to rank and group the failure scenarios, investigate
modeling and dataset biases, and critically analyze multimodal EAI policies for
robustness and user trust before deployment. We present MAEA, a framework to
compute global attributions per modality of any differentiable policy. In
addition, we show how language and visual attributions enable lower-level
behavior analysis in EAI policies.
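The paper gives the full MAEA formulation; as a minimal, hypothetical sketch of the underlying idea, per-modality global attributions for a differentiable policy can be approximated by aggregating gradient-times-input magnitudes over each modality's inputs. `ToyPolicy` and all dimensions below are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' code): per-modality attribution for a
# differentiable policy via gradient x input, aggregated across each modality.
import torch
import torch.nn as nn

class ToyPolicy(nn.Module):
    """Hypothetical policy fusing vision, language, and previous-action features."""
    def __init__(self, d_vis=16, d_lang=8, d_act=4, n_actions=12):
        super().__init__()
        self.fuse = nn.Linear(d_vis + d_lang + d_act, 32)
        self.head = nn.Linear(32, n_actions)

    def forward(self, vis, lang, act):
        h = torch.relu(self.fuse(torch.cat([vis, lang, act], dim=-1)))
        return self.head(h)

def modality_attributions(policy, vis, lang, act):
    """Return each modality's share of |gradient x input| attribution mass."""
    inputs = {"vision": vis, "language": lang, "prev_action": act}
    for x in inputs.values():
        x.requires_grad_(True)
    logits = policy(vis, lang, act)
    logits.max(dim=-1).values.sum().backward()  # attribute the argmax action's logit
    scores = {k: (x.grad * x).abs().sum().item() for k, x in inputs.items()}
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}

policy = ToyPolicy()
print(modality_attributions(
    policy, torch.randn(1, 16), torch.randn(1, 8), torch.randn(1, 4)))
# e.g. {'vision': 0.62, 'language': 0.28, 'prev_action': 0.10}
```

Averaging these per-sample shares over a dataset yields the kind of global, per-modality trend the framework is designed to surface.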
Related papers
- Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning [10.848218400641466]
Multi-objective reinforcement learning (MORL) is used to solve problems involving multiple objectives.
We propose an approach for clustering the solution set generated by MORL.
arXiv Detail & Related papers (2024-11-07T15:26:38Z)
- An Information Criterion for Controlled Disentanglement of Multimodal Data [39.601584166020274]
Multimodal representation learning seeks to relate and decompose information inherent in multiple modalities.
Disentangled Self-Supervised Learning (DisentangledSSL) is a novel self-supervised approach for learning disentangled representations.
arXiv Detail & Related papers (2024-10-31T14:57:31Z)
- Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis [4.344546814121446]
We propose a Knowledge-Guided Dynamic Modality Attention Fusion Framework (KuDA) for multimodal sentiment analysis.
KuDA uses sentiment knowledge to guide the model in dynamically selecting the dominant modality and adjusting the contribution of each modality (a generic sketch of this fusion pattern follows this entry).
Experiments on four MSA benchmark datasets indicate that KuDA achieves state-of-the-art performance and is able to adapt to different scenarios of dominant modality.
arXiv Detail & Related papers (2024-10-06T14:10:28Z)
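KuDA's knowledge-guided mechanism is specific to that paper; below is a generic, hypothetical sketch of dynamic modality attention fusion, in which a small gating network scores each modality per sample and fuses the projected features by those weights.

```python
# Generic sketch of dynamic modality attention fusion (illustrative, not KuDA's
# code): a gating network scores each modality per sample, then fuses them.
import torch
import torch.nn as nn

class DynamicModalityFusion(nn.Module):
    def __init__(self, dims=(32, 32, 32), d_model=64):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in dims])
        self.gate = nn.Linear(d_model * len(dims), len(dims))

    def forward(self, *modalities):
        feats = [p(m) for p, m in zip(self.proj, modalities)]         # (B, d_model) each
        weights = torch.softmax(self.gate(torch.cat(feats, -1)), -1)  # (B, n_modalities)
        stacked = torch.stack(feats, dim=1)                           # (B, n, d_model)
        fused = (weights.unsqueeze(-1) * stacked).sum(dim=1)          # (B, d_model)
        return fused, weights  # weights expose which modality dominates per sample

fusion = DynamicModalityFusion()
fused, w = fusion(torch.randn(2, 32), torch.randn(2, 32), torch.randn(2, 32))
print(w)  # per-sample modality weights, summing to 1
```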
- Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges.
We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow.
We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
- Knowledge-Aware Reasoning over Multimodal Semi-structured Tables [85.24395216111462]
This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data.
We introduce MMTabQA, a new dataset designed for this purpose.
Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs.
arXiv Detail & Related papers (2024-08-25T15:17:43Z)
- Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models [12.841405829775852]
We introduce the modality importance score (MIS) to identify bias in VidQA benchmarks and datasets.
We also propose an innovative method that uses state-of-the-art MLLMs to estimate modality importance (a simple ablation-based proxy is sketched after this entry).
Our results indicate that current models do not effectively integrate information due to modality imbalance in existing datasets.
arXiv Detail & Related papers (2024-08-22T23:32:42Z)
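The paper estimates modality importance with MLLMs; the snippet below is only a rough, hypothetical proxy that scores a modality by the accuracy drop when that modality is ablated. The `predict` wrapper and the example format are assumptions.

```python
# Rough, illustrative proxy for a modality importance score (not the paper's
# MLLM-based MIS): accuracy drop when one modality is masked out.
from typing import Callable, Sequence

def modality_importance(
    predict: Callable[[dict], str],   # model wrapper: inputs dict -> answer string
    examples: Sequence[dict],         # each: {"video": ..., "question": ..., "answer": ...}
    modality: str,
) -> float:
    def accuracy(mask: bool) -> float:
        correct = 0
        for ex in examples:
            inputs = {k: v for k, v in ex.items() if k != "answer"}
            if mask:
                inputs[modality] = None  # ablate the modality under test
            correct += predict(inputs) == ex["answer"]
        return correct / len(examples)
    # Large drop when masked => the modality carries task-relevant information.
    return accuracy(mask=False) - accuracy(mask=True)
```

Under this proxy, a VidQA benchmark whose questions are answerable from text alone would yield a near-zero score for the video modality, which is the kind of imbalance the MIS is meant to expose.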
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework, NativE, to achieve MMKGC in the wild.
NativE introduces a relation-guided dual adaptive fusion module that enables adaptive fusion for any modality (a generic sketch of relation-guided fusion follows this entry).
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
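NativE's dual adaptive fusion module is defined in the paper; as a hypothetical illustration of relation-guided fusion, the sketch below uses the relation embedding as an attention query over an entity's modality embeddings, so different relations emphasize different modalities.

```python
# Illustrative sketch (not NativE's implementation): relation-guided fusion of
# modality embeddings via dot-product attention with the relation as the query.
import torch
import torch.nn as nn

class RelationGuidedFusion(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.key = nn.Linear(d, d)
        self.val = nn.Linear(d, d)

    def forward(self, rel_emb, modality_embs):
        # rel_emb: (B, d); modality_embs: (B, n_modalities, d)
        scores = torch.einsum(
            "bd,bnd->bn", rel_emb, self.key(modality_embs)
        ) / rel_emb.size(-1) ** 0.5
        weights = torch.softmax(scores, dim=-1)  # relation decides which modality matters
        return torch.einsum("bn,bnd->bd", weights, self.val(modality_embs))

fuse = RelationGuidedFusion()
out = fuse(torch.randn(2, 64), torch.randn(2, 3, 64))
print(out.shape)  # torch.Size([2, 64])
```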
- DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)
- Transformer-based Multi-Aspect Modeling for Multi-Aspect Multi-Sentiment Analysis [56.893393134328996]
We propose a novel Transformer-based Multi-aspect Modeling scheme (TMM), which can capture potential relations between multiple aspects and simultaneously detect the sentiment of all aspects in a sentence (a generic sketch follows this entry).
Our method achieves noticeable improvements compared with strong baselines such as BERT and RoBERTa.
arXiv Detail & Related papers (2020-11-01T11:06:31Z)
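The sketch below is a hypothetical illustration of the joint multi-aspect pattern (not the paper's implementation): learned aspect queries are encoded together with the sentence tokens in one Transformer pass, so aspects can interact with each other and every aspect's sentiment is predicted simultaneously.

```python
# Illustrative sketch: one encoder pass scores all aspects of a sentence jointly.
import torch
import torch.nn as nn

class MultiAspectSentiment(nn.Module):
    def __init__(self, vocab=10000, d=128, n_aspects=4, n_labels=3):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d)
        self.aspect_emb = nn.Parameter(torch.randn(n_aspects, d))  # learned aspect queries
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, n_labels)

    def forward(self, token_ids):
        B = token_ids.size(0)
        aspects = self.aspect_emb.unsqueeze(0).expand(B, -1, -1)
        seq = torch.cat([aspects, self.tok_emb(token_ids)], dim=1)
        h = self.encoder(seq)                      # aspects and tokens interact here
        return self.head(h[:, : aspects.size(1)])  # (B, n_aspects, n_labels)

model = MultiAspectSentiment()
logits = model(torch.randint(0, 10000, (2, 20)))
print(logits.shape)  # torch.Size([2, 4, 3])
```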
- Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis [103.69656907534456]
Recent multimodal learning models with strong performance on human-centric tasks are often black boxes.
We propose Multimodal Routing, which adjusts weights between input modalities and output representations differently for each input sample (sketched after this entry).
arXiv Detail & Related papers (2020-04-29T13:42:22Z)
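As a hypothetical illustration of the idea (not the authors' implementation), the routing step below derives per-sample weights from the agreement between each modality's features and learned output representations; the weights themselves then serve as local interpretations.

```python
# Minimal sketch of per-sample multimodal routing: each sample gets its own
# modality-to-output weights, computed from modality-concept agreement.
import torch
import torch.nn as nn

class MultimodalRouting(nn.Module):
    def __init__(self, d=32, n_outputs=2):
        super().__init__()
        self.concepts = nn.Parameter(torch.randn(n_outputs, d))  # output representations

    def forward(self, modality_feats):
        # modality_feats: (B, n_modalities, d)
        agreement = torch.einsum("bnd,od->bno", modality_feats, self.concepts)
        weights = torch.softmax(agreement, dim=1)  # (B, n_modalities, n_outputs)
        outputs = torch.einsum("bno,bnd->bod", weights, modality_feats)
        return outputs, weights  # weights are directly interpretable per sample

router = MultimodalRouting()
out, w = router(torch.randn(2, 3, 32))
print(w[0])  # modality-to-output weights for the first sample
```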