Modality Influence in Multimodal Machine Learning
- URL: http://arxiv.org/abs/2306.06476v1
- Date: Sat, 10 Jun 2023 16:28:52 GMT
- Title: Modality Influence in Multimodal Machine Learning
- Authors: Abdelhamid Haouhat, Slimane Bellaouar, Attia Nehar, Hadda Cherroun
- Abstract summary: The study examines Multimodal Sentiment Analysis, Multimodal Emotion Recognition, Multimodal Hate Speech Recognition, and Multimodal Disease Detection.
The research aims to identify the most influential modality or set of modalities for each task and draw conclusions for diverse multimodal classification tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal Machine Learning has emerged as a prominent research direction
across various applications such as Sentiment Analysis, Emotion Recognition,
Machine Translation, Hate Speech Recognition, and Movie Genre Classification.
This approach has shown promising results by utilizing modern deep learning
architectures. Despite the achievements made, challenges remain in data
representation, alignment techniques, reasoning, generation, and quantification
within multimodal learning. Additionally, it has often been assumed that the
textual modality plays the dominant role in decision-making. However, limited
investigations have been conducted on the influence of different modalities in
Multimodal Machine Learning systems. This paper aims to address this gap by
studying the impact of each modality on multimodal learning tasks. The research
focuses on verifying presumptions and gaining insights into the usage of
different modalities. The main contribution of this work is the proposal of a
methodology to determine the effect of each modality on several Multimodal
Machine Learning models and datasets from various tasks. Specifically, the
study examines Multimodal Sentiment Analysis, Multimodal Emotion Recognition,
Multimodal Hate Speech Recognition, and Multimodal Disease Detection. The study
objectives include training state-of-the-art (SOTA) Multimodal Machine Learning
models with masked modalities to evaluate their impact on performance (see the
illustrative sketch after the abstract). Furthermore, the research
aims to identify the most influential modality or set of modalities for each
task and draw conclusions for diverse multimodal classification tasks. By
undertaking these investigations, this research contributes to a better
understanding of the role of individual modalities in multimodal learning and
provides valuable insights for future advancements in this field.
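
The masking methodology described in the abstract can be pictured with a short, hypothetical PyTorch sketch: zero out one modality's features at evaluation time and measure the resulting drop in task performance. The modality names, model interface, and accuracy metric below are assumptions for illustration only, not the authors' actual code.

```python
import torch

MODALITIES = ["text", "audio", "vision"]  # assumed modality keys


def mask_modality(batch, modality):
    """Return a copy of the batch with one modality's features zeroed out."""
    masked = {k: v.clone() for k, v in batch.items()}
    masked[modality] = torch.zeros_like(masked[modality])
    return masked


@torch.no_grad()
def evaluate(model, loader, masked_modality=None):
    """Compute accuracy, optionally with one modality masked."""
    model.eval()
    correct, total = 0, 0
    for batch, labels in loader:  # batch: dict of per-modality feature tensors
        if masked_modality is not None:
            batch = mask_modality(batch, masked_modality)
        logits = model(**batch)   # assumes model(text=..., audio=..., vision=...)
        correct += (logits.argmax(dim=-1) == labels).sum().item()
        total += labels.size(0)
    return correct / total


def modality_influence(model, loader):
    """Influence of a modality = performance drop when it is masked."""
    baseline = evaluate(model, loader)
    return {m: baseline - evaluate(model, loader, masked_modality=m)
            for m in MODALITIES}
```

Under this reading, the larger the performance drop when a modality is masked, the more the model relies on that modality for the task.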
Related papers
- Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective [34.76568708378833]
Multimodal affective computing (MAC) has garnered increasing attention due to its broad applications in analyzing human behaviors and intentions.
This survey presents the recent trends of multimodal affective computing from NLP perspective through four hot tasks.
The goal of this survey is to explore the current landscape of multimodal affective research, identify development trends, and highlight the similarities and differences across various tasks.
arXiv Detail & Related papers (2024-09-11T16:24:06Z)
- HEMM: Holistic Evaluation of Multimodal Foundation Models [91.60364024897653]
Multimodal foundation models can holistically process text alongside images, video, audio, and other sensory modalities.
It is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains.
arXiv Detail & Related papers (2024-07-03T18:00:48Z)
- Multi-Task Learning for Affect Analysis [0.0]
This project investigates two primary approaches: uni-task solutions and a multi-task approach to the same problems.
The project utilizes an existing neural network architecture, adapting it for multi-task learning by modifying output layers and loss functions.
The research aspires to contribute to the burgeoning field of affective computing, with applications spanning healthcare, marketing, and human-computer interaction.
arXiv Detail & Related papers (2024-06-30T12:36:37Z)
- Attribution Regularization for Multimodal Paradigms [7.1262539590168705]
Multimodal machine learning can integrate information from multiple modalities to enhance learning and decision-making processes.
It is commonly observed that unimodal models outperform multimodal models, despite the latter having access to richer information.
This research project proposes a novel regularization term that encourages multimodal models to effectively utilize information from all modalities when making decisions.
arXiv Detail & Related papers (2024-04-02T23:05:56Z)
- On Robustness in Multimodal Learning [75.03719000820388]
Multimodal learning is defined as learning over multiple input modalities such as video, audio, and text.
We present a multimodal robustness framework to provide a systematic analysis of common multimodal representation learning methods.
arXiv Detail & Related papers (2023-04-10T05:02:07Z)
- Identifiability Results for Multimodal Contrastive Learning [72.15237484019174]
We show that it is possible to recover shared factors in a more general setup than the multi-view setting studied previously.
Our work provides a theoretical basis for multimodal representation learning and explains in which settings multimodal contrastive learning can be effective in practice.
arXiv Detail & Related papers (2023-03-16T09:14:26Z)
- Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications [47.501121601856795]
Multimodality Representation Learning is a technique of learning to embed information from different modalities and their correlations.
Cross-modal interaction and complementary information from different modalities are crucial for advanced models to perform any multimodal task.
This survey presents the literature on the evolution and enhancement of deep learning multimodal architectures.
arXiv Detail & Related papers (2023-02-01T11:48:34Z)
- Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions [68.6358773622615]
This paper provides an overview of the computational and theoretical foundations of multimodal machine learning.
We propose a taxonomy of 6 core technical challenges: representation, alignment, reasoning, generation, transference, and quantification.
Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches.
arXiv Detail & Related papers (2022-09-07T19:21:19Z)
- Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction [125.18248926508045]
We propose Channel-Exchanging-Network (CEN) which is self-adaptive, parameter-free, and more importantly, applicable for both multimodal fusion and multitask learning.
CEN dynamically exchanges channels between sub-networks of different modalities (a rough sketch of this idea follows this entry).
For the application of dense image prediction, the validity of CEN is tested in four different scenarios.
arXiv Detail & Related papers (2021-12-04T05:47:54Z)
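
As a loose, hypothetical illustration of the channel-exchanging idea summarized in the entry above: channels whose BatchNorm scaling factor is near zero are treated as uninformative for their own modality and are replaced by the corresponding channels of the other modality. The two-modality restriction, threshold value, and function interface below are assumptions, not CEN's released implementation.

```python
import torch
import torch.nn as nn


def exchange_channels(feat_a, feat_b, bn_a, bn_b, threshold=1e-2):
    """Swap near-zero-scale channels between two modality feature maps."""
    gamma_a = bn_a.weight.abs()  # per-channel BatchNorm scaling factors
    gamma_b = bn_b.weight.abs()
    swap_a = (gamma_a < threshold).view(1, -1, 1, 1)  # channels A borrows from B
    swap_b = (gamma_b < threshold).view(1, -1, 1, 1)  # channels B borrows from A
    out_a = torch.where(swap_a, feat_b, feat_a)
    out_b = torch.where(swap_b, feat_a, feat_b)
    return out_a, out_b


if __name__ == "__main__":
    # Illustrative shapes only: two 64-channel feature maps from two modality branches.
    bn_rgb, bn_depth = nn.BatchNorm2d(64), nn.BatchNorm2d(64)
    rgb, depth = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    rgb, depth = exchange_channels(rgb, depth, bn_rgb, bn_depth)
```

In the full method, a sparsity constraint on the BatchNorm scaling factors drives some of them toward zero; with freshly initialized factors, as in the toy example above, no channels are exchanged.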