Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and
Fusion
- URL: http://arxiv.org/abs/2006.08159v1
- Date: Mon, 15 Jun 2020 06:42:04 GMT
- Title: Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and
Fusion
- Authors: Yang Wang
- Abstract summary: Multi-modal or multi-view data has surged as a major stream of big data, where each modality/view encodes an individual property of the data objects.
Most existing state-of-the-art methods focus on fusing the information from multi-modal spaces to deliver superior performance.
Deep neural networks have emerged as a powerful architecture for capturing the nonlinear distribution of high-dimensional multimedia data.
- Score: 6.225190099424806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the development of web technology, multi-modal or multi-view data
has surged as a major stream of big data, where each modality/view encodes an
individual property of the data objects. Often, different modalities are
complementary to each other, a fact that has motivated a great deal of research
on fusing multi-modal feature spaces to comprehensively characterize the data
objects. Most existing state-of-the-art methods focus on how to fuse the
information from multi-modal spaces to deliver performance superior to their
single-modal counterparts. Recently, deep neural networks have emerged as a
powerful architecture for capturing the nonlinear distribution of
high-dimensional multimedia data, and naturally for multi-modal data as well.
Substantial empirical studies have demonstrated the advantages of deep
multi-modal methods, which can essentially deepen the fusion of multi-modal
deep feature spaces. In this paper, we provide a substantial overview of the
existing state-of-the-art in the field of multi-modal data analytics, from
shallow to deep spaces. Throughout this survey, we further indicate that the
critical components of this field are collaboration, adversarial competition
and fusion over multi-modal spaces. Finally, we share our viewpoints regarding
some future directions for this field.
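To make the fusion theme concrete, here is a minimal late-fusion sketch in PyTorch: each modality gets its own deep encoder, and the resulting feature vectors are concatenated into a joint space for prediction. All dimensions, layer choices and names are illustrative assumptions, not taken from the survey.

```python
# Minimal late-fusion sketch (illustrative only; sizes and names are
# assumptions, not taken from the survey). Each modality gets its own
# encoder; the deep features are concatenated for a joint classifier.
import torch
import torch.nn as nn

class LateFusionNet(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, hidden=256, num_classes=10):
        super().__init__()
        # One nonlinear encoder per modality.
        self.img_enc = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.txt_enc = nn.Sequential(nn.Linear(txt_dim, hidden), nn.ReLU())
        # Joint head over the fused (concatenated) deep feature space.
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, img_feat, txt_feat):
        z = torch.cat([self.img_enc(img_feat), self.txt_enc(txt_feat)], dim=-1)
        return self.head(z)

# Usage with random stand-in features.
model = LateFusionNet()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 10])
```

Deeper fusion, in the survey's sense, would exchange information between the encoders at intermediate layers rather than only concatenating their final outputs.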
Related papers
- Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization [49.08348604716746]
Multimodal Summarization with Multimodal Output (MSMO) aims to produce a multimodal summary that integrates both text and relevant images.
In this paper, we propose an Entity-Guided Multimodal Summarization model (EGMS).
Our model, building on BART, utilizes dual multimodal encoders with shared weights to process text-image and entity-image information concurrently.
arXiv Detail & Related papers (2024-08-06T12:45:56Z)
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
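As a rough picture of what multiscale modal fusion means, here is a hedged sketch (not the authors' U3M code; the per-scale averaging rule and all shapes are assumptions): two modality feature maps are fused at several resolutions, and the per-scale results are merged back at full resolution.

```python
# Hedged sketch of multiscale fusion of two modalities (not the U3M
# implementation; the simple per-scale averaging rule is an assumption).
import torch
import torch.nn.functional as F

def multiscale_fuse(feat_a, feat_b, scales=(1.0, 0.5, 0.25)):
    """Fuse two feature maps of shape (N, C, H, W) at several scales."""
    n, c, h, w = feat_a.shape
    fused_levels = []
    for s in scales:
        size = (max(1, int(h * s)), max(1, int(w * s)))
        a = F.interpolate(feat_a, size=size, mode="bilinear", align_corners=False)
        b = F.interpolate(feat_b, size=size, mode="bilinear", align_corners=False)
        fused = 0.5 * (a + b)  # "unbiased": neither modality is favored
        fused_levels.append(F.interpolate(fused, size=(h, w), mode="bilinear",
                                          align_corners=False))
    return torch.stack(fused_levels).mean(dim=0)  # merge across scales

out = multiscale_fuse(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```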
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- Multimodal Fusion on Low-quality Data: A Comprehensive Survey [110.22752954128738]
This paper surveys the common challenges and recent advances of multimodal fusion in the wild.
We identify four main challenges that are faced by multimodal fusion on low-quality data.
This new taxonomy will enable researchers to understand the state of the field and identify several potential directions.
arXiv Detail & Related papers (2024-04-27T07:22:28Z)
- Multimodal Large Language Models: A Survey [36.06016060015404]
Multimodal language models integrate multiple data types, such as images, text, audio, and other heterogeneous data.
This paper begins by defining the concept of multimodal and examining the historical development of multimodal algorithms.
A practical guide is provided, offering insights into the technical aspects of multimodal models.
Lastly, we explore the applications of multimodal models and discuss the challenges associated with their development.
arXiv Detail & Related papers (2023-11-22T05:15:12Z)
- Alternative Telescopic Displacement: An Efficient Multimodal Alignment Method [3.0903319879656084]
This paper introduces an innovative approach to feature alignment that revolutionizes the fusion of multimodal information.
Our method employs a novel iterative process of telescopic displacement and expansion of feature representations across different modalities, culminating in a coherent unified representation within a shared feature space.
arXiv Detail & Related papers (2023-06-29T13:49:06Z)
- Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method towards multimodal fusion via seeking a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
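The fixed-point idea can be illustrated with a toy sketch (not the paper's implementation; the fusion cell, the naive iteration solver, and all sizes are assumptions): the fused representation z is defined implicitly by z* = f(z*, x1, x2) and computed by iterating f until it stops changing.

```python
# Toy sketch of deep-equilibrium-style fusion (not the paper's code).
# The fused state z is the fixed point of a small fusion map f(z, x1, x2),
# found here by naive fixed-point iteration.
import torch
import torch.nn as nn

class FusionCell(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.lin = nn.Linear(3 * dim, dim)

    def forward(self, z, x1, x2):
        # tanh keeps the map contractive enough for this toy example to settle.
        return torch.tanh(self.lin(torch.cat([z, x1, x2], dim=-1)))

def deq_fuse(cell, x1, x2, max_iter=50, tol=1e-4):
    z = torch.zeros_like(x1)
    for _ in range(max_iter):
        z_next = cell(z, x1, x2)
        if (z_next - z).norm() < tol:
            break
        z = z_next
    return z  # approximate fixed point z* = f(z*, x1, x2)

cell = FusionCell()
z_star = deq_fuse(cell, torch.randn(4, 64), torch.randn(4, 64))
print(z_star.shape)  # torch.Size([4, 64])
```

A full DEQ method would differentiate through the fixed point implicitly rather than backpropagating through the unrolled loop as this sketch would.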
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
- Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
- Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments [18.14974353615421]
We propose a novel method for multimodal representation learning in a noisy environment via the generalized product of experts technique.
In the proposed method, we train a separate network for each modality to assess the credibility of information coming from that modality.
We attain state-of-the-art performance on two challenging benchmarks: multimodal 3D hand-pose estimation and multimodal surgical video segmentation.
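The underlying product-of-experts arithmetic is easy to sketch (hedged; the fixed credibility weights below stand in for the paper's learned per-modality networks, and the exact formulation may differ): each modality contributes a Gaussian expert, and the fused Gaussian multiplies them with precision-weighted means.

```python
# Hedged sketch of generalized product-of-experts fusion. Each modality i
# yields a Gaussian expert N(mu_i, 1/T_i); the fused Gaussian has precision
# sum_i beta_i * T_i and a precision-weighted mean. The credibility weights
# beta_i are the "generalized" part (learned networks in the paper).
import torch

def generalized_poe(mus, precisions, betas):
    """mus, precisions: lists of (N, D) tensors; betas: list of (N, 1) weights."""
    weighted_T = sum(b * T for b, T in zip(betas, precisions))
    weighted_mu = sum(b * T * mu for b, T, mu in zip(betas, precisions, mus))
    return weighted_mu / weighted_T, 1.0 / weighted_T  # fused mean, variance

mus = [torch.randn(4, 16), torch.randn(4, 16)]
Ts = [torch.rand(4, 16) + 0.1, torch.rand(4, 16) + 0.1]    # positive precisions
betas = [torch.full((4, 1), 0.8), torch.full((4, 1), 0.2)]  # credibility weights
mu, var = generalized_poe(mus, Ts, betas)
print(mu.shape, var.shape)  # torch.Size([4, 16]) torch.Size([4, 16])
```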
arXiv Detail & Related papers (2022-11-07T14:27:38Z)
- Deep Learning in Multimodal Remote Sensing Data Fusion: A Comprehensive Review [33.40031994803646]
This survey aims to present a systematic overview of DL-based multimodal RS data fusion.
Sub-fields of multimodal RS data fusion are reviewed in terms of the data modalities to be fused.
The remaining challenges and potential future directions are highlighted.
arXiv Detail & Related papers (2022-05-03T09:08:16Z)
- MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis [48.776247141839875]
We propose a novel framework, MISA, which projects each modality to two distinct subspaces.
The first subspace is modality-invariant, where the representations across modalities learn their commonalities and reduce the modality gap.
Our experiments on popular sentiment analysis benchmarks, MOSI and MOSEI, demonstrate significant gains over state-of-the-art models.
arXiv Detail & Related papers (2020-05-07T15:13:23Z)
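The two-subspace idea can be sketched as follows (illustrative only; the real MISA model adds similarity, difference and reconstruction losses that are omitted here, and all sizes are assumptions): a shared encoder maps every modality into a modality-invariant space, while per-modality private encoders produce modality-specific representations.

```python
# Minimal sketch of modality-invariant vs. modality-specific projections in
# the spirit of MISA (illustrative; MISA's training losses are omitted).
import torch
import torch.nn as nn

class TwoSubspaceProjector(nn.Module):
    def __init__(self, in_dims, hidden=128):
        super().__init__()
        # Project each raw modality feature to a common width first.
        self.project = nn.ModuleDict(
            {m: nn.Linear(d, hidden) for m, d in in_dims.items()})
        # Shared encoder -> modality-invariant subspace (weights shared).
        self.shared = nn.Sequential(nn.Linear(hidden, hidden), nn.Sigmoid())
        # One private encoder per modality -> modality-specific subspace.
        self.private = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(hidden, hidden), nn.Sigmoid())
             for m in in_dims})

    def forward(self, feats):
        reps = {}
        for m, x in feats.items():
            h = self.project[m](x)
            reps[m] = (self.shared(h), self.private[m](h))  # (invariant, specific)
        return reps

model = TwoSubspaceProjector({"text": 768, "audio": 74})
reps = model({"text": torch.randn(4, 768), "audio": torch.randn(4, 74)})
print(reps["text"][0].shape)  # invariant representation: torch.Size([4, 128])
```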