A Discriminative Vectorial Framework for Multi-modal Feature
Representation
- URL: http://arxiv.org/abs/2103.05597v1
- Date: Tue, 9 Mar 2021 18:18:06 GMT
- Title: A Discriminative Vectorial Framework for Multi-modal Feature
Representation
- Authors: Lei Gao and Ling Guan
- Abstract summary: A discriminative framework is proposed for multi-modal feature representation in knowledge discovery.
It employs multi-modal hashing (MH) and discriminative correlation maximization (DCM) analysis.
The framework is superior to state-of-the-art statistical machine learning (SML) and deep neural network (DNN) algorithms.
- Score: 19.158947368297557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the rapid advancements of sensory and computing technology,
multi-modal data sources that represent the same pattern or phenomenon have
attracted growing attention. As a result, finding means to explore useful
information from these multi-modal data sources has quickly become a necessity.
In this paper, a discriminative vectorial framework is proposed for multi-modal
feature representation in knowledge discovery by employing multi-modal hashing
(MH) and discriminative correlation maximization (DCM) analysis. Specifically,
the proposed framework is capable of minimizing the semantic similarity among
different modalities by MH and extracting intrinsic discriminative
representations across multiple data sources by DCM analysis jointly, enabling
a novel vectorial framework of multi-modal feature representation. Moreover,
the proposed feature representation strategy is analyzed and further optimized
based on canonical and non-canonical cases, respectively. Consequently, the
generated feature representation leads to effective utilization of the input
data sources of high quality, producing improved, sometimes quite impressive,
results in various applications. The effectiveness and generality of the
proposed framework are demonstrated by utilizing classical features and deep
neural network (DNN) based features with applications to image and multimedia
analysis and recognition tasks, including data visualization, face recognition,
object recognition, cross-modal (text-image) recognition and audio emotion
recognition. Experimental results show that the proposed solutions are superior
to state-of-the-art statistical machine learning (SML) and DNN algorithms.
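The DCM step rests on correlation maximization across modalities. As an illustrative sketch only, not the authors' implementation, a regularized two-view canonical correlation analysis can project each modality onto its most correlated directions and concatenate the projections into a fused vectorial representation; the function name `cca_fuse` and all parameter values here are hypothetical:

```python
import numpy as np

def cca_fuse(X, Y, dim=2, reg=1e-3):
    """Project two modality feature matrices (n_samples x d) onto their
    top canonical directions and concatenate the results.
    A simplified stand-in for DCM-style correlation maximization."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized within- and cross-modality covariances
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten each modality, then SVD of the whitened cross-covariance
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    A = Wx.T @ U[:, :dim]   # projection for modality X
    B = Wy.T @ Vt[:dim].T   # projection for modality Y
    return np.hstack([Xc @ A, Yc @ B])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                # modality 1
Y = X @ rng.normal(size=(5, 4)) + 0.1 * rng.normal(size=(100, 4))  # modality 2
Z = cca_fuse(X, Y, dim=2)
print(Z.shape)  # (100, 4)
```

Because the two synthetic modalities share an underlying signal, the paired projected columns come out strongly correlated, which is the property a downstream classifier would exploit.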
Related papers
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - UniDiff: Advancing Vision-Language Models with Generative and
Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
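The ITC objective referenced above is typically realized as a symmetric InfoNCE loss over matched image-text pairs. The following is a generic numpy sketch of that family of losses, not UniDiff's exact formulation; `itc_loss` and the temperature value are illustrative:

```python
import numpy as np

def itc_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss for image-text contrastive learning:
    matched pairs sit on the diagonal of the similarity matrix."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (n, n) cosine similarities
    labels = np.arange(len(img))            # i-th image matches i-th text

    def ce(l):  # cross-entropy with diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    # Average the image-to-text and text-to-image directions
    return 0.5 * (ce(logits) + ce(logits.T))

emb = np.eye(4)                      # perfectly matched pairs
matched = itc_loss(emb, emb)         # near zero
shuffled = itc_loss(emb, emb[::-1])  # mismatched pairs, much larger
```

Minimizing this loss pulls each image embedding toward its paired text embedding and away from the other texts in the batch.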
arXiv Detail & Related papers (2023-06-01T15:39:38Z) - Multimodal Adversarially Learned Inference with Factorized
Discriminators [10.818838437018682]
We propose a novel approach to generative modeling of multimodal data based on generative adversarial networks.
To learn a coherent multimodal generative model, we show that it is necessary to align different encoder distributions with the joint decoder distribution simultaneously.
By taking advantage of contrastive learning through factorizing the discriminator, we train our model on unimodal data.
arXiv Detail & Related papers (2021-12-20T08:18:49Z) - How to Sense the World: Leveraging Hierarchy in Multimodal Perception
for Robust Reinforcement Learning Agents [9.840104333194663]
We argue for hierarchy in the design of representation models and contribute with a novel multimodal representation model, MUSE.
MUSE serves as the sensory representation model for deep reinforcement learning agents that receive multimodal observations in Atari games.
We perform a comparative study over different designs of reinforcement learning agents, showing that MUSE allows agents to perform tasks under incomplete perceptual experience with minimal performance loss.
arXiv Detail & Related papers (2021-10-07T16:35:23Z) - The Labeled Multiple Canonical Correlation Analysis for Information
Fusion [25.23035811685684]
We introduce a new method for multimodal information fusion and representation based on the Labeled Multiple Canonical Correlation Analysis (LMCCA).
We implement a prototype of LMCCA to demonstrate its effectiveness on handwritten digit recognition, face recognition and object recognition.
arXiv Detail & Related papers (2021-02-28T00:13:36Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z) - Self-Supervised Multimodal Domino: in Search of Biomarkers for
Alzheimer's Disease [19.86082635340699]
We propose a taxonomy of all reasonable ways to organize self-supervised representation-learning algorithms.
We first evaluate models on toy multimodal MNIST datasets and then apply them to a multimodal neuroimaging dataset with Alzheimer's disease patients.
Results show that the proposed approach outperforms previous self-supervised encoder-decoder methods.
arXiv Detail & Related papers (2020-12-25T20:28:13Z) - Heterogeneous Network Representation Learning: A Unified Framework with
Survey and Benchmark [57.10850350508929]
We aim to provide a unified framework to summarize and evaluate existing research on heterogeneous network embedding (HNE).
As the first contribution, we provide a generic paradigm for the systematic categorization and analysis over the merits of various existing HNE algorithms.
As the second contribution, we create four benchmark datasets with various properties regarding scale, structure, attribute/label availability, etc., from different sources.
As the third contribution, we create friendly interfaces for 13 popular HNE algorithms, and provide all-around comparisons among them over multiple tasks and experimental settings.
arXiv Detail & Related papers (2020-04-01T03:42:11Z) - Modality Compensation Network: Cross-Modal Adaptation for Action
Recognition [77.24983234113957]
We propose a Modality Compensation Network (MCN) to explore the relationships of different modalities.
Our model bridges data from source and auxiliary modalities by a modality adaptation block to achieve adaptive representation learning.
Experimental results reveal that MCN outperforms state-of-the-art approaches on four widely-used action recognition benchmarks.
arXiv Detail & Related papers (2020-01-31T04:51:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.