A Discriminative Vectorial Framework for Multi-modal Feature Representation
- URL: http://arxiv.org/abs/2103.05597v1
- Date: Tue, 9 Mar 2021 18:18:06 GMT
- Title: A Discriminative Vectorial Framework for Multi-modal Feature Representation
- Authors: Lei Gao and Ling Guan
- Abstract summary: A discriminative vectorial framework is proposed for multi-modal feature representation in knowledge discovery.
It employs multi-modal hashing (MH) and discriminative correlation maximization (DCM) analysis.
The framework is superior to state-of-the-art statistical machine learning (SML) and deep neural network (DNN) algorithms.
- Score: 19.158947368297557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the rapid advancements of sensory and computing technology,
multi-modal data sources that represent the same pattern or phenomenon have
attracted growing attention. As a result, finding means to explore useful
information from these multi-modal data sources has quickly become a necessity.
In this paper, a discriminative vectorial framework is proposed for multi-modal
feature representation in knowledge discovery by employing multi-modal hashing
(MH) and discriminative correlation maximization (DCM) analysis. Specifically,
the proposed framework is capable of minimizing the semantic similarity among
different modalities by MH and extracting intrinsic discriminative
representations across multiple data sources by DCM analysis jointly, enabling
a novel vectorial framework of multi-modal feature representation. Moreover,
the proposed feature representation strategy is analyzed and further optimized
based on canonical and non-canonical cases, respectively. Consequently, the
generated feature representation leads to effective utilization of the input
data sources of high quality, producing improved, sometimes quite impressive,
results in various applications. The effectiveness and generality of the
proposed framework are demonstrated by utilizing classical features and deep
neural network (DNN) based features with applications to image and multimedia
analysis and recognition tasks, including data visualization, face recognition,
object recognition, cross-modal (text-image) recognition, and audio emotion
recognition. Experimental results show that the proposed solutions are superior
to state-of-the-art statistical machine learning (SML) and DNN algorithms.
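To make the two ingredients concrete, the short sketch below pairs a generic random-projection hashing step with a classical two-view CCA-style correlation step and concatenates the outputs into one vector per sample. It is only a minimal illustration of the general recipe, not the authors' MH or DCM formulations; the function names (hash_codes, cca_projections) and the toy data are assumptions made for the example.

    import numpy as np

    def hash_codes(X, n_bits=32, seed=0):
        # Random-projection hashing: sign of projections onto random hyperplanes,
        # a simple stand-in for a learned multi-modal hashing step.
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((X.shape[1], n_bits))
        return np.sign((X - X.mean(axis=0)) @ W)

    def cca_projections(X, Y, dim=5, reg=1e-3):
        # Classical two-view CCA via a whitened cross-covariance SVD,
        # standing in for the discriminative correlation step.
        Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
        n = X.shape[0]
        Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
        Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
        Cxy = Xc.T @ Yc / n
        Lx_inv = np.linalg.inv(np.linalg.cholesky(Cxx))
        Ly_inv = np.linalg.inv(np.linalg.cholesky(Cyy))
        U, _, Vt = np.linalg.svd(Lx_inv @ Cxy @ Ly_inv.T)
        Px = Lx_inv.T @ U[:, :dim]
        Py = Ly_inv.T @ Vt.T[:, :dim]
        return Xc @ Px, Yc @ Py

    # Toy data: two modalities observing the same 200 underlying samples.
    rng = np.random.default_rng(1)
    shared = rng.standard_normal((200, 5))
    modality_a = np.hstack([shared, rng.standard_normal((200, 15))])
    modality_b = np.hstack([shared @ rng.standard_normal((5, 5)),
                            rng.standard_normal((200, 20))])

    Za, Zb = cca_projections(modality_a, modality_b, dim=5)
    fused = np.hstack([hash_codes(modality_a), hash_codes(modality_b), Za, Zb])
    print(fused.shape)  # one vectorial representation per sample, here (200, 74)

A label-aware variant of the correlation step, maximizing between-class separation alongside cross-modal correlation, is the natural next refinement and is roughly the territory that DCM analysis occupies.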
Related papers
- Where Do We Stand with Implicit Neural Representations? A Technical and Performance Survey [16.89460694470542]
Implicit Neural Representations (INRs) have emerged as a paradigm in knowledge representation.
INRs leverage multilayer perceptrons (MLPs) to model data as continuous implicit functions.
This survey introduces a clear taxonomy that categorises them into four key areas: activation functions, position encoding, combined strategies, and network structure. A minimal position-encoding sketch appears after this list.
arXiv Detail & Related papers (2024-11-06T06:14:24Z)
- Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both one-to-one comparison and one-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z)
- Online Multi-modal Root Cause Analysis [61.94987309148539]
Root Cause Analysis (RCA) is essential for pinpointing the root causes of failures in microservice systems.
Existing online RCA methods handle only single-modal data, overlooking complex interactions in multi-modal systems.
We introduce OCEAN, a novel online multi-modal causal structure learning method for root cause localization.
arXiv Detail & Related papers (2024-10-13T21:47:36Z)
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances in self-supervised learning (SSL) to pre-train strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
- How to Sense the World: Leveraging Hierarchy in Multimodal Perception for Robust Reinforcement Learning Agents [9.840104333194663]
We argue for hierarchy in the design of representation models and contribute a novel multimodal representation model, MUSE.
MUSE is the sensory representation model of deep reinforcement learning agents provided with multimodal observations in Atari games.
We perform a comparative study over different designs of reinforcement learning agents, showing that MUSE allows agents to perform tasks under incomplete perceptual experience with minimal performance loss.
arXiv Detail & Related papers (2021-10-07T16:35:23Z)
- The Labeled Multiple Canonical Correlation Analysis for Information Fusion [25.23035811685684]
We introduce a new method for multimodal information fusion and representation based on the Labeled Multiple Canonical Correlation Analysis (LMCCA).
We implement a prototype of LMCCA to demonstrate its effectiveness on handwritten digit recognition, face recognition, and object recognition.
arXiv Detail & Related papers (2021-02-28T00:13:36Z)
- Self-Supervised Multimodal Domino: in Search of Biomarkers for Alzheimer's Disease [19.86082635340699]
We propose a taxonomy of all reasonable ways to organize self-supervised representation-learning algorithms.
We first evaluate models on toy multimodal MNIST datasets and then apply them to a multimodal neuroimaging dataset with Alzheimer's disease patients.
Results show that the proposed approach outperforms previous self-supervised encoder-decoder methods.
arXiv Detail & Related papers (2020-12-25T20:28:13Z)
- Modality Compensation Network: Cross-Modal Adaptation for Action Recognition [77.24983234113957]
We propose a Modality Compensation Network (MCN) to explore the relationships of different modalities.
Our model bridges data from source and auxiliary modalities by a modality adaptation block to achieve adaptive representation learning.
Experimental results reveal that MCN outperforms state-of-the-art approaches on four widely-used action recognition benchmarks.
arXiv Detail & Related papers (2020-01-31T04:51:55Z)
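Related to the implicit neural representation survey listed first above, the following minimal sketch shows the common recipe those works share: a Fourier position encoding feeding a small coordinate MLP that is fit to a 1-D signal as a continuous function. All names, layer sizes, and the toy signal are illustrative assumptions, not the setup of any particular paper.

    import numpy as np

    def positional_encoding(x, n_freqs=6):
        # Map scalar coordinates to [sin(2^k * pi * x), cos(2^k * pi * x)] features.
        freqs = (2.0 ** np.arange(n_freqs)) * np.pi
        angles = x[:, None] * freqs[None, :]
        return np.hstack([np.sin(angles), np.cos(angles)])

    # Target signal sampled on [0, 1]; the INR learns x -> f(x) as a continuous map.
    coords = np.linspace(0.0, 1.0, 256)
    target = np.sin(4 * np.pi * coords) + 0.5 * np.sin(9 * np.pi * coords)

    feats = positional_encoding(coords)          # (256, 12) encoded coordinates
    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((feats.shape[1], 64)) * 0.3
    b1 = np.zeros(64)
    W2 = rng.standard_normal((64, 1)) * 0.3
    b2 = np.zeros(1)

    lr = 5e-2
    for step in range(3000):
        h = np.maximum(feats @ W1 + b1, 0.0)     # ReLU hidden layer
        pred = (h @ W2 + b2).ravel()
        err = pred - target
        # Backpropagate the mean-squared error through the two layers.
        grad_pred = 2.0 * err[:, None] / len(coords)
        gW2, gb2 = h.T @ grad_pred, grad_pred.sum(axis=0)
        gh = grad_pred @ W2.T
        gh[h <= 0.0] = 0.0
        gW1, gb1 = feats.T @ gh, gh.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    # After fitting, the signal can be queried at arbitrary coordinates by
    # encoding them and running the same forward pass.
    print("final MSE:", float(np.mean(err ** 2)))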
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.