A Discriminative Vectorial Framework for Multi-modal Feature
Representation
- URL: http://arxiv.org/abs/2103.05597v1
- Date: Tue, 9 Mar 2021 18:18:06 GMT
- Title: A Discriminative Vectorial Framework for Multi-modal Feature
Representation
- Authors: Lei Gao, and Ling Guan
- Abstract summary: A discriminative framework is proposed for multimodal feature representation in knowledge discovery.
It employs multi-modal hashing (MH) and discriminative correlation (DCM) analysis.
The framework is superior to state-of-the-art statistical machine learning (S.M.) and deep network neural (DNN) algorithms.
- Score: 19.158947368297557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the rapid advancements of sensory and computing technology,
multi-modal data sources that represent the same pattern or phenomenon have
attracted growing attention. As a result, finding means to explore useful
information from these multi-modal data sources has quickly become a necessity.
In this paper, a discriminative vectorial framework is proposed for multi-modal
feature representation in knowledge discovery by employing multi-modal hashing
(MH) and discriminative correlation maximization (DCM) analysis. Specifically,
the proposed framework is capable of minimizing the semantic similarity among
different modalities by MH and exacting intrinsic discriminative
representations across multiple data sources by DCM analysis jointly, enabling
a novel vectorial framework of multi-modal feature representation. Moreover,
the proposed feature representation strategy is analyzed and further optimized
based on canonical and non-canonical cases, respectively. Consequently, the
generated feature representation leads to effective utilization of the input
data sources of high quality, producing improved, sometimes quite impressive,
results in various applications. The effectiveness and generality of the
proposed framework are demonstrated by utilizing classical features and deep
neural network (DNN) based features with applications to image and multimedia
analysis and recognition tasks, including data visualization, face recognition,
object recognition; cross-modal (text-image) recognition and audio emotion
recognition. Experimental results show that the proposed solutions are superior
to state-of-the-art statistical machine learning (SML) and DNN algorithms.
Related papers
- MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach.
MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student.
We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
arXiv Detail & Related papers (2025-02-03T08:50:00Z) - GAMED: Knowledge Adaptive Multi-Experts Decoupling for Multimodal Fake News Detection [18.157900272828602]
Multimodal fake news detection often involves modelling heterogeneous data sources, such as vision and language.
This paper develops a significantly novel approach, GAMED, for multimodal modelling.
It focuses on generating distinctive and discriminative features through modal decoupling to enhance cross-modal synergies.
arXiv Detail & Related papers (2024-12-11T19:12:22Z) - Where Do We Stand with Implicit Neural Representations? A Technical and Performance Survey [16.89460694470542]
Implicit Neural Representations (INRs) have emerged as a paradigm in knowledge representation.
INRs leverage multilayer perceptrons (MLPs) to model data as continuous implicit functions.
This survey introduces a clear taxonomy that categorises them into four key areas: activation functions, position encoding, combined strategies, and network structure.
arXiv Detail & Related papers (2024-11-06T06:14:24Z) - Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z) - MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - UniDiff: Advancing Vision-Language Models with Generative and
Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC)
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z) - The Labeled Multiple Canonical Correlation Analysis for Information
Fusion [25.23035811685684]
We introduce a new method for multimodal information fusion and representation based on the Labeled Multiple Canonical Correlation Analysis (LMCCA)
We implement a prototype of LMCCA to demonstrate its effectiveness on handwritten digit recognition,face recognition and object recognition.
arXiv Detail & Related papers (2021-02-28T00:13:36Z) - Self-Supervised Multimodal Domino: in Search of Biomarkers for
Alzheimer's Disease [19.86082635340699]
We propose a taxonomy of all reasonable ways to organize self-supervised representation-learning algorithms.
We first evaluate models on toy multimodal MNIST datasets and then apply them to a multimodal neuroimaging dataset with Alzheimer's disease patients.
Results show that the proposed approach outperforms previous self-supervised encoder-decoder methods.
arXiv Detail & Related papers (2020-12-25T20:28:13Z) - Modality Compensation Network: Cross-Modal Adaptation for Action
Recognition [77.24983234113957]
We propose a Modality Compensation Network (MCN) to explore the relationships of different modalities.
Our model bridges data from source and auxiliary modalities by a modality adaptation block to achieve adaptive representation learning.
Experimental results reveal that MCN outperforms state-of-the-art approaches on four widely-used action recognition benchmarks.
arXiv Detail & Related papers (2020-01-31T04:51:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.