A Discriminative Vectorial Framework for Multi-modal Feature
Representation
- URL: http://arxiv.org/abs/2103.05597v1
- Date: Tue, 9 Mar 2021 18:18:06 GMT
- Title: A Discriminative Vectorial Framework for Multi-modal Feature
Representation
- Authors: Lei Gao and Ling Guan
- Abstract summary: A discriminative framework is proposed for multi-modal feature representation in knowledge discovery.
It employs multi-modal hashing (MH) and discriminative correlation maximization (DCM) analysis.
The framework is superior to state-of-the-art statistical machine learning (SML) and deep neural network (DNN) algorithms.
- Score: 19.158947368297557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the rapid advancements of sensory and computing technology,
multi-modal data sources that represent the same pattern or phenomenon have
attracted growing attention. As a result, finding means to explore useful
information from these multi-modal data sources has quickly become a necessity.
In this paper, a discriminative vectorial framework is proposed for multi-modal
feature representation in knowledge discovery by employing multi-modal hashing
(MH) and discriminative correlation maximization (DCM) analysis. Specifically,
the proposed framework is capable of minimizing the semantic similarity among
different modalities by MH and extracting intrinsic discriminative
representations across multiple data sources by DCM analysis jointly, enabling
a novel vectorial framework of multi-modal feature representation. Moreover,
the proposed feature representation strategy is analyzed and further optimized
based on canonical and non-canonical cases, respectively. Consequently, the
generated feature representation leads to effective utilization of the input
data sources of high quality, producing improved, sometimes quite impressive,
results in various applications. The effectiveness and generality of the
proposed framework are demonstrated by utilizing classical features and deep
neural network (DNN) based features with applications to image and multimedia
analysis and recognition tasks, including data visualization, face recognition,
object recognition, cross-modal (text-image) recognition and audio emotion
recognition. Experimental results show that the proposed solutions are superior
to state-of-the-art statistical machine learning (SML) and DNN algorithms.
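The DCM step rests on correlation maximization across modalities. As an illustrative sketch only, not the authors' implementation, a regularized two-view canonical correlation analysis can project each modality onto its most correlated directions and concatenate the projections into a fused vectorial representation; the function name `cca_fuse` and all parameter values here are hypothetical:

```python
import numpy as np

def cca_fuse(X, Y, dim=2, reg=1e-3):
    """Project two modality feature matrices (n_samples x d) onto their
    top canonical directions and concatenate the results.
    A simplified stand-in for DCM-style correlation maximization."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized within- and cross-modality covariances
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten each modality, then SVD of the whitened cross-covariance
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    A = Wx.T @ U[:, :dim]   # projection for modality X
    B = Wy.T @ Vt[:dim].T   # projection for modality Y
    return np.hstack([Xc @ A, Yc @ B])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                # modality 1
Y = X @ rng.normal(size=(5, 4)) + 0.1 * rng.normal(size=(100, 4))  # modality 2
Z = cca_fuse(X, Y, dim=2)
print(Z.shape)  # (100, 4)
```

Because the two synthetic modalities share an underlying signal, the paired projected columns come out strongly correlated, which is the property a downstream classifier would exploit.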
Related papers
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - UniDiff: Advancing Vision-Language Models with Generative and
Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
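The ITC objective referenced above is typically realized as a symmetric InfoNCE loss over matched image-text pairs. The following is a generic numpy sketch of that family of losses, not UniDiff's exact formulation; `itc_loss` and the temperature value are illustrative:

```python
import numpy as np

def itc_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss for image-text contrastive learning:
    matched pairs sit on the diagonal of the similarity matrix."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (n, n) cosine similarities
    labels = np.arange(len(img))            # i-th image matches i-th text

    def ce(l):  # cross-entropy with diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    # Average the image-to-text and text-to-image directions
    return 0.5 * (ce(logits) + ce(logits.T))

emb = np.eye(4)                      # perfectly matched pairs
matched = itc_loss(emb, emb)         # near zero
shuffled = itc_loss(emb, emb[::-1])  # mismatched pairs, much larger
```

Minimizing this loss pulls each image embedding toward its paired text embedding and away from the other texts in the batch.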
arXiv Detail & Related papers (2023-06-01T15:39:38Z) - Multimodal Adversarially Learned Inference with Factorized
Discriminators [10.818838437018682]
We propose a novel approach to generative modeling of multimodal data based on generative adversarial networks.
To learn a coherent multimodal generative model, we show that it is necessary to align different encoder distributions with the joint decoder distribution simultaneously.
By taking advantage of contrastive learning through factorizing the discriminator, we train our model on unimodal data.
arXiv Detail & Related papers (2021-12-20T08:18:49Z) - How to Sense the World: Leveraging Hierarchy in Multimodal Perception
for Robust Reinforcement Learning Agents [9.840104333194663]
We argue for hierarchy in the design of representation models and contribute with a novel multimodal representation model, MUSE.
MUSE serves as the sensory representation model for deep reinforcement learning agents that receive multimodal observations in Atari games.
We perform a comparative study over different designs of reinforcement learning agents, showing that MUSE allows agents to perform tasks under incomplete perceptual experience with minimal performance loss.
arXiv Detail & Related papers (2021-10-07T16:35:23Z) - The Labeled Multiple Canonical Correlation Analysis for Information
Fusion [25.23035811685684]
We introduce a new method for multimodal information fusion and representation based on the Labeled Multiple Canonical Correlation Analysis (LMCCA).
We implement a prototype of LMCCA to demonstrate its effectiveness on handwritten digit recognition, face recognition and object recognition.
arXiv Detail & Related papers (2021-02-28T00:13:36Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z) - Self-Supervised Multimodal Domino: in Search of Biomarkers for
Alzheimer's Disease [19.86082635340699]
We propose a taxonomy of all reasonable ways to organize self-supervised representation-learning algorithms.
We first evaluate models on toy multimodal MNIST datasets and then apply them to a multimodal neuroimaging dataset with Alzheimer's disease patients.
Results show that the proposed approach outperforms previous self-supervised encoder-decoder methods.
arXiv Detail & Related papers (2020-12-25T20:28:13Z) - Heterogeneous Network Representation Learning: A Unified Framework with
Survey and Benchmark [57.10850350508929]
We aim to provide a unified framework to summarize and evaluate existing research on heterogeneous network embedding (HNE).
As the first contribution, we provide a generic paradigm for the systematic categorization and analysis over the merits of various existing HNE algorithms.
As the second contribution, we create four benchmark datasets with various properties regarding scale, structure, attribute/label availability, etc., from different sources.
As the third contribution, we create friendly interfaces for 13 popular HNE algorithms, and provide all-around comparisons among them over multiple tasks and experimental settings.
arXiv Detail & Related papers (2020-04-01T03:42:11Z) - Modality Compensation Network: Cross-Modal Adaptation for Action
Recognition [77.24983234113957]
We propose a Modality Compensation Network (MCN) to explore the relationships of different modalities.
Our model bridges data from source and auxiliary modalities by a modality adaptation block to achieve adaptive representation learning.
Experimental results reveal that MCN outperforms state-of-the-art approaches on four widely-used action recognition benchmarks.
arXiv Detail & Related papers (2020-01-31T04:51:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.