The Labeled Multiple Canonical Correlation Analysis for Information
Fusion
- URL: http://arxiv.org/abs/2103.00359v1
- Date: Sun, 28 Feb 2021 00:13:36 GMT
- Title: The Labeled Multiple Canonical Correlation Analysis for Information
Fusion
- Authors: Lei Gao, Rui Zhang, Lin Qi, Enqing Chen, and Ling Guan
- Abstract summary: We introduce a new method for multimodal information fusion and representation based on the Labeled Multiple Canonical Correlation Analysis (LMCCA).
We implement a prototype of LMCCA to demonstrate its effectiveness on handwritten digit recognition, face recognition, and object recognition.
- Score: 25.23035811685684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of multimodal information fusion is to mathematically analyze
information carried in different sources and create a new representation which
will be more effectively utilized in pattern recognition and other multimedia
information processing tasks. In this paper, we introduce a new method for
multimodal information fusion and representation based on the Labeled Multiple
Canonical Correlation Analysis (LMCCA). By incorporating class label
information of the training samples, the proposed LMCCA ensures that the fused
features carry discriminative characteristics of the multimodal information
representations, and are capable of providing superior recognition performance.
We implement a prototype of LMCCA to demonstrate its effectiveness on
handwritten digit recognition, face recognition and object recognition utilizing
multiple features, and bimodal human emotion recognition involving information from
both audio and visual domains. The generic nature of LMCCA allows it to take as
input features extracted by any means, including those obtained by deep learning (DL)
methods. Experimental results show that the proposed method enhances the
performance of both statistical machine learning (SML) methods and methods
based on DL.
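To make the fusion pipeline described in the abstract concrete, below is a minimal two-view sketch: features from two modalities are projected onto correlated directions with plain CCA from scikit-learn, concatenated, and classified with an SVM. This is only an illustrative stand-in under stated assumptions; it does not reproduce the label-weighted formulation of LMCCA itself, and the synthetic data, dimensions, and choice of classifier are assumptions made for the example.

```python
# Minimal two-view fusion sketch in the spirit of CCA-based fusion.
# NOT the authors' LMCCA implementation; it only illustrates the generic
# pipeline the abstract describes: take features extracted by any means,
# project them into a correlated subspace, fuse, and classify.
# Label information is used here only by the downstream classifier
# (an assumption); LMCCA incorporates labels into the projection itself.

import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy stand-ins for two modality-specific feature sets (e.g., audio / visual).
n_samples, d1, d2, n_classes = 600, 40, 30, 3
y = rng.integers(0, n_classes, size=n_samples)
shared = rng.normal(size=(n_samples, 10)) + y[:, None]   # class-dependent latent
X1 = shared @ rng.normal(size=(10, d1)) + 0.5 * rng.normal(size=(n_samples, d1))
X2 = shared @ rng.normal(size=(10, d2)) + 0.5 * rng.normal(size=(n_samples, d2))

X1_tr, X1_te, X2_tr, X2_te, y_tr, y_te = train_test_split(
    X1, X2, y, test_size=0.3, random_state=0)

# Project both views onto maximally correlated directions.
cca = CCA(n_components=8)
Z1_tr, Z2_tr = cca.fit_transform(X1_tr, X2_tr)
Z1_te, Z2_te = cca.transform(X1_te, X2_te)

# Fuse by concatenating the projected views, then classify.
F_tr = np.hstack([Z1_tr, Z2_tr])
F_te = np.hstack([Z1_te, Z2_te])
clf = SVC(kernel="rbf").fit(F_tr, y_tr)
print("fused-feature accuracy:", clf.score(F_te, y_te))
```

In the LMCCA setting described above, the class labels would additionally shape the learned projections, so the fused features themselves carry discriminative structure rather than relying solely on the classifier.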
Related papers
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- Completed Feature Disentanglement Learning for Multimodal MRIs Analysis [36.32164729310868]
Feature disentanglement (FD)-based methods have achieved significant success in multimodal learning (MML).
We propose a novel Complete Feature Disentanglement (CFD) strategy that recovers the lost information during feature decoupling.
Specifically, the CFD strategy not only identifies modality-shared and modality-specific features, but also decouples shared features among subsets of multimodal inputs.
arXiv Detail & Related papers (2024-07-06T01:49:38Z)
- Multimodal Multilabel Classification by CLIP [3.1002416427168304]
Multimodal multilabel classification (MMC) is a challenging task that aims to design a learning algorithm to handle two data sources.
We leverage a novel technique that utilises the Contrastive Language-Image Pre-training (CLIP) model as the feature extractor.
arXiv Detail & Related papers (2024-06-23T15:28:07Z)
- Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning [23.671999163027284]
This paper proposes a novel framework for multi-label image recognition without any training data.
It uses knowledge from a pre-trained Large Language Model to learn prompts that adapt a pre-trained Vision-Language Model such as CLIP to multi-label classification.
Our framework presents a new way to explore the synergies between multiple pre-trained models for novel category recognition.
arXiv Detail & Related papers (2024-03-02T13:43:32Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- Convolutional autoencoder-based multimodal one-class classification [80.52334952912808]
One-class classification refers to approaches of learning using data from a single class only.
We propose a deep learning one-class classification method suitable for multimodal data.
arXiv Detail & Related papers (2023-09-25T12:31:18Z)
- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
- A Low-rank Matching Attention based Cross-modal Feature Fusion Method for Conversational Emotion Recognition [56.20144064187554]
This paper develops a novel cross-modal feature fusion method for the Conversational emotion recognition (CER) task.
By setting a matching weight and calculating attention scores between modal features row by row, LMAM contains fewer parameters than the self-attention method.
We show that LMAM can be embedded into any existing state-of-the-art DL-based CER methods and help boost their performance in a plug-and-play manner.
arXiv Detail & Related papers (2023-06-16T16:02:44Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
- A Discriminative Vectorial Framework for Multi-modal Feature Representation [19.158947368297557]
A discriminative framework is proposed for multimodal feature representation in knowledge discovery.
It employs multi-modal hashing (MH) and discriminative correlation maximization (DCM) analysis.
The framework is superior to state-of-the-art statistical machine learning (SML) and deep neural network (DNN) algorithms.
arXiv Detail & Related papers (2021-03-09T18:18:06Z)
- Multi-view Data Visualisation via Manifold Learning [0.03222802562733786]
This manuscript proposes extensions of Student's t-distributed SNE, LLE and ISOMAP, to allow for dimensionality reduction and visualisation of multi-view data.
We show that by incorporating the low-dimensional embeddings obtained via the multi-view manifold learning approaches into the K-means algorithm, clusters of the samples are accurately identified.
arXiv Detail & Related papers (2021-01-17T19:54:36Z)
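The last entry above feeds low-dimensional multi-view embeddings into K-means. The following is a rough, hedged sketch of that idea, embedding each view separately with ordinary t-SNE and concatenating the results rather than using the paper's joint multi-view extensions; the data and settings are purely illustrative.

```python
# Illustrative only: a crude stand-in for multi-view manifold learning.
# Each view is embedded separately with t-SNE and the embeddings are
# concatenated before K-means; the referenced paper instead extends t-SNE,
# LLE and ISOMAP to handle all views jointly.

import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)

# Two toy views of the same 3-cluster data.
n, k = 300, 3
labels = rng.integers(0, k, size=n)
view1 = rng.normal(size=(n, 20)) + 3.0 * labels[:, None]
view2 = rng.normal(size=(n, 15)) + 2.0 * labels[:, None]

# Per-view low-dimensional embeddings.
emb1 = TSNE(n_components=2, random_state=0).fit_transform(view1)
emb2 = TSNE(n_components=2, random_state=0).fit_transform(view2)

# Fuse the embeddings and cluster them with K-means.
fused = np.hstack([emb1, emb2])
pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(fused)
print("adjusted Rand index:", adjusted_rand_score(labels, pred))
```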
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.