Bidirectional Contrastive Split Learning for Visual Question Answering
- URL: http://arxiv.org/abs/2208.11435v4
- Date: Mon, 11 Dec 2023 14:39:56 GMT
- Title: Bidirectional Contrastive Split Learning for Visual Question Answering
- Authors: Yuwei Sun, Hideya Ochiai
- Abstract summary: Visual Question Answering (VQA) based on multi-modal data facilitates real-life applications such as home robots and medical diagnoses.
One challenge is to devise a robust decentralized learning framework for various client models.
We propose Bidirectional Contrastive Split Learning (BiCSL) to train a global multi-modal model on the entire data distribution of decentralized clients.
- Score: 6.135215040323833
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Question Answering (VQA) based on multi-modal data facilitates
real-life applications such as home robots and medical diagnoses. One
significant challenge is to devise a robust decentralized learning framework
for various client models where centralized data collection is refrained due to
confidentiality concerns. This work aims to tackle privacy-preserving VQA by
decoupling a multi-modal model into representation modules and a contrastive
module and leveraging inter-module gradients sharing and inter-client weight
sharing. To this end, we propose Bidirectional Contrastive Split Learning
(BiCSL) to train a global multi-modal model on the entire data distribution of
decentralized clients. We employ the contrastive loss that enables a more
efficient self-supervised learning of decentralized modules. Comprehensive
experiments are conducted on the VQA-v2 dataset based on five SOTA VQA models,
demonstrating the effectiveness of the proposed method. Furthermore, we inspect
BiCSL's robustness against a dual-key backdoor attack on VQA. Consequently,
BiCSL shows much better robustness to the multi-modal adversarial attack
compared to the centralized learning method, which provides a promising
approach to decentralized multi-modal learning.
Related papers
- Leveraging Foundation Models for Multi-modal Federated Learning with Incomplete Modality [41.79433449873368]
We propose a novel multi-modal federated learning method, Federated Multi-modal contrastiVe training with Pre-trained completion (FedMVP)
FedMVP integrates the large-scale pre-trained models to enhance the federated training.
We demonstrate that the model achieves superior performance over two real-world image-text classification datasets.
arXiv Detail & Related papers (2024-06-16T19:18:06Z) - Enhancing Information Maximization with Distance-Aware Contrastive
Learning for Source-Free Cross-Domain Few-Shot Learning [55.715623885418815]
Cross-Domain Few-Shot Learning methods require access to source domain data to train a model in the pre-training phase.
Due to increasing concerns about data privacy and the desire to reduce data transmission and training costs, it is necessary to develop a CDFSL solution without accessing source data.
This paper proposes an Enhanced Information Maximization with Distance-Aware Contrastive Learning method to address these challenges.
arXiv Detail & Related papers (2024-03-04T12:10:24Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning
with Hierarchical Aggregation [16.308470947384134]
HA-Fedformer is a novel transformer-based model that empowers unimodal training with only a unimodal dataset at the client.
We develop an uncertainty-aware aggregation method for the local encoders with layer-wise Markov Chain Monte Carlo sampling.
Our experiments on popular sentiment analysis benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms state-of-the-art multimodal models.
arXiv Detail & Related papers (2023-03-27T07:07:33Z) - Multimodal Federated Learning via Contrastive Representation Ensemble [17.08211358391482]
Federated learning (FL) serves as a privacy-conscious alternative to centralized machine learning.
Existing FL methods all rely on model aggregation on single modality level.
We propose Contrastive Representation Ensemble and Aggregation for Multimodal FL (CreamFL)
arXiv Detail & Related papers (2023-02-17T14:17:44Z) - USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text
Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z) - Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device)
In FL, each data holder trains a model locally and releases it to a central server for aggregation.
In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and back propagation).
In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
arXiv Detail & Related papers (2022-11-20T10:49:22Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z) - RobustFed: A Truth Inference Approach for Robust Federated Learning [9.316565110931743]
Federated learning is a framework that enables clients to train a collaboratively global model under a central server's orchestration.
The aggregation step in federated learning is vulnerable to adversarial attacks as the central server cannot manage clients' behavior.
We propose a novel robust aggregation algorithm inspired by the truth inference methods in crowdsourcing.
arXiv Detail & Related papers (2021-07-18T09:34:57Z) - Decentralized Federated Learning via Mutual Knowledge Transfer [37.5341683644709]
Decentralized federated learning (DFL) is a problem in the Internet of things (IoT) systems.
We propose a mutual knowledge transfer (Def-KT) algorithm where local clients fuse models by transferring their learnt knowledge to each other.
Our experiments on the MNIST, Fashion-MNIST, and CIFAR10 datasets reveal datasets that the proposed Def-KT algorithm significantly outperforms the baseline DFL methods.
arXiv Detail & Related papers (2020-12-24T01:43:53Z) - F2A2: Flexible Fully-decentralized Approximate Actor-critic for
Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes unpractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent setting.
Our framework can achieve scalability and stability for large-scale environment and reduce information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.