Federated Learning in Big Model Era: Domain-Specific Multimodal Large
Models
- URL: http://arxiv.org/abs/2308.11217v3
- Date: Thu, 24 Aug 2023 06:24:13 GMT
- Title: Federated Learning in Big Model Era: Domain-Specific Multimodal Large
Models
- Authors: Zengxiang Li and Zhaoxiang Hou and Hui Liu and Ying Wang and Tongzhi
Li and Longfei Xie and Chao Shi and Chengyi Yang and Weishan Zhang and Zelei
Liu and Liang Xu
- Abstract summary: Multimodal data, which can comprehensively perceive and recognize the physical world, has become an essential path towards general artificial intelligence.
This paper proposes a multimodal federated learning framework that enables multiple enterprises to utilize private domain data to train large models for vertical domains.
Preliminary experiments show that enterprises can enhance and accumulate intelligent capabilities through multimodal model federated learning.
- Score: 15.86733931081981
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal data, which can comprehensively perceive and recognize the
physical world, has become an essential path towards general artificial
intelligence. However, multimodal large models trained on public datasets often
underperform in specific industrial domains. This paper proposes a multimodal
federated learning framework that enables multiple enterprises to utilize
private domain data to collaboratively train large models for vertical domains,
achieving intelligent services across scenarios. The authors discuss in depth
the strategic transformation of federated learning in terms of intelligence
foundation and objectives in the era of big models, as well as the new
challenges posed by heterogeneous data, model aggregation, performance-cost
trade-offs, data privacy, and incentive mechanisms. The paper elaborates on a
case study of leading enterprises contributing multimodal data and expert
knowledge to city safety operation management, including distributed deployment and
efficient coordination of the federated learning platform, technical
innovations on data quality improvement based on large model capabilities and
efficient joint fine-tuning approaches. Preliminary experiments show that
enterprises can enhance and accumulate intelligent capabilities through
multimodal model federated learning, thereby jointly creating a smart city
model that provides high-quality intelligent services covering energy
infrastructure safety, residential community security, and urban operation
management. The established federated learning cooperation ecosystem is
expected to further aggregate industry, academia, and research resources,
realize large models in multiple vertical domains, and promote the large-scale
industrial application of artificial intelligence and cutting-edge research on
multimodal federated learning.
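The abstract does not specify the aggregation protocol behind its "efficient joint fine-tuning approaches". As a point of reference, here is a minimal FedAvg-style sketch of weighted parameter averaging, the baseline that most federated fine-tuning frameworks build on; all names and shapes below are illustrative, not taken from the paper.

```python
# Minimal FedAvg-style sketch: weighted averaging of locally fine-tuned
# parameters. Illustrative only; the paper's actual joint fine-tuning
# protocol is not described in the abstract.
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate per-client parameter dicts, weighted by local dataset size."""
    total = sum(client_sizes)
    aggregated = {}
    for name in client_weights[0]:
        aggregated[name] = sum(
            (size / total) * weights[name]
            for weights, size in zip(client_weights, client_sizes)
        )
    return aggregated

# Toy example: three enterprises fine-tune the same two-tensor adapter locally.
rng = np.random.default_rng(0)
clients = [{"adapter.w": rng.normal(size=(4, 4)),
            "adapter.b": rng.normal(size=4)} for _ in range(3)]
global_update = fedavg(clients, client_sizes=[1000, 400, 250])
print({k: v.shape for k, v in global_update.items()})
```

In practice, frameworks of this kind typically average only lightweight adapter or LoRA parameters rather than full model weights, to keep per-round communication manageable for large models.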
Related papers
- Towards Harnessing the Collaborative Power of Large and Small Models for Domain Tasks [20.370633539861746]
Large language models (LLMs) have demonstrated remarkable capabilities, but they require vast amounts of data and computational resources.
In contrast, smaller models (SMs) can be more efficient and tailored to specific domains.
arXiv Detail & Related papers (2025-04-24T10:24:35Z)
- Towards deployment-centric multimodal AI beyond vision and language [67.02589156099391]
We advocate a deployment-centric workflow that incorporates deployment constraints early to reduce the likelihood of undeployable solutions.
We identify common multimodal-AI-specific challenges shared across disciplines and examine three real-world use cases.
By fostering multidisciplinary dialogue and open research practices, our community can accelerate deployment-centric development for broad societal impact.
arXiv Detail & Related papers (2025-04-04T17:20:05Z)
- Big Cooperative Learning [7.958840888809145]
We show that the training of foundation models can be interpreted as a form of big cooperative learning.
We propose the BigLearn-GAN, which is a novel adversarially-trained foundation model with versatile data sampling capabilities.
arXiv Detail & Related papers (2024-07-31T03:59:14Z)
- Generative AI like ChatGPT in Blockchain Federated Learning: use cases, opportunities and future [4.497001527881303]
This research explores potential integrations of generative AI in federated learning.
Techniques such as generative adversarial networks (GANs) and variational autoencoders (VAEs) can generate synthetic data, helping federated learning address challenges related to limited data availability.
arXiv Detail & Related papers (2024-07-25T19:43:49Z)
- HEMM: Holistic Evaluation of Multimodal Foundation Models [91.60364024897653]
Multimodal foundation models can holistically process text alongside images, video, audio, and other sensory modalities.
It is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains.
arXiv Detail & Related papers (2024-07-03T18:00:48Z)
- From Efficient Multimodal Models to World Models: A Survey [28.780451336834876]
Multimodal Large Models (MLMs) are becoming a significant research focus combining powerful language models with multimodal learning.
This review explores the latest developments and challenges in MLMs, emphasizing their potential in achieving artificial general intelligence.
arXiv Detail & Related papers (2024-06-27T15:36:43Z)
- PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents [58.35492519636351]
PIN format is built on three foundational principles: knowledge intensity, scalability, and support for diverse training modalities.
We present PIN-14M, an open-source dataset comprising 14 million samples derived from a diverse range of Chinese and English sources.
arXiv Detail & Related papers (2024-06-20T01:43:08Z)
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
- Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives [56.2139730920855]
We present a systematic analysis of MM-VUFMs specifically designed for road scenes.
Our objective is to provide a comprehensive overview of common practices, referring to task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques.
We provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models.
arXiv Detail & Related papers (2024-02-05T12:47:09Z)
- Multimodal Federated Learning via Contrastive Representation Ensemble [17.08211358391482]
Federated learning (FL) serves as a privacy-conscious alternative to centralized machine learning.
Existing FL methods all rely on model aggregation at the single-modality level.
We propose Contrastive Representation Ensemble and Aggregation for Multimodal FL (CreamFL); a representation-ensemble sketch appears after this list.
arXiv Detail & Related papers (2023-02-17T14:17:44Z)
- Heterogeneous Ensemble Knowledge Transfer for Training Large Models in Federated Learning [22.310090483499035]
Federated learning (FL) enables edge-devices to collaboratively learn a model without disclosing their private data to a central aggregating server.
Most existing FL algorithms require models of identical architecture to be deployed across the clients and server.
We propose a novel ensemble knowledge transfer method named Fed-ET, in which small models are trained on clients and used to train a larger model at the server (see the distillation sketch after this list).
arXiv Detail & Related papers (2022-04-27T05:18:32Z)
- DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)
- INTERN: A New Learning Paradigm Towards General Vision [117.3343347061931]
We develop a new learning paradigm named INTERN.
By learning with supervisory signals from multiple sources in multiple stages, the model being trained will develop strong generalizability.
In most cases, our models, adapted with only 10% of the training data in the target domain, outperform the counterparts trained with the full set of data.
arXiv Detail & Related papers (2021-11-16T18:42:50Z)
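For the CreamFL entry above, a minimal sketch of the core idea of aggregating representations rather than model weights: each client embeds a shared public batch, and the server ensembles the normalized embeddings into a common target. This is illustrative only; the actual method additionally trains with contrastive objectives on both server and client sides, which are omitted here, and all names are assumptions.

```python
# Representation-ensemble sketch (CreamFL-style idea): clients share
# embeddings of a common public batch rather than model weights; the server
# averages them into an ensemble target for server-side training.
# Illustrative only; the real CreamFL also uses contrastive objectives.
import numpy as np

def ensemble_representations(client_embeddings):
    """Average L2-normalized client embeddings for a shared public batch."""
    normed = [e / np.linalg.norm(e, axis=1, keepdims=True)
              for e in client_embeddings]
    return np.mean(normed, axis=0)

rng = np.random.default_rng(1)
# Three clients, each embedding the same 8 public samples into 16 dimensions.
embeddings = [rng.normal(size=(8, 16)) for _ in range(3)]
target = ensemble_representations(embeddings)
print(target.shape)  # (8, 16): ensemble target the server model can fit
```

Sharing fixed-size representations instead of full model weights is what lets clients with heterogeneous architectures and modalities participate in the same federation.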
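For the Fed-ET entry above, a sketch of server-side ensemble knowledge transfer: small client models produce soft labels on an unlabeled public set, and a larger server model is trained toward their average. This is a generic ensemble-distillation sketch under assumed toy linear models, not Fed-ET's exact weighted-consensus algorithm.

```python
# Ensemble-distillation sketch (Fed-ET-style idea): small client models
# label an unlabeled public set; the larger server model fits the averaged
# soft labels. Generic sketch, not Fed-ET's exact consensus scheme.
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
public_x = rng.normal(size=(32, 10))                          # unlabeled public data
client_models = [rng.normal(size=(10, 3)) for _ in range(4)]  # tiny linear "models"

# 1) Each client model predicts soft labels on the public set; average them.
soft_labels = np.mean([softmax(public_x @ w) for w in client_models], axis=0)

# 2) Train the server model toward the ensemble soft labels via gradient
#    descent on the cross-entropy between its predictions and the targets.
server_w = np.zeros((10, 3))
for _ in range(200):
    probs = softmax(public_x @ server_w)
    grad = public_x.T @ (probs - soft_labels) / len(public_x)
    server_w -= 0.5 * grad

final_loss = -(soft_labels * np.log(softmax(public_x @ server_w))).sum(axis=1).mean()
print(float(final_loss))  # distillation loss after training
```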