Hybrid RAG-empowered Multi-modal LLM for Secure Data Management in Internet of Medical Things: A Diffusion-based Contract Approach
- URL: http://arxiv.org/abs/2407.00978v2
- Date: Mon, 09 Dec 2024 02:28:01 GMT
- Title: Hybrid RAG-empowered Multi-modal LLM for Secure Data Management in Internet of Medical Things: A Diffusion-based Contract Approach
- Authors: Cheng Su, Jinbo Wen, Jiawen Kang, Yonghua Wang, Yuanjia Su, Hudan Pan, Zishao Zhong, M. Shamim Hossain,
- Abstract summary: We propose a hybrid Retrieval-Augmented Generation (RAG)-empowered medical MLLM framework for healthcare data management.
We employ age of information to indirectly evaluate the data freshness impact of MLLMs and utilize contract theory to incentivize healthcare data holders to share their fresh data.
- Score: 13.952971893313231
- License:
- Abstract: Secure data management and effective data sharing have become paramount in the rapidly evolving healthcare landscape, especially with the growing integration of the Internet of Medical Things (IoMT). The rise of generative artificial intelligence has further elevated Multi-modal Large Language Models (MLLMs) as essential tools for managing and optimizing healthcare data in IoMT. MLLMs can support multi-modal inputs and generate diverse types of content by leveraging large-scale training on vast amounts of multi-modal data. However, critical challenges persist in developing medical MLLMs, including security and freshness issues of healthcare data, affecting the output quality of MLLMs. To this end, in this paper, we propose a hybrid Retrieval-Augmented Generation (RAG)-empowered medical MLLM framework for healthcare data management. This framework leverages a hierarchical cross-chain architecture to facilitate secure data training. Moreover, it enhances the output quality of MLLMs through hybrid RAG, which employs multi-modal metrics to filter various unimodal RAG results and incorporates these retrieval results as additional inputs to MLLMs. Additionally, we employ age of information to indirectly evaluate the data freshness impact of MLLMs and utilize contract theory to incentivize healthcare data holders to share their fresh data, mitigating information asymmetry during data sharing. Finally, we utilize a generative diffusion model-based deep reinforcement learning algorithm to identify the optimal contract for efficient data sharing. Numerical results demonstrate the effectiveness of the proposed schemes, which achieve secure and efficient healthcare data management.
Related papers
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z) - Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding [92.32881381717594]
We introduce ALternate Contrastive Decoding (ALCD) to solve hallucination issues in medical information extraction tasks.
ALCD demonstrates significant improvements in resolving hallucination issues compared to conventional decoding methods.
arXiv Detail & Related papers (2024-10-21T07:19:19Z) - Detecting Training Data of Large Language Models via Expectation Maximization [62.28028046993391]
Membership inference attacks (MIAs) aim to determine whether a specific instance was part of a target model's training data.
Applying MIAs to large language models (LLMs) presents unique challenges due to the massive scale of pre-training data and the ambiguous nature of membership.
We introduce EM-MIA, a novel MIA method for LLMs that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm.
arXiv Detail & Related papers (2024-10-10T03:31:16Z) - MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct [148.39859547619156]
We propose MMEvol, a novel multimodal instruction data evolution framework.
MMEvol iteratively improves data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution.
Our approach reaches state-of-the-art (SOTA) performance in nine tasks using significantly less data compared to state-of-the-art models.
arXiv Detail & Related papers (2024-09-09T17:44:00Z) - Dynamic Fog Computing for Enhanced LLM Execution in Medical Applications [1.0500536774309863]
Large language models (LLMs) can transform, interpret, and comprehend vast quantities of heterogeneous data.
The sensitive nature of protected health information (PHI) raises valid concerns about data privacy and trust in remote LLM platforms.
We propose a shift in the LLM execution environment from opaque, centralized cloud providers to a decentralized and dynamic fog computing architecture.
arXiv Detail & Related papers (2024-08-08T04:49:21Z) - The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective [53.48484062444108]
We find that the development of models and data is not two separate paths but rather interconnected.
On the one hand, vaster and higher-quality data contribute to better performance of MLLMs; on the other hand, MLLMs can facilitate the development of data.
To promote the data-model co-development for MLLM community, we systematically review existing works related to MLLMs from the data-model co-development perspective.
arXiv Detail & Related papers (2024-07-11T15:08:11Z) - REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records
Analysis via Large Language Models [19.62552013839689]
Existing models often lack the medical context relevent to clinical tasks, prompting the incorporation of external knowledge.
We propose REALM, a Retrieval-Augmented Generation (RAG) driven framework to enhance multimodal EHR representations.
Our experiments on MIMIC-III mortality and readmission tasks showcase the superior performance of our REALM framework over baselines.
arXiv Detail & Related papers (2024-02-10T18:27:28Z) - Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes [57.62036621319563]
We introduce CLLM, which leverages the prior knowledge of Large Language Models (LLMs) for data augmentation in the low-data regime.
We demonstrate the superior performance of CLLM in the low-data regime compared to conventional generators.
arXiv Detail & Related papers (2023-12-19T12:34:46Z) - Multimodal LLMs for health grounded in individual-specific data [1.8473477867376036]
Foundation large language models (LLMs) have shown an impressive ability to solve tasks across a wide range of fields including health.
We take a step towards creating multimodal LLMs for health that are grounded in individual-specific data.
We show that HeLM can effectively use demographic and clinical features in addition to high-dimensional time-series data to estimate disease risk.
arXiv Detail & Related papers (2023-07-18T07:12:46Z) - Patchwork Learning: A Paradigm Towards Integrative Analysis across
Diverse Biomedical Data Sources [40.32772510980854]
"patchwork learning" (PL) is a paradigm that integrates information from disparate datasets composed of different data modalities.
PL allows the simultaneous utilization of complementary data sources while preserving data privacy.
We present the concept of patchwork learning and its current implementations in healthcare, exploring the potential opportunities and applicable data sources.
arXiv Detail & Related papers (2023-05-10T14:50:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.