Is Your AI Truly Yours? Leveraging Blockchain for Copyrights, Provenance, and Lineage
- URL: http://arxiv.org/abs/2404.06077v2
- Date: Mon, 07 Jul 2025 07:28:50 GMT
- Title: Is Your AI Truly Yours? Leveraging Blockchain for Copyrights, Provenance, and Lineage
- Authors: Qin Wang, Guangsheng Yu, Yilin Sai, H. M. N. Dilum Bandara, Shiping Chen
- Abstract summary: IBis is a blockchain-based framework tailored for AI model training. IBis integrates on-chain registries for datasets, licenses and models, alongside off-chain signing services. We implement IBis using Daml on the Canton blockchain.
- Score: 3.114654787133255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As Artificial Intelligence (AI) integrates into diverse areas, particularly content generation, ensuring rightful ownership and ethical use becomes paramount. AI service providers are expected to prioritize responsibly sourcing training data and obtaining licenses from data owners. However, existing studies primarily center on safeguarding static copyrights: they simply treat metadata/datasets as non-fungible items with transferable/trading capabilities, neglecting the dynamic nature of training procedures that can shape an ongoing trajectory. In this paper, we present IBis, a blockchain-based framework tailored for AI model training workflows. Our design can dynamically manage copyright compliance and data provenance in decentralized AI model training processes, ensuring that intellectual property rights are respected throughout iterative model enhancements and licensing updates. Technically, IBis integrates on-chain registries for datasets, licenses and models, alongside off-chain signing services to facilitate collaboration among multiple participants. Further, IBis provides APIs designed for seamless integration with existing contract management software, minimizing disruptions to established model training processes. We implement IBis using Daml on the Canton blockchain. Evaluation results showcase the feasibility and scalability of IBis across varying numbers of users, datasets, models, and licenses.
Related papers
- TD-Suite: All Batteries Included Framework for Technical Debt Classification [5.669063174637433]
TD-Suite provides a seamless end-to-end pipeline, managing everything from initial data ingestion to model training.
To ensure the generated models are robust and perform reliably on real-world, often imbalanced, datasets, TD-Suite incorporates critical training methodologies.
The framework integrates tracking and reporting of carbon emissions associated with the computationally intensive model training process.
arXiv Detail & Related papers (2025-04-15T11:31:17Z) - Towards Human-Guided, Data-Centric LLM Co-Pilots [53.35493881390917]
CliMB-DC is a human-guided, data-centric framework for machine learning co-pilots.
It combines advanced data-centric tools with LLM-driven reasoning to enable robust, context-aware data processing.
We show how CliMB-DC can transform uncurated datasets into ML-ready formats.
arXiv Detail & Related papers (2025-01-17T17:51:22Z) - MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services [94.61039892220037]
We present a novel immersion-aware model trading framework that incentivizes metaverse users (MUs) to contribute learning models for augmented reality (AR) services in the vehicular metaverse.
Considering dynamic network conditions and privacy concerns, we formulate the reward decisions of MSPs as a multi-agent Markov decision process.
Experimental results demonstrate that the proposed framework can effectively provide higher-value models for object detection and classification in AR services on real AR-related vehicle datasets.
arXiv Detail & Related papers (2024-10-25T16:20:46Z) - KModels: Unlocking AI for Business Applications [10.833754921830154]
This paper presents the architecture of KModels and the key decisions that shape it.
KModels enables AI consumers to eliminate the need for a dedicated data scientist.
It is highly suited for on-premise deployment but can also be used in cloud environments.
arXiv Detail & Related papers (2024-09-08T13:19:12Z) - An Economic Solution to Copyright Challenges of Generative AI [35.37023083413299]
Generative artificial intelligence systems are trained to generate new pieces of text, images, videos, and other media.
There is growing concern that such systems may infringe on the copyright interests of training data contributors.
We propose a framework that compensates copyright owners proportionally to their contributions to the creation of AI-generated content.
arXiv Detail & Related papers (2024-04-22T08:10:38Z) - AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning [98.26836657967162]
AgentOhana aggregates agent trajectories from distinct environments, spanning a wide array of scenarios.
xLAM-v0.1, a large action model tailored for AI agents, demonstrates exceptional performance across various benchmarks.
arXiv Detail & Related papers (2024-02-23T18:56:26Z) - Blockchain-enabled Trustworthy Federated Unlearning [50.01101423318312]
Federated unlearning is a promising paradigm for protecting the data ownership of distributed clients.
Existing works require central servers to retain the historical model parameters from distributed clients.
This paper proposes a new blockchain-enabled trustworthy federated unlearning framework.
arXiv Detail & Related papers (2024-01-29T07:04:48Z) - DECORAIT -- DECentralized Opt-in/out Registry for AI Training [20.683704089165406]
We present DECORAIT, a decentralized registry through which content creators may assert their right to opt in or out of AI training.
GenAI enables images to be synthesized using AI models trained on vast amounts of data scraped from public sources.
arXiv Detail & Related papers (2023-09-25T16:19:35Z) - CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation [128.00940554196976]
Vision-Language Pretraining (VLP) has shown impressive results on diverse downstream tasks by offline training on large-scale datasets.
To support the study of Vision-Language Continual Pretraining (VLCP), we first contribute a comprehensive and unified benchmark dataset P9D.
The data from each industry as an independent task supports continual learning and conforms to the real-world long-tail nature to simulate pretraining on web data.
arXiv Detail & Related papers (2023-08-14T13:53:18Z) - Federated Learning-Empowered AI-Generated Content in Wireless Networks [58.48381827268331]
Federated learning (FL) can be leveraged to improve learning efficiency and achieve privacy protection for AIGC.
We present FL-based techniques for empowering AIGC, and aim to enable users to generate diverse, personalized, and high-quality content.
arXiv Detail & Related papers (2023-07-14T04:13:11Z) - Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated Learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device).
In FL, each data holder trains a model locally and releases it to a central server for aggregation.
In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and back propagation).
In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
arXiv Detail & Related papers (2022-11-20T10:49:22Z) - Privacy-Preserving Machine Learning for Collaborative Data Sharing via Auto-encoder Latent Space Embeddings [57.45332961252628]
Privacy-preserving machine learning in data-sharing processes is an ever-critical task.
This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data.
arXiv Detail & Related papers (2022-11-10T17:36:58Z) - APPFLChain: A Privacy Protection Distributed Artificial-Intelligence Architecture Based on Federated Learning and Consortium Blockchain [6.054775780656853]
We propose a new system architecture called APPFLChain.
It is an integrated architecture of a Hyperledger Fabric-based blockchain and a federated-learning paradigm.
Our new system can maintain a high degree of security and privacy, as users do not need to share sensitive personal information with the server.
arXiv Detail & Related papers (2022-06-26T05:30:07Z) - Leveraging Centric Data Federated Learning Using Blockchain For Integrity Assurance [14.347917009290814]
We propose a data-centric federated learning architecture leveraged by a public blockchain and smart contracts.
Our proposed solution provides a virtual public marketplace where developers, data scientists, and AI engineers can publish their models.
We enhance data quality and integrity through an incentive mechanism that rewards contributors for data contribution and verification.
arXiv Detail & Related papers (2022-06-09T19:06:05Z) - Decentralized Federated Learning Preserves Model and Data Privacy [77.454688257702]
We propose a fully decentralized approach that allows knowledge to be shared between trained models.
Students are trained on the output of their teachers via synthetically generated input data.
The results show that an untrained student model, trained on the teacher's output, reaches F1-scores comparable to the teacher's.
arXiv Detail & Related papers (2021-02-01T14:38:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.