Who Owns the Output? Bridging Law and Technology in LLMs Attribution
- URL: http://arxiv.org/abs/2504.01032v1
- Date: Sat, 29 Mar 2025 18:08:04 GMT
- Title: Who Owns the Output? Bridging Law and Technology in LLMs Attribution
- Authors: Emanuele Mezzi, Asimina Mertzani, Michael P. Manis, Siyanna Lilova, Nicholas Vadivoulis, Stamatis Gatirdakis, Styliani Roussou, Rodayna Hmede
- Abstract summary: Large Language Models (LLMs) and Large Multimodal Models (LMMs) have transformed content creation. The opportunities offered by generative AI models are vast and drastically reduce the time required to generate content. However, given the complexity and poor traceability of generated content, using these tools raises challenges in attributing AI-generated content.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since the introduction of ChatGPT in 2022, Large Language Models (LLMs) and Large Multimodal Models (LMMs) have transformed content creation, enabling the generation of human-quality content across every medium: text, images, video, and audio. The opportunities offered by generative AI models are vast, drastically reducing the time required to produce content and often raising its quality. However, given the complexity and poor traceability of generated content, these tools make attribution of AI-generated content difficult. Attribution is hard for several reasons, from the lack of systematic fingerprinting of generated content to the enormous amount of data on which LLMs and LMMs are trained, which makes it difficult to connect generated content back to its training data. This situation raises concerns about intellectual property and ethical responsibility. To address these concerns, this paper bridges the technological, ethical, and legislative aspects: it reviews the legislative and technological instruments available today and proposes a legal framework to ensure accountability. Finally, it presents three use cases showing how these instruments can be combined so that attribution is respected. Even so, while the techniques available today can support attribution to a greater extent, strong limitations remain that can only be resolved by developing new attribution techniques for LLMs and LMMs.
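To make the idea of systematic fingerprinting more concrete, the sketch below shows one minimal approach (an illustration, not a mechanism proposed in the paper): hashing each generated output together with model and prompt metadata into a local registry so that a disputed text can later be matched back to a generation event. The registry path and the `register`/`lookup` helpers are hypothetical.

```python
import hashlib
import json
import os
import time

REGISTRY_PATH = "attribution_registry.jsonl"  # hypothetical local store


def fingerprint(text: str) -> str:
    """Content fingerprint: SHA-256 over lightly normalized text."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def register(text: str, model_id: str, prompt: str) -> str:
    """Record a generation event so the output can be attributed later."""
    record = {
        "fingerprint": fingerprint(text),
        "model_id": model_id,
        "prompt_fingerprint": fingerprint(prompt),
        "timestamp": time.time(),
    }
    with open(REGISTRY_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record["fingerprint"]


def lookup(text: str) -> list:
    """Return all registered generation events whose fingerprint matches `text`."""
    if not os.path.exists(REGISTRY_PATH):
        return []
    fp = fingerprint(text)
    with open(REGISTRY_PATH, encoding="utf-8") as f:
        return [r for r in map(json.loads, f) if r["fingerprint"] == fp]
```

An exact-hash registry only attributes verbatim copies; attributing paraphrased or edited content is precisely the harder problem the paper highlights.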
Related papers
- Information Retrieval in the Age of Generative AI: The RGB Model [77.96475639967431]
This paper presents a novel quantitative approach to shed light on the complex information dynamics arising from the growing use of generative AI tools.
We propose a model to characterize the generation, indexing, and dissemination of information in response to new topics.
Our findings suggest that the rapid pace of generative AI adoption, combined with increasing user reliance, can outpace human verification, escalating the risk of inaccurate information proliferation.
arXiv Detail & Related papers (2025-04-29T10:21:40Z) - Content ARCs: Decentralized Content Rights in the Age of Generative AI [14.208062688463524]
This paper proposes a framework called Content ARCs (Authenticity, Rights, Compensation).
By combining open standards for provenance and dynamic licensing with data attribution, Content ARCs create a mechanism for managing rights and compensating creators for using their work in AI training.
We characterize several nascent works in the AI data licensing space within Content ARCs and identify where challenges remain to fully implement the end-to-end framework.
arXiv Detail & Related papers (2025-03-14T11:57:08Z) - Towards Best Practices for Open Datasets for LLM Training [21.448011162803866]
Many AI companies are training their large language models (LLMs) on data without the permission of the copyright owners.
This practice has led to several high-profile copyright lawsuits brought by creative producers.
The resulting trend of withholding information about training data causes harm by hindering transparency, accountability, and innovation.
arXiv Detail & Related papers (2025-01-14T17:18:05Z) - LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models [55.903148392998965]
We introduce LOKI, a novel benchmark designed to evaluate the ability of LMMs to detect synthetic data across multiple modalities.
The benchmark includes coarse-grained judgment and multiple-choice questions, as well as fine-grained anomaly selection and explanation tasks.
We evaluate 22 open-source LMMs and 6 closed-source models on LOKI, highlighting their potential as synthetic data detectors and also revealing some limitations in the development of LMM capabilities.
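Since the coarse-grained judgment task reduces to asking a model whether a sample is real or synthetic, a minimal evaluation loop might look like the sketch below; the sample format and the `ask_model` wrapper are placeholders rather than LOKI's actual interface.

```python
from typing import Callable, Dict, List


def evaluate_detector(samples: List[Dict], ask_model: Callable[[str], str]) -> float:
    """Accuracy of a model on coarse-grained real-vs-synthetic judgments.

    Each sample is assumed to look like:
        {"content": "...", "label": "synthetic"}  # or "real"
    `ask_model` wraps an LMM and is expected to answer "real" or "synthetic".
    """
    correct = 0
    for sample in samples:
        prompt = (
            "Is the following content real or AI-generated? "
            "Answer with exactly one word, 'real' or 'synthetic'.\n\n"
            + sample["content"]
        )
        prediction = ask_model(prompt).strip().lower()
        if prediction == sample["label"]:
            correct += 1
    return correct / len(samples) if samples else 0.0
```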
arXiv Detail & Related papers (2024-10-13T05:26:36Z) - Unlearning Targeted Information via Single Layer Unlearning Gradient [15.374381635334897]
Unauthorized computation on private data is a significant concern for society.
The EU's General Data Protection Regulation (GDPR) includes a "right to be forgotten".
We propose Single Layer Unlearning Gradient (SLUG) to unlearn targeted information by updating targeted layers of a model.
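The summary describes unlearning by updating only targeted layers with a gradient computed on the data to be forgotten. The PyTorch sketch below illustrates that general idea as a single gradient-ascent step on one named layer; it is a simplified stand-in, not the authors' exact SLUG procedure, and the model, layer name, and forget set are placeholders.

```python
import torch


def single_layer_unlearn(model, layer_name, forget_inputs, forget_targets,
                         loss_fn=torch.nn.functional.cross_entropy, step_size=1.0):
    """Apply one gradient-ascent step on a single named layer so the model
    increases its loss on the data to be forgotten, leaving other layers intact."""
    # Freeze everything except the targeted layer.
    target_params = []
    for name, param in model.named_parameters():
        if name.startswith(layer_name):
            param.requires_grad_(True)
            target_params.append(param)
        else:
            param.requires_grad_(False)

    model.zero_grad()
    loss = loss_fn(model(forget_inputs), forget_targets)
    loss.backward()

    # Single update in the direction that *increases* loss on the forget set.
    with torch.no_grad():
        for param in target_params:
            if param.grad is not None:
                param += step_size * param.grad
    return model
```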
arXiv Detail & Related papers (2024-07-16T15:52:36Z) - Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
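To make the re-training bottleneck concrete, the sketch below shows the classic Monte Carlo estimator of Data Shapley values, which repeatedly retrains (or re-scores) a model on growing data subsets; this is the cost In-Run Data Shapley is designed to avoid, and the paper's own method is not reproduced here. `train_and_score` is a placeholder that trains on a subset of training indices and returns a validation utility.

```python
import random
from typing import Callable, List, Sequence


def monte_carlo_data_shapley(
    n_points: int,
    train_and_score: Callable[[Sequence[int]], float],
    n_permutations: int = 100,
) -> List[float]:
    """Estimate each training point's Shapley value as its average marginal
    contribution to utility over random permutations of the dataset."""
    values = [0.0] * n_points
    for _ in range(n_permutations):
        order = list(range(n_points))
        random.shuffle(order)
        prefix: List[int] = []
        prev_utility = train_and_score(prefix)  # utility of the empty set
        for idx in order:
            prefix.append(idx)
            utility = train_and_score(prefix)   # each step needs a fresh (re)training
            values[idx] += (utility - prev_utility) / n_permutations
            prev_utility = utility
    return values
```

Each permutation requires training on every prefix of the data, which is exactly the computational burden the entry above refers to.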
arXiv Detail & Related papers (2024-06-16T17:09:24Z) - An Economic Solution to Copyright Challenges of Generative AI [35.37023083413299]
Generative artificial intelligence systems are trained to generate new pieces of text, images, videos, and other media.
There is growing concern that such systems may infringe on the copyright interests of training data contributors.
We propose a framework that compensates copyright owners proportionally to their contributions to the creation of AI-generated content.
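One cooperative-game way to operationalize "compensation proportional to contribution" is a Shapley split of the revenue attributable to the training data, shown below as a generic worked example rather than the paper's exact formulation; the owners, the `utility` function, and the revenue numbers are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_payouts(
    owners: Sequence[str],
    utility: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley payout: each owner's average marginal value over all
    orders in which owners' data could be added."""
    n = len(owners)
    payouts = {o: 0.0 for o in owners}
    for owner in owners:
        others = [o for o in owners if o != owner]
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                marginal = utility(coalition | {owner}) - utility(coalition)
                payouts[owner] += weight * marginal
    return payouts


# Toy example (hypothetical numbers): revenue attributable to subsets of two owners.
revenue = {frozenset(): 0.0, frozenset({"A"}): 60.0,
           frozenset({"B"}): 40.0, frozenset({"A", "B"}): 120.0}
print(shapley_payouts(["A", "B"], lambda s: revenue[frozenset(s)]))
# -> {'A': 70.0, 'B': 50.0}
```

The payouts sum to the full joint revenue (120.0), so the split is both exhaustive and proportional to measured marginal contributions.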
arXiv Detail & Related papers (2024-04-22T08:10:38Z) - Copyright Protection in Generative AI: A Technical Perspective [58.84343394349887]
Generative AI has witnessed rapid advancement in recent years, expanding its capabilities to create synthesized content such as text, images, audio, and code.
The high fidelity and authenticity of contents generated by these Deep Generative Models (DGMs) have sparked significant copyright concerns.
This work delves into this issue by providing a comprehensive overview of copyright protection from a technical perspective.
arXiv Detail & Related papers (2024-02-04T04:00:33Z) - Viz: A QLoRA-based Copyright Marketplace for Legally Compliant Generative AI [0.0]
Viz is a novel system architecture that integrates Quantized Low-Rank Adapters (QLoRA) to fine-tune large language models (LLMs) within a copyright marketplace for legally compliant generative AI.
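For context on the fine-tuning component, the sketch below shows a typical QLoRA setup with the Hugging Face `transformers`, `bitsandbytes`, and `peft` libraries: the base model is loaded in 4-bit NF4 precision and only small low-rank adapter matrices are trained. The model name and hyperparameters are illustrative, and this is a generic QLoRA recipe rather than Viz's actual pipeline.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"  # illustrative choice of base model

# Load the base model with 4-bit NF4 quantization (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Attach low-rank adapters; only these small matrices are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total weights
```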
arXiv Detail & Related papers (2023-12-31T13:53:06Z) - A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLM-generated content has become of paramount importance.
We aim to provide a detailed overview of existing detection strategies and benchmarks.
We also posit the necessity for a multi-faceted approach to defend against various attacks.
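Among the detection strategies such a survey covers, a common zero-shot baseline scores a text by its perplexity under a reference language model, on the assumption that machine-generated text tends to be more statistically predictable. The sketch below implements that baseline with GPT-2 as an illustration; the decision threshold is a made-up placeholder that would need calibration on labeled data.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()


def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2 (lower = more predictable)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()


def looks_machine_generated(text: str, threshold: float = 25.0) -> bool:
    """Crude heuristic: flag text whose perplexity falls below the threshold.
    The threshold is a placeholder and must be calibrated on labeled data."""
    return perplexity(text) < threshold
```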
arXiv Detail & Related papers (2023-10-24T09:10:26Z) - Semantic Communications for Artificial Intelligence Generated Content (AIGC) Toward Effective Content Creation [75.73229320559996]
This paper develops a conceptual model for the integration of AIGC and semantic communication (SemCom).
A novel framework is proposed in which AIGC technology serves as the encoder and decoder for semantic information.
The framework can adapt to different types of content generated, the required quality, and the semantic information utilized.
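One way to read "AIGC as encoder and decoder for semantic information" is that the transmitter sends only a compact semantic description and the receiver regenerates full content from it. The sketch below illustrates that reading with off-the-shelf Hugging Face pipelines; it is a conceptual toy under that assumption, not the framework proposed in the paper, and the model choices are arbitrary.

```python
from transformers import pipeline

# Semantic encoder: compress the source content to a short semantic description.
encoder = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
# Semantic decoder: regenerate full content from the received description.
decoder = pipeline("text-generation", model="gpt2")


def semantic_transmit(source_text: str) -> str:
    # Transmitter side: only the short summary is sent over the channel.
    semantics = encoder(source_text, max_length=60, min_length=10)[0]["summary_text"]
    # Receiver side: a generative model reconstructs content from the semantics.
    prompt = "Expand the following summary into a full passage:\n" + semantics
    # Note: the generated text returned by the pipeline includes the prompt.
    return decoder(prompt, max_new_tokens=120)[0]["generated_text"]
```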
arXiv Detail & Related papers (2023-08-09T13:17:21Z) - A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT [63.58711128819828]
ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC).
The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace.
arXiv Detail & Related papers (2023-03-07T20:36:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.