"They've Stolen My GPL-Licensed Model!": Toward Standardized and Transparent Model Licensing
- URL: http://arxiv.org/abs/2412.11483v1
- Date: Mon, 16 Dec 2024 06:52:09 GMT
- Title: "They've Stolen My GPL-Licensed Model!": Toward Standardized and Transparent Model Licensing
- Authors: Moming Duan, Rui Zhao, Linshan Jiang, Nigel Shadbolt, Bingsheng He,
- Abstract summary: We develop a new vocabulary for ML workflow management and encoded license rules to enable ontological reasoning for analyzing rights granting and compliance issues.<n>Our analysis tool is built on Turtle language and Notation3 reasoning engine, envisioned as first step toward Linked Open Model Data.
- Score: 30.19362102481241
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As model parameter sizes reach the billion-level range and their training consumes zettaFLOPs of computation, components reuse and collaborative development are become increasingly prevalent in the Machine Learning (ML) community. These components, including models, software, and datasets, may originate from various sources and be published under different licenses, which govern the use and distribution of licensed works and their derivatives. However, commonly chosen licenses, such as GPL and Apache, are software-specific and are not clearly defined or bounded in the context of model publishing. Meanwhile, the reused components may also have free-content licenses and model licenses, which pose a potential risk of license noncompliance and rights infringement within the model production workflow. In this paper, we propose addressing the above challenges along two lines: 1) For license analysis, we have developed a new vocabulary for ML workflow management and encoded license rules to enable ontological reasoning for analyzing rights granting and compliance issues. 2) For standardized model publishing, we have drafted a set of model licenses that provide flexible options to meet the diverse needs of model publishing. Our analysis tool is built on Turtle language and Notation3 reasoning engine, envisioned as a first step toward Linked Open Model Production Data. We have also encoded our proposed model licenses into rules and demonstrated the effects of GPL and other commonly used licenses in model publishing, along with the flexibility advantages of our licenses, through comparisons and experiments.
Related papers
- Open Source at a Crossroads: The Future of Licensing Driven by Monetization [11.149764135999437]
Open Source Software Licenses (OSS licenses) ensure that software can be sold or distributed as part of aggregate programs from various sources without requiring a royalty or fee.
We argue that open source is at a crossroads, with a growing need to redefine its licensing models and support communities and critical software.
arXiv Detail & Related papers (2025-03-04T17:44:01Z) - OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.
While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z) - LiCoEval: Evaluating LLMs on License Compliance in Code Generation [27.368667936460508]
Large Language Models (LLMs) have revolutionized code generation, leading to widespread adoption of AI coding tools by developers.
LLMs can generate license-protected code without providing the necessary license information, leading to potential intellectual property violations during software production.
This paper addresses the critical, yet underexplored, issue of license compliance in LLM-generated code.
arXiv Detail & Related papers (2024-08-05T14:09:30Z) - Catch the Butterfly: Peeking into the Terms and Conflicts among SPDX
Licenses [16.948633594354412]
Third-party libraries (TPLs) in software development has accelerated the creation of modern software.
Developers may inadvertently violate the licenses of TPLs, leading to legal issues.
There is a need for a high-quality license dataset that encompasses a broad range of mainstream licenses.
arXiv Detail & Related papers (2024-01-19T11:27:34Z) - Language Models as a Service: Overview of a New Paradigm and its
Challenges [47.75762014254756]
Some of the most powerful language models currently are proprietary systems, accessible only via (typically restrictive) web or programming.
This paper has two goals: on the one hand, we delineate how the aforementioned challenges act as impediments to the accessibility, replicability, reliability, and trustworthiness of LM interfaces.
On the other hand, it serves as a comprehensive resource for existing knowledge on current, major LM, offering a synthesized overview of the licences and capabilities their interfaces offer.
arXiv Detail & Related papers (2023-09-28T16:29:52Z) - QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models [85.02796681773447]
We propose a quantization-aware low-rank adaptation (QA-LoRA) algorithm.
The motivation lies in the imbalanced degrees of freedom of quantization and adaptation.
QA-LoRA is easily implemented with a few lines of code.
arXiv Detail & Related papers (2023-09-26T07:22:23Z) - LiSum: Open Source Software License Summarization with Multi-Task
Learning [16.521420821183995]
Open source software (OSS) licenses regulate the conditions under which users can reuse, modify, and distribute the software legally.
There exist various OSS licenses in the community, written in a formal language, which are typically long and complicated to understand.
Motivated by the user study and the fast growth of licenses in the community, we propose the first study towards automated license summarization.
arXiv Detail & Related papers (2023-09-10T16:43:51Z) - LiResolver: License Incompatibility Resolution for Open Source Software [13.28021004336228]
LiResolver is a fine-grained, scalable, and flexible tool to resolve license incompatibility issues for open source software.
Comprehensive experiments demonstrate the effectiveness of LiResolver, with 4.09% false positive (FP) rate and 0.02% false negative (FN) rate for incompatibility issue localization.
arXiv Detail & Related papers (2023-06-26T13:16:09Z) - CodeTF: One-stop Transformer Library for State-of-the-art Code LLM [72.1638273937025]
We present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.
Our library supports a collection of pretrained Code LLM models and popular code benchmarks.
We hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering.
arXiv Detail & Related papers (2023-05-31T05:24:48Z) - Foundation Models and Fair Use [96.04664748698103]
In the U.S. and other countries, copyrighted content may be used to build foundation models without incurring liability due to the fair use doctrine.
In this work, we survey the potential risks of developing and deploying foundation models based on copyrighted content.
We discuss technical mitigations that can help foundation models stay in line with fair use.
arXiv Detail & Related papers (2023-03-28T03:58:40Z) - DIETERpy: a Python framework for The Dispatch and Investment Evaluation
Tool with Endogenous Renewables [62.997667081978825]
DIETER is an open-source power sector model designed to analyze future settings with very high shares of variable renewable energy sources.
It minimizes overall system costs, including fixed and variable costs of various generation, flexibility and sector coupling options.
We introduce DIETERpy that builds on the existing model version, written in the General Algebraic Modeling System (GAMS) and enhances it with a Python framework.
arXiv Detail & Related papers (2020-10-02T09:27:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.