Catch the Butterfly: Peeking into the Terms and Conflicts among SPDX
Licenses
- URL: http://arxiv.org/abs/2401.10636v1
- Date: Fri, 19 Jan 2024 11:27:34 GMT
- Title: Catch the Butterfly: Peeking into the Terms and Conflicts among SPDX
Licenses
- Authors: Tao Liu, Chengwei Liu, Tianwei Liu, He Wang, Gaofei Wu, Yang Liu,
Yuqing Zhang
- Abstract summary: The widespread adoption of third-party libraries (TPLs) in software development has accelerated the creation of modern software.
Developers may inadvertently violate the licenses of TPLs, leading to legal issues.
There is a need for a high-quality license dataset that encompasses a broad range of mainstream licenses.
- Score: 16.948633594354412
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The widespread adoption of third-party libraries (TPLs) in software
development has accelerated the creation of modern software. However, this
convenience comes with potential legal risks. Developers may inadvertently
violate the licenses of TPLs, leading to legal issues. While existing studies
have explored software licenses and potential incompatibilities, these studies
often focus on a limited set of licenses or rely on low-quality license data,
which may affect their conclusions. To address this gap, there is a need for a
high-quality license dataset that encompasses a broad range of mainstream
licenses to help developers navigate the complex landscape of software
licenses, avoid potential legal pitfalls, and guide solutions for managing
license compliance and compatibility in software development. To this end, we
conduct the first work to understand the mainstream software licenses based on
term granularity and obtain a high-quality dataset of 453 SPDX licenses with
well-labeled terms and conflicts. Specifically, we first conduct a differential
analysis of the mainstream platforms to understand the terms and attitudes of
each license. Next, we propose a standardized set of license terms to capture
and label existing mainstream licenses with high quality. Moreover, we include
copyleft conflicts and identify three major types of license conflicts
among the 453 SPDX licenses. Based on these, we carry out two empirical studies
to reveal the concerns and threats from the perspectives of both licensors and
licensees. One study provides an in-depth analysis of the similarities,
differences, and conflicts among SPDX licenses; the other revisits the usage
and conflicts of licenses in the NPM ecosystem and draws conclusions that differ
from previous work. Our studies reveal some insightful findings and disclose
relevant analytical data, which set the stage for further research.
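To make the idea of term-level labels and conflicts concrete, here is a minimal Python sketch. It is not the paper's dataset or method: the license names, the term names (`commercial_use`, `include_copyright`), the three attitudes, and the conflict rule (a dependency forbids something the project's license permits or requires) are all hypothetical assumptions chosen for illustration.

```python
from enum import Enum


class Attitude(Enum):
    """Hypothetical attitudes a license can take toward a term."""
    CAN = "can"
    CANNOT = "cannot"
    MUST = "must"


# Hypothetical labels for two made-up licenses; not taken from the paper's data.
LICENSES = {
    "Permissive-Example": {
        "commercial_use": Attitude.CAN,
        "include_copyright": Attitude.MUST,
    },
    "NonCommercial-Example": {
        "commercial_use": Attitude.CANNOT,
        "include_copyright": Attitude.MUST,
    },
}


def find_conflicts(dependency_terms, project_terms):
    """Return the terms that the dependency's license forbids but the
    project's license permits or requires (an assumed conflict rule)."""
    conflicts = []
    for term, dep_attitude in dependency_terms.items():
        project_attitude = project_terms.get(term)
        if project_attitude is None:
            continue  # the project's license says nothing about this term
        if dep_attitude is Attitude.CANNOT and project_attitude in (
            Attitude.CAN,
            Attitude.MUST,
        ):
            conflicts.append(term)
    return sorted(conflicts)


if __name__ == "__main__":
    print(find_conflicts(LICENSES["NonCommercial-Example"],
                         LICENSES["Permissive-Example"]))
```

Under these assumptions, a project released under the permissive example license that depends on a component under the non-commercial example license is flagged on `commercial_use`, while the reverse direction reports no conflict.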
Related papers
- An Overview and Catalogue of Dependency Challenges in Open Source Software Package Registries [52.23798016734889]
This article provides a catalogue of dependency-related challenges that come with relying on OSS packages or libraries.
The catalogue is based on the scientific literature on empirical research that has been conducted to understand, quantify and overcome these challenges.
arXiv Detail & Related papers (2024-09-27T16:20:20Z) - Evaluating Copyright Takedown Methods for Language Models [100.38129820325497]
Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material.
This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs.
We examine several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches.
arXiv Detail & Related papers (2024-06-26T18:09:46Z) - "The Law Doesn't Work Like a Computer": Exploring Software Licensing Issues Faced by Legal Practitioners [7.323456975282423]
We conducted a survey with 30 legal practitioners and related occupations.
We identified different aspects of OSS license compliance from the perspective of legal practitioners.
We discuss the implications of our findings.
arXiv Detail & Related papers (2024-03-22T03:07:11Z) - A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models [52.49582606341111]
Copyright law confers creators the exclusive rights to reproduce, distribute, and monetize their creative works.
Recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement.
We introduce a novel pipeline that harmonizes CLIP, ChatGPT, and diffusion models to curate a dataset.
arXiv Detail & Related papers (2024-01-04T11:14:01Z) - Detecting and Fixing Violations of Modification Terms in Open Source
Licenses during Forking [4.682961105225832]
We first empirically characterize modification terms in 47 open source licenses.
Inspired by our study, we then design LiVo to automatically detect and fix violations of modification terms in open source licenses during forking.
arXiv Detail & Related papers (2023-10-12T02:37:06Z) - LiSum: Open Source Software License Summarization with Multi-Task
Learning [16.521420821183995]
Open source software (OSS) licenses regulate the conditions under which users can reuse, modify, and distribute the software legally.
There exist various OSS licenses in the community, written in a formal language, which are typically long and complicated to understand.
Motivated by the user study and the fast growth of licenses in the community, we propose the first study towards automated license summarization.
arXiv Detail & Related papers (2023-09-10T16:43:51Z) - LiResolver: License Incompatibility Resolution for Open Source Software [13.28021004336228]
LiResolver is a fine-grained, scalable, and flexible tool to resolve license incompatibility issues for open source software.
Comprehensive experiments demonstrate the effectiveness of LiResolver, with 4.09% false positive (FP) rate and 0.02% false negative (FN) rate for incompatibility issue localization.
arXiv Detail & Related papers (2023-06-26T13:16:09Z) - Foundation Models and Fair Use [96.04664748698103]
In the U.S. and other countries, copyrighted content may be used to build foundation models without incurring liability due to the fair use doctrine.
In this work, we survey the potential risks of developing and deploying foundation models based on copyrighted content.
We discuss technical mitigations that can help foundation models stay in line with fair use.
arXiv Detail & Related papers (2023-03-28T03:58:40Z) - Lessons from Formally Verified Deployed Software Systems (Extended version) [65.69802414600832]
This article examines a range of projects, in various application areas, that have produced formally verified systems and deployed them for actual use.
It considers the technologies used, the form of verification applied, the results obtained, and the lessons that the software industry should draw regarding its ability to benefit from formal verification techniques and tools.
arXiv Detail & Related papers (2023-01-05T18:18:46Z) - Can I use this publicly available dataset to build commercial AI
software? Most likely not [8.853674186565934]
We propose a new approach to assess the potential license compliance violations if a given publicly available dataset were to be used for building commercial AI software.
Our results show that there are risks of license violations on 5 of these 6 studied datasets if they were used for commercial purposes.
arXiv Detail & Related papers (2021-11-03T17:44:06Z) - The Problem of Zombie Datasets: A Framework For Deprecating Datasets [55.878249096379804]
We examine the public afterlives of several prominent datasets, including ImageNet, 80 Million Tiny Images, MS-Celeb-1M, Duke MTMC, Brainwash, and HRT Transgender.
We propose a dataset deprecation framework that includes considerations of risk, mitigation of impact, appeal mechanisms, timeline, post-deprecation protocol, and publication checks.
arXiv Detail & Related papers (2021-10-18T20:13:51Z)