Detecting and Fixing Violations of Modification Terms in Open Source
Licenses during Forking
- URL: http://arxiv.org/abs/2310.07991v1
- Date: Thu, 12 Oct 2023 02:37:06 GMT
- Title: Detecting and Fixing Violations of Modification Terms in Open Source
Licenses during Forking
- Authors: Kaifeng Huang, Yingfeng Xia, Bihuan Chen, Zhuotong Zhou, Jin Guo, Xin
Peng
- Abstract summary: We first empirically characterize modification terms in 47 open source licenses.
Inspired by our study, we then design LiVo to automatically detect and fix violations of modification terms in open source licenses during forking.
- Score: 4.682961105225832
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open source software benefits the software community, but also
introduces legal risks caused by license violations, which result in serious
consequences such as lawsuits and financial losses. To mitigate legal risks,
some approaches have been proposed to identify licenses, detect license
incompatibilities and inconsistencies, and recommend licenses. As far as we
know, however, there is no prior work to understand modification terms in open
source licenses or to detect and fix violations of modification terms. To
bridge this gap, we first empirically characterize modification terms in 47
open source licenses. These licenses all require certain forms of "notice" to
describe the modifications made to the original work. Inspired by our study, we
then design LiVo to automatically detect and fix violations of modification
terms in open source licenses during forking. Our evaluation has shown the
effectiveness and efficiency of LiVo. 18 pull requests fixing modification
term violations have received positive responses, and 8 have been merged.
Related papers
- Lost in Edits? A $λ$-Compass for AIGC Provenance [119.95562081325552]
We propose a novel latent-space attribution method that robustly identifies and differentiates authentic outputs from manipulated ones.
LambdaTracer is effective across diverse iterative editing processes, whether automated through text-guided editing tools such as InstructPix2Pix or performed manually with editing software such as Adobe Photoshop.
arXiv Detail & Related papers (2025-02-05T06:24:25Z)
- PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions.
We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models.
We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
arXiv Detail & Related papers (2024-11-20T17:23:40Z)
- OSS License Identification at Scale: A Comprehensive Dataset Using World of Code [4.954816514146113]
This study presents a reusable and comprehensive dataset of open source software (OSS) licenses.
We found and identified 5.5 million distinct license blobs in OSS projects.
The dataset is open, providing a valuable resource for developers, researchers, and legal professionals in the OSS community.
arXiv Detail & Related papers (2024-09-07T13:34:55Z)
- Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? [62.72729485995075]
We investigate the effectiveness of watermarking as a deterrent against the generation of copyrighted texts.
We find that watermarking adversely affects the success rate of Membership Inference Attacks (MIAs).
We propose an adaptive technique to improve the success rate of a recent MIA under watermarking.
arXiv Detail & Related papers (2024-07-24T16:53:09Z)
- On the modification and revocation of open source licences [0.14843690728081999]
This paper argues for the creation of a subset of rights that allows open source contributors to force users to update to the most recent version of a model.
Legal, reputational and moral risks related to open-sourcing AI models could justify contributors having more control over downstream uses.
arXiv Detail & Related papers (2024-05-29T00:00:25Z)
- GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence [64.95492752484171]
We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks.
GenAudit suggests edits to the LLM response by revising or removing claims that are not supported by the reference document, and also presents evidence from the reference for facts that do appear to have support.
Comprehensive evaluation by human raters shows that GenAudit can detect errors in 8 different LLM outputs when summarizing documents from diverse domains.
arXiv Detail & Related papers (2024-02-19T21:45:55Z)
- Catch the Butterfly: Peeking into the Terms and Conflicts among SPDX Licenses [16.948633594354412]
The use of third-party libraries (TPLs) in software development has accelerated the creation of modern software.
Developers may inadvertently violate the licenses of TPLs, leading to legal issues.
There is a need for a high-quality license dataset that encompasses a broad range of mainstream licenses.
arXiv Detail & Related papers (2024-01-19T11:27:34Z)
- LiSum: Open Source Software License Summarization with Multi-Task Learning [16.521420821183995]
Open source software (OSS) licenses regulate the conditions under which users can reuse, modify, and distribute the software legally.
There exist various OSS licenses in the community, written in a formal language, which are typically long and complicated to understand.
Motivated by the user study and the fast growth of licenses in the community, we propose the first study towards automated license summarization.
arXiv Detail & Related papers (2023-09-10T16:43:51Z)
- LiResolver: License Incompatibility Resolution for Open Source Software [13.28021004336228]
LiResolver is a fine-grained, scalable, and flexible tool to resolve license incompatibility issues for open source software.
Comprehensive experiments demonstrate the effectiveness of LiResolver, with 4.09% false positive (FP) rate and 0.02% false negative (FN) rate for incompatibility issue localization.
arXiv Detail & Related papers (2023-06-26T13:16:09Z)
- FAT Forensics: A Python Toolbox for Implementing and Deploying Fairness, Accountability and Transparency Algorithms in Predictive Systems [69.24490096929709]
We developed an open source Python package called FAT Forensics.
It can inspect important fairness, accountability and transparency aspects of predictive algorithms.
Our toolbox can evaluate all elements of a predictive pipeline.
arXiv Detail & Related papers (2022-09-08T13:25:02Z)
- Synthetic Disinformation Attacks on Automated Fact Verification Systems [53.011635547834025]
We explore the sensitivity of automated fact-checkers to synthetic adversarial evidence in two simulated settings.
We show that these systems suffer significant performance drops against these attacks.
We discuss the growing threat of modern NLG systems as generators of disinformation.
arXiv Detail & Related papers (2022-02-18T19:01:01Z)