Detecting and Fixing Violations of Modification Terms in Open Source Licenses during Forking
- URL: http://arxiv.org/abs/2310.07991v1
- Date: Thu, 12 Oct 2023 02:37:06 GMT
- Title: Detecting and Fixing Violations of Modification Terms in Open Source Licenses during Forking
- Authors: Kaifeng Huang, Yingfeng Xia, Bihuan Chen, Zhuotong Zhou, Jin Guo, Xin Peng
- Abstract summary: We first empirically characterize modification terms in 47 open source licenses.
Inspired by our study, we then design LiVo to automatically detect and fix violations of modification terms in open source licenses during forking.
- Score: 4.682961105225832
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open source software brings benefits to the software community, but also
introduces legal risks caused by license violations, which result in serious
consequences such as lawsuits and financial losses. To mitigate legal risks,
some approaches have been proposed to identify licenses, detect license
incompatibilities and inconsistencies, and recommend licenses. As far as we
know, however, there is no prior work to understand modification terms in open
source licenses or to detect and fix violations of modification terms. To
bridge this gap, we first empirically characterize modification terms in 47
open source licenses. These licenses all require certain forms of "notice" to
describe the modifications made to the original work. Inspired by our study, we
then design LiVo to automatically detect and fix violations of modification
terms in open source licenses during forking. Our evaluation has shown the
effectiveness and efficiency of LiVo. Eighteen pull requests fixing modification
term violations have received positive responses, and 8 have been merged.
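The abstract does not say how LiVo detects or fixes violations. As a rough illustration of the general idea only, the sketch below flags files in a fork that differ from their upstream version but gained no new modification notice, then repairs one by prepending a notice. This is not LiVo's method: the notice wording (`Modified by <author> on <date>`), the `#` comment prefix, and the function names `count_notices`, `find_violations`, and `fix_violation` are all hypothetical. Real licenses phrase the requirement in their own terms; Apache-2.0, for instance, asks that modified files carry prominent notices stating that they were changed.

```python
import re

# HYPOTHETICAL notice format; no license prescribes this exact wording.
NOTICE_PATTERN = re.compile(r"modified by .+ on \d{4}-\d{2}-\d{2}", re.IGNORECASE)


def count_notices(text: str) -> int:
    """Count lines matching the hypothetical modification-notice pattern."""
    return len(NOTICE_PATTERN.findall(text))


def find_violations(upstream: dict[str, str], fork: dict[str, str]) -> list[str]:
    """Return paths present in both trees whose content changed in the fork
    without the fork adding a new modification notice."""
    violations = []
    for path, original in upstream.items():
        forked = fork.get(path)
        if forked is None or forked == original:
            continue  # file deleted in the fork, or left unchanged
        # Compare counts so notices inherited from upstream do not mask a violation.
        if count_notices(forked) <= count_notices(original):
            violations.append(path)
    return sorted(violations)


def fix_violation(forked: str, author: str, date: str, comment: str = "#") -> str:
    """Prepend a hypothetical modification notice as a single line comment."""
    return f"{comment} Modified by {author} on {date}\n{forked}"
```

For example, `find_violations({"a.py": "x = 1\n"}, {"a.py": "x = 2\n"})` returns `["a.py"]`. Prepending blindly can break files that must begin with a shebang or encoding declaration, so a practical tool would need file-type-aware placement of the notice.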
Related papers
- PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions.
We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models.
We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
Date: 2024-11-20T17:23:40Z
- OSS License Identification at Scale: A Comprehensive Dataset Using World of Code [4.954816514146113]
We employ an exhaustive approach, scanning all files containing "license" in their filepath, and apply the winnowing algorithm for robust text matching.
Our method identifies and matches over 5.5 million distinct license blobs across millions of OSS projects, creating a detailed project-to-license (P2L) map.
Date: 2024-09-07T13:34:55Z
- Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? [62.72729485995075]
We investigate the effectiveness of watermarking as a deterrent against the generation of copyrighted texts.
We find that watermarking adversely affects the success rate of Membership Inference Attacks (MIAs).
We propose an adaptive technique to improve the success rate of a recent MIA under watermarking.
Date: 2024-07-24T16:53:09Z
- On the modification and revocation of open source licences [0.14843690728081999]
This paper argues for the creation of a subset of rights that allows open source contributors to force users to update to the most recent version of a model.
Legal, reputational and moral risks related to open-sourcing AI models could justify contributors having more control over downstream uses.
Date: 2024-05-29T00:00:25Z
- Private Online Community Detection for Censored Block Models [60.039026645807326]
We study the private online change detection problem for dynamic communities, using a censored block model (CBM).
We propose an algorithm capable of identifying changes in the community structure, while maintaining user privacy.
Date: 2024-05-09T12:35:57Z
- GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence [64.95492752484171]
We present GenAudit, a tool intended to assist in fact-checking LLM responses for document-grounded tasks.
We train models to execute these tasks, and design an interactive interface to present suggested edits and evidence to users.
To ensure that most errors are flagged by the system, we propose a method that can increase the error recall while minimizing impact on precision.
Date: 2024-02-19T21:45:55Z
- Catch the Butterfly: Peeking into the Terms and Conflicts among SPDX Licenses [16.948633594354412]
The use of third-party libraries (TPLs) in software development has accelerated the creation of modern software.
Developers may inadvertently violate the licenses of TPLs, leading to legal issues.
There is a need for a high-quality license dataset that encompasses a broad range of mainstream licenses.
Date: 2024-01-19T11:27:34Z
- LiSum: Open Source Software License Summarization with Multi-Task Learning [16.521420821183995]
Open source software (OSS) licenses regulate the conditions under which users can reuse, modify, and distribute the software legally.
Many different OSS licenses exist in the community; written in formal language, they are typically long and complicated to understand.
Motivated by the user study and the fast growth of licenses in the community, we propose the first study towards automated license summarization.
Date: 2023-09-10T16:43:51Z
- LiResolver: License Incompatibility Resolution for Open Source Software [13.28021004336228]
LiResolver is a fine-grained, scalable, and flexible tool to resolve license incompatibility issues for open source software.
Comprehensive experiments demonstrate the effectiveness of LiResolver, with a 4.09% false positive (FP) rate and a 0.02% false negative (FN) rate for incompatibility issue localization.
Date: 2023-06-26T13:16:09Z
- FAT Forensics: A Python Toolbox for Implementing and Deploying Fairness, Accountability and Transparency Algorithms in Predictive Systems [69.24490096929709]
We developed an open source Python package called FAT Forensics.
It can inspect important fairness, accountability and transparency aspects of predictive algorithms.
Our toolbox can evaluate all elements of a predictive pipeline.
Date: 2022-09-08T13:25:02Z
- Synthetic Disinformation Attacks on Automated Fact Verification Systems [53.011635547834025]
We explore the sensitivity of automated fact-checkers to synthetic adversarial evidence in two simulated settings.
We show that these systems suffer significant performance drops against these attacks.
We discuss the growing threat of modern NLG systems as generators of disinformation.
Date: 2022-02-18T19:01:01Z