Related papers: Detecting and Fixing Violations of Modification Terms in Open Source Licenses during Forking


URL: http://arxiv.org/abs/2310.07991v1
Date: Thu, 12 Oct 2023 02:37:06 GMT
Title: Detecting and Fixing Violations of Modification Terms in Open Source Licenses during Forking
Authors: Kaifeng Huang, Yingfeng Xia, Bihuan Chen, Zhuotong Zhou, Jin Guo, Xin Peng
Abstract summary: We first empirically characterize modification terms in 47 open source licenses. Inspired by our study, we then design LiVo to automatically detect and fix violations of modification terms in open source licenses during forking.
Score: 4.682961105225832
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open source software benefits the software community, but also introduces legal risks caused by license violations, which can result in serious consequences such as lawsuits and financial losses. To mitigate legal risks, some approaches have been proposed to identify licenses, detect license incompatibilities and inconsistencies, and recommend licenses. As far as we know, however, there is no prior work to understand modification terms in open source licenses or to detect and fix violations of modification terms. To bridge this gap, we first empirically characterize modification terms in 47 open source licenses. These licenses all require certain forms of "notice" to describe the modifications made to the original work. Inspired by our study, we then design LiVo to automatically detect and fix violations of modification terms in open source licenses during forking. Our evaluation has shown the effectiveness and efficiency of LiVo. 18 pull requests fixing modification term violations have received positive responses, and 8 have been merged.

Related papers

A first look at License Variants in the PyPI Ecosystem [22.01881122680886]
We conduct an empirical study of license variants in the PyPI ecosystem. We introduce LV-, a novel approach for efficient license variant analysis leveraging diff-based techniques and large language models. LV- achieves an accuracy of 0.936 while reducing computational costs by 30%, and LV-Compat identifies 5.2 times more incompatible packages than existing methods with a precision of 0.98.
arXiv Detail & Related papers (2025-07-19T12:41:33Z)
Certified Mitigation of Worst-Case LLM Copyright Infringement [46.571805194176825]
"copyright takedown" methods are aimed at preventing models from generating content substantially similar to copyrighted ones. We propose BloomScrub, a remarkably simple yet highly effective inference-time approach that provides certified copyright takedown. Our results suggest that lightweight, inference-time methods can be surprisingly effective for copyright prevention.
arXiv Detail & Related papers (2025-04-22T17:16:53Z)
Open Source at a Crossroads: The Future of Licensing Driven by Monetization [11.149764135999437]
Open Source Software Licenses (OSS licenses) ensure that software can be sold or distributed as part of aggregate programs from various sources without requiring a royalty or fee. We argue that open source is at a crossroads, with a growing need to redefine its licensing models and support communities and critical software.
arXiv Detail & Related papers (2025-03-04T17:44:01Z)
Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Ownership Verification with Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world applications through retrieval-augmented generation (RAG) mechanisms. Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning attacks. We propose name for "harmless" copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z)
Lost in Edits? A $λ$-Compass for AIGC Provenance [119.95562081325552]
We propose a novel latent-space attribution method that robustly identifies and differentiates authentic outputs from manipulated ones. LambdaTracer is effective across diverse iterative editing processes, whether automated through text-guided editing tools such as InstructPix2Pix or performed manually with editing software such as Adobe Photoshop.
arXiv Detail & Related papers (2025-02-05T06:24:25Z)
PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions. We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models. We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
arXiv Detail & Related papers (2024-11-20T17:23:40Z)
OSS License Identification at Scale: A Comprehensive Dataset Using World of Code [4.954816514146113]
We employ an exhaustive approach, scanning all files containing "license" in their filepath, and apply the winnowing algorithm for robust text matching. Our method identifies and matches over 5.5 million distinct license blobs across millions of OSS projects, creating a detailed project-to-license (P2L) map.
arXiv Detail & Related papers (2024-09-07T13:34:55Z)
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? [62.72729485995075]
We investigate the effectiveness of watermarking as a deterrent against the generation of copyrighted texts. We find that watermarking adversely affects the success rate of Membership Inference Attacks (MIAs). We propose an adaptive technique to improve the success rate of a recent MIA under watermarking.
arXiv Detail & Related papers (2024-07-24T16:53:09Z)
On the modification and revocation of open source licences [0.14843690728081999]
This paper argues for the creation of a subset of rights that allows open source contributors to force users to update to the most recent version of a model. Legal, reputational and moral risks related to open-sourcing AI models could justify contributors having more control over downstream uses.
arXiv Detail & Related papers (2024-05-29T00:00:25Z)
Private Online Community Detection for Censored Block Models [60.039026645807326]
We study the private online change detection problem for dynamic communities, using a censored block model (CBM). We propose an algorithm capable of identifying changes in the community structure, while maintaining user privacy.
arXiv Detail & Related papers (2024-05-09T12:35:57Z)
GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence [64.95492752484171]
We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks. We train models to execute these tasks, and design an interactive interface to present suggested edits and evidence to users. To ensure that most errors are flagged by the system, we propose a method that can increase the error recall while minimizing impact on precision.
arXiv Detail & Related papers (2024-02-19T21:45:55Z)
Catch the Butterfly: Peeking into the Terms and Conflicts among SPDX Licenses [16.948633594354412]
The use of third-party libraries (TPLs) in software development has accelerated the creation of modern software. Developers may inadvertently violate the licenses of TPLs, leading to legal issues. There is a need for a high-quality license dataset that encompasses a broad range of mainstream licenses.
arXiv Detail & Related papers (2024-01-19T11:27:34Z)
LiSum: Open Source Software License Summarization with Multi-Task Learning [16.521420821183995]
Open source software (OSS) licenses regulate the conditions under which users can reuse, modify, and distribute the software legally. There exist various OSS licenses in the community, written in a formal language, which are typically long and complicated to understand. Motivated by the user study and the fast growth of licenses in the community, we propose the first study towards automated license summarization.
arXiv Detail & Related papers (2023-09-10T16:43:51Z)
LiResolver: License Incompatibility Resolution for Open Source Software [13.28021004336228]
LiResolver is a fine-grained, scalable, and flexible tool to resolve license incompatibility issues for open source software. Comprehensive experiments demonstrate the effectiveness of LiResolver, with 4.09% false positive (FP) rate and 0.02% false negative (FN) rate for incompatibility issue localization.
arXiv Detail & Related papers (2023-06-26T13:16:09Z)
FAT Forensics: A Python Toolbox for Implementing and Deploying Fairness, Accountability and Transparency Algorithms in Predictive Systems [69.24490096929709]
We developed an open source Python package called FAT Forensics. It can inspect important fairness, accountability and transparency aspects of predictive algorithms. Our toolbox can evaluate all elements of a predictive pipeline.
arXiv Detail & Related papers (2022-09-08T13:25:02Z)
Synthetic Disinformation Attacks on Automated Fact Verification Systems [53.011635547834025]
We explore the sensitivity of automated fact-checkers to synthetic adversarial evidence in two simulated settings. We show that these systems suffer significant performance drops against these attacks. We discuss the growing threat of modern NLG systems as generators of disinformation.
arXiv Detail & Related papers (2022-02-18T19:01:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.