Related papers: LiResolver: License Incompatibility Resolution for Open Source Software

URL: http://arxiv.org/abs/2306.14675v1
Date: Mon, 26 Jun 2023 13:16:09 GMT
Title: LiResolver: License Incompatibility Resolution for Open Source Software
Authors: Sihan Xu, Ya Gao, Lingling Fan, Linyu Li, Xiangrui Cai, and Zheli Liu
Abstract summary: LiResolver is a fine-grained, scalable, and flexible tool to resolve license incompatibility issues for open source software. Comprehensive experiments demonstrate the effectiveness of LiResolver, with 4.09% false positive (FP) rate and 0.02% false negative (FN) rate for incompatibility issue localization.
Score: 13.28021004336228
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Open source software (OSS) licenses regulate the conditions under which OSS can be legally reused, distributed, and modified. However, a common issue arises when incorporating third-party OSS accompanied with licenses, i.e., license incompatibility, which occurs when multiple licenses exist in one project and there are conflicts between them. Despite being problematic, fixing license incompatibility issues requires substantial efforts due to the lack of license understanding and complex package dependency. In this paper, we propose LiResolver, a fine-grained, scalable, and flexible tool to resolve license incompatibility issues for open source software. Specifically, it first understands the semantics of licenses through fine-grained entity extraction and relation extraction. Then, it detects and resolves license incompatibility issues by recommending official licenses in priority. When no official licenses can satisfy the constraints, it generates a custom license as an alternative solution. Comprehensive experiments demonstrate the effectiveness of LiResolver, with 4.09% false positive (FP) rate and 0.02% false negative (FN) rate for incompatibility issue localization, and 62.61% of 230 real-world incompatible projects resolved by LiResolver. We discuss the feedback from OSS developers and the lessons learned from this work. All the datasets and the replication package of LiResolver have been made publicly available to facilitate follow-up research.

Related papers

A first look at License Variants in the PyPI Ecosystem [22.01881122680886]
We conduct an empirical study of license variants in the PyPI ecosystem. We introduce LV-, a novel approach for efficient license variant analysis leveraging diff-based techniques and large language models. LV- achieves an accuracy of 0.936 while reducing computational costs by 30%, and LV-Compat identifies 5.2 times more incompatible packages than existing methods with a precision of 0.98.
arXiv Detail & Related papers (2025-07-19T12:41:33Z)
D-LiFT: Improving LLM-based Decompiler Backend via Code Quality-driven Fine-tuning [49.16469288280772]
We present D-LiFT, an automated decompiler backend that harnesses and trains LLMs to improve the quality of decompiled code via reinforcement learning (RL). D-LiFT adheres to a key principle for enhancing the quality of decompiled code: preserving accuracy while improving readability. Central to D-LiFT, we propose D-SCORE, an integrated quality assessment system to score the decompiled code from multiple aspects.
arXiv Detail & Related papers (2025-06-11T19:09:08Z)
Open Source at a Crossroads: The Future of Licensing Driven by Monetization [11.149764135999437]
Open Source Software Licenses (OSS licenses) ensure that software can be sold or distributed as part of aggregate programs from various sources without requiring a royalty or fee. We argue that open source is at a crossroads, with a growing need to redefine its licensing models and support communities and critical software.
arXiv Detail & Related papers (2025-03-04T17:44:01Z)
Do Not Trust Licenses You See: Dataset Compliance Requires Massive-Scale AI-Powered Lifecycle Tracing [45.6582862121583]
This paper argues that a dataset's legal risk cannot be accurately assessed by its license terms alone. It argues that tracking dataset redistribution and its full lifecycle is essential. We show that AI can perform these tasks with higher accuracy, efficiency, and cost-effectiveness than human experts.
arXiv Detail & Related papers (2025-03-04T16:57:53Z)
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [56.9361004704428]
Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks. SWE-Fixer is a novel open-source framework designed to effectively and efficiently resolve GitHub issues. We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving state-of-the-art performance among open-source models.
arXiv Detail & Related papers (2025-01-09T07:54:24Z)
"They've Stolen My GPL-Licensed Model!": Toward Standardized and Transparent Model Licensing [30.19362102481241]
We develop a new vocabulary for ML workflow management and encode license rules to enable ontological reasoning for analyzing rights-granting and compliance issues. Our analysis tool is built on the Turtle language and the Notation3 reasoning engine, envisioned as a first step toward Linked Open Model Data.
arXiv Detail & Related papers (2024-12-16T06:52:09Z)
An Overview and Catalogue of Dependency Challenges in Open Source Software Package Registries [52.23798016734889]
This article provides a catalogue of dependency-related challenges that come with relying on OSS packages or libraries. The catalogue is based on the scientific literature on empirical research that has been conducted to understand, quantify and overcome these challenges.
arXiv Detail & Related papers (2024-09-27T16:20:20Z)
Open-domain Implicit Format Control for Large Language Model Generation [52.83173553689678]
We introduce a novel framework for controlled generation in large language models (LLMs). This study investigates LLMs' capabilities to follow open-domain, one-shot constraints and replicate the format of the example answers. We also develop a dataset collection methodology for supervised fine-tuning that enhances the open-domain format control of LLMs without degrading output quality.
arXiv Detail & Related papers (2024-08-08T11:51:45Z)
On the modification and revocation of open source licences [0.14843690728081999]
This paper argues for the creation of a subset of rights that allows open source contributors to force users to update to the most recent version of a model. Legal, reputational and moral risks related to open-sourcing AI models could justify contributors having more control over downstream uses.
arXiv Detail & Related papers (2024-05-29T00:00:25Z)
Catch the Butterfly: Peeking into the Terms and Conflicts among SPDX Licenses [16.948633594354412]
The use of third-party libraries (TPLs) in software development has accelerated the creation of modern software. Developers may inadvertently violate the licenses of TPLs, leading to legal issues. There is a need for a high-quality license dataset that encompasses a broad range of mainstream licenses.
arXiv Detail & Related papers (2024-01-19T11:27:34Z)
A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction [60.70089334782383]
Large language models (LLMs) have demonstrated great potential for domain-specific applications. Recent disputes over GPT-4's law evaluation raise questions concerning their performance in real-world legal tasks. We design practical baseline solutions based on LLMs and test on the task of legal judgment prediction.
arXiv Detail & Related papers (2023-10-18T07:38:04Z)
Detecting and Fixing Violations of Modification Terms in Open Source Licenses during Forking [4.682961105225832]
We first empirically characterize modification terms in 47 open source licenses. Inspired by our study, we then design LiVo to automatically detect and fix violations of modification terms in open source licenses during forking.
arXiv Detail & Related papers (2023-10-12T02:37:06Z)
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models [85.02796681773447]
We propose a quantization-aware low-rank adaptation (QA-LoRA) algorithm. The motivation lies in the imbalanced degrees of freedom of quantization and adaptation. QA-LoRA is easily implemented with a few lines of code.
arXiv Detail & Related papers (2023-09-26T07:22:23Z)
LiSum: Open Source Software License Summarization with Multi-Task Learning [16.521420821183995]
Open source software (OSS) licenses regulate the conditions under which users can reuse, modify, and distribute the software legally. There exist various OSS licenses in the community, written in a formal language, which are typically long and complicated to understand. Motivated by the user study and the fast growth of licenses in the community, we propose the first study towards automated license summarization.
arXiv Detail & Related papers (2023-09-10T16:43:51Z)
Understanding and Remediating Open-Source License Incompatibilities in the PyPI Ecosystem [29.898303568884227]
We conduct a large-scale empirical study of license incompatibilities and their remediation practices in the PyPI ecosystem. We propose SILENCE, an SMT-solver-based approach to recommend license incompatibility remediations with minimal costs in the package dependency graph.
arXiv Detail & Related papers (2023-08-11T04:57:54Z)
Analyzing Maintenance Activities of Software Libraries [65.268245109828]
Industrial applications heavily integrate open-source software libraries nowadays. I want to introduce an automatic monitoring approach for industrial applications to identify open-source dependencies that show negative signs regarding their current or future maintenance activities.
arXiv Detail & Related papers (2023-06-09T16:51:25Z)
FedSOV: Federated Model Secure Ownership Verification with Unforgeable Signature [60.99054146321459]
Federated learning allows multiple parties to collaborate in learning a global model without revealing private data. We propose a cryptographic signature-based federated learning model ownership verification scheme named FedSOV.
arXiv Detail & Related papers (2023-05-10T12:10:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.