Code Ownership in Open-Source AI Software Security
- URL: http://arxiv.org/abs/2312.10861v1
- Date: Mon, 18 Dec 2023 00:37:29 GMT
- Title: Code Ownership in Open-Source AI Software Security
- Authors: Jiawen Wen, Dong Yuan, Lei Ma, Huaming Chen
- Abstract summary: We use code ownership metrics to investigate the correlation with latent vulnerabilities across five prominent open-source AI software projects.
The findings suggest a positive relationship between high-level ownership (characterised by a limited number of minor contributors) and a decrease in vulnerabilities.
With these novel code ownership metrics, we have implemented a Python-based command-line application to aid project curators and quality assurance professionals in evaluating and benchmarking their on-site projects.
- Score: 18.779538756226298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As open-source AI software projects become an integral component of AI
software development, it is critical to develop novel methods to ensure and
measure the security of open-source projects for developers. Code
ownership, pivotal in the evolution of such projects, offers insights into
developer engagement and potential vulnerabilities. In this paper, we leverage
code ownership metrics to empirically investigate their correlation with
latent vulnerabilities across five prominent open-source AI software projects.
The findings from the large-scale empirical study suggest a positive
relationship between high-level ownership (characterised by a limited number of
minor contributors) and a decrease in vulnerabilities. Furthermore, we
introduce time metrics, anchored on the project's duration,
individual source code file timelines, and the count of impacted releases.
These metrics adeptly categorise distinct phases of open-source AI software
projects and their respective vulnerability intensities. With these novel code
ownership metrics, we have implemented a Python-based command-line application
to aid project curators and quality assurance professionals in evaluating and
benchmarking their on-site projects. We anticipate that this work will initiate
continued research on securing open-source AI projects and measuring their
security.
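To make the ownership terminology in the abstract concrete, the following is a minimal Python sketch of how per-file ownership metrics could be computed from commit authorship. It is not the authors' command-line application. The function name `ownership_metrics`, the `OwnershipMetrics` fields, and the 5% cut-off separating minor from major contributors are all assumptions made for illustration; the abstract does not state the exact definitions or thresholds used in the paper.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed cut-off: a contributor with less than 5% of a file's commits
# is treated as "minor". The abstract does not specify this value.
MINOR_THRESHOLD = 0.05


@dataclass
class OwnershipMetrics:
    total_contributors: int   # distinct authors who committed to the file
    minor_contributors: int   # authors below the minor threshold
    major_contributors: int   # authors at or above the minor threshold
    ownership: float          # commit share of the top contributor


def ownership_metrics(commit_authors, minor_threshold=MINOR_THRESHOLD):
    """Compute ownership metrics for one file.

    commit_authors: one author name per commit that touched the file.
    """
    if not commit_authors:
        raise ValueError("commit_authors must contain at least one commit")

    counts = Counter(commit_authors)
    total_commits = len(commit_authors)
    # Each author's share of all commits to this file.
    shares = {author: n / total_commits for author, n in counts.items()}

    minor = sum(1 for share in shares.values() if share < minor_threshold)
    return OwnershipMetrics(
        total_contributors=len(shares),
        minor_contributors=minor,
        major_contributors=len(shares) - minor,
        ownership=max(shares.values()),
    )


if __name__ == "__main__":
    # Hypothetical history: 24 commits by three authors.
    authors = ["alice"] * 18 + ["bob"] * 5 + ["carol"]
    print(ownership_metrics(authors))
```

Under these assumed definitions, a file with a high `ownership` value and few minor contributors corresponds to the "high-level ownership" case that the abstract associates with fewer vulnerabilities.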
Related papers
- Forecasting the risk of software choices: A model to foretell security vulnerabilities from library dependencies and source code evolution [4.538870924201896]
We introduce a model capable of vulnerability forecasting at library level.
Our model can estimate the probability that a software project faces a CVE disclosure in a future time window.
arXiv Detail & Related papers (2024-11-17T23:36:27Z)
- The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z)
- Trust, but Verify: Evaluating Developer Behavior in Mitigating Security Vulnerabilities in Open-Source Software Projects [0.11999555634662631]
This study investigates vulnerabilities in dependencies of sampled open-source software (OSS) projects.
We have identified common issues in outdated or unmaintained dependencies that pose significant security risks.
Results suggest that reducing the number of direct dependencies and prioritizing well-established libraries with strong security records are effective strategies for enhancing the software security landscape.
arXiv Detail & Related papers (2024-08-26T13:46:48Z)
- Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs).
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
- Risks and Opportunities of Open-Source Generative AI [64.86989162783648]
Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education.
The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation.
This regulation is likely to put at risk the budding field of open-source generative AI.
arXiv Detail & Related papers (2024-05-14T13:37:36Z)
- Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning [23.395624804517034]
Large Language Models (LLMs) have become instrumental in advancing software engineering (SE) tasks.
The collaboration of these AI-based SE models hinges on maximising the sources of high-quality data.
Data, especially of high quality, often holds commercial or sensitive value, making it less accessible for open-source AI-based SE projects.
arXiv Detail & Related papers (2024-04-09T10:47:02Z)
- Embedded Software Development with Digital Twins: Specific Requirements for Small and Medium-Sized Enterprises [55.57032418885258]
Digital twins have the potential for cost-effective software development and maintenance strategies.
We interviewed SMEs about their current development processes.
First results show that real-time requirements prevent, to date, a Software-in-the-Loop development approach.
arXiv Detail & Related papers (2023-09-17T08:56:36Z)
- State-Of-The-Practice in Quality Assurance in Java-Based Open Source Software Development [3.4800665691198565]
We investigate whether and how quality assurance approaches are being used in conjunction in the development of 1,454 popular open source software projects on GitHub.
Our study indicates that typically projects do not follow all quality assurance practices together with high intensity.
In general, our study provides a deeper understanding of how existing quality assurance approaches are currently being used in Java-based open source software development.
arXiv Detail & Related papers (2023-06-16T07:43:11Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
- "Project smells" -- Experiences in Analysing the Software Quality of ML Projects with mllint [6.0141405230309335]
We introduce the novel concept of project smells which consider deficits in project management as a more holistic perspective on software quality.
An open-source static analysis tool mllint was also implemented to help detect and mitigate these.
Our findings indicate a need for context-aware static analysis tools that fit the needs of the project at its current stage of development.
arXiv Detail & Related papers (2022-01-20T15:52:24Z)
- Uncertainty Quantification 360: A Holistic Toolkit for Quantifying and Communicating the Uncertainty of AI [49.64037266892634]
We describe an open source Python toolkit named Uncertainty Quantification 360 (UQ360) for the uncertainty quantification of AI models.
The goal of this toolkit is twofold: first, to provide a broad range of capabilities to streamline as well as foster the common practices of quantifying, evaluating, improving, and communicating uncertainty in the AI application development lifecycle; second, to encourage further exploration of UQ's connections to other pillars of trustworthy AI.
arXiv Detail & Related papers (2021-06-02T18:29:04Z)