Code Ownership in Open-Source AI Software Security
- URL: http://arxiv.org/abs/2312.10861v1
- Date: Mon, 18 Dec 2023 00:37:29 GMT
- Title: Code Ownership in Open-Source AI Software Security
- Authors: Jiawen Wen, Dong Yuan, Lei Ma, Huaming Chen
- Abstract summary: We use code ownership metrics to investigate their correlation with latent vulnerabilities across five prominent open-source AI software projects.
The findings suggest a positive relationship between high-level ownership (characterised by a limited number of minor contributors) and a decrease in vulnerabilities.
With these novel code ownership metrics, we have implemented a Python-based command-line application to aid project curators and quality assurance professionals in evaluating and benchmarking their on-site projects.
- Score: 18.779538756226298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   As open-source AI software projects become an integral component of AI
software development, it is critical to develop novel methods to ensure and
measure the security of these open-source projects for developers. Code
ownership, pivotal in the evolution of such projects, offers insights into
developer engagement and potential vulnerabilities. In this paper, we leverage
code ownership metrics to empirically investigate their correlation with
latent vulnerabilities across five prominent open-source AI software projects.
The findings from the large-scale empirical study suggest a positive
relationship between high-level ownership (characterised by a limited number of
minor contributors) and a decrease in vulnerabilities. Furthermore, we
innovatively introduce the time metrics, anchored on the project's duration,
individual source code file timelines, and the count of impacted releases.
These metrics adeptly categorise distinct phases of open-source AI software
projects and their respective vulnerability intensities. With these novel code
ownership metrics, we have implemented a Python-based command-line application
to aid project curators and quality assurance professionals in evaluating and
benchmarking their on-site projects. We anticipate this work will initiate
continued research on securing and measuring the security of open-source AI
projects.
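
The ownership idea in the abstract can be illustrated with a minimal Python sketch. It assumes, hypothetically, that ownership is measured per file from commit counts, that a "minor contributor" is anyone with less than 5% of a file's commits (a threshold borrowed from earlier code-ownership studies, not stated in this abstract), and that the function name `ownership_metrics` is illustrative only. It is not the authors' command-line tool.

```python
from collections import Counter

# Assumption: 5% commit share separates minor from major contributors.
# The abstract does not state the threshold the authors use.
MINOR_THRESHOLD = 0.05


def ownership_metrics(commit_authors, minor_threshold=MINOR_THRESHOLD):
    """Compute simple ownership metrics for one file.

    commit_authors: iterable with one author name per commit touching the file.
    Returns counts of total, minor, and major contributors, plus the share of
    commits made by the most active contributor ("ownership").
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one commit is required")

    # Share of the file's commits attributed to each author.
    shares = {author: n / total for author, n in counts.items()}
    minor = [a for a, share in shares.items() if share < minor_threshold]

    return {
        "total_contributors": len(counts),
        "minor_contributors": len(minor),
        "major_contributors": len(counts) - len(minor),
        "ownership": max(shares.values()),
    }


if __name__ == "__main__":
    # Hypothetical commit history: 24 commits by three authors.
    authors = ["alice"] * 18 + ["bob"] * 5 + ["carol"]
    print(ownership_metrics(authors))
```

Under these assumptions, a file with few minor contributors and a high ownership share would count as having "high-level ownership" in the sense described above.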
 
      
Related papers
- Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training [67.895981259683]
 General AI Agents are increasingly recognized as foundational frameworks for the next generation of artificial intelligence.
Current agent systems are either closed-source or heavily reliant on a variety of paid APIs and proprietary tools.
We present Cognitive Kernel-Pro, a fully open-source and (to the maximum extent) free multi-module agent framework.
 arXiv  Detail & Related papers  (2025-08-01T08:11:31Z)
- CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale [46.76144797837242]
 Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously.
Existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope.
We introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities.
 arXiv  Detail & Related papers  (2025-06-03T07:35:14Z)
- OSS-UAgent: An Agent-based Usability Evaluation Framework for Open Source Software [47.02288620982592]
 Our framework employs intelligent agents powered by large language models (LLMs) to simulate developers performing programming tasks.
OSS-UAgent ensures accurate and context-aware code generation.
Our demonstration showcases OSS-UAgent's practical application in evaluating graph analytics platforms.
 arXiv  Detail & Related papers  (2025-05-29T08:40:10Z)
- Wolves in the Repository: A Software Engineering Analysis of the XZ Utils Supply Chain Attack [0.8517406772939294]
 The digital economy runs on Open Source Software (OSS), with an estimated 90% of modern applications containing open-source components.
This paper examines a sophisticated attack on the XZ Utils project (CVE-2024-3094), where attackers exploited not just code, but the entire open-source development process.
Our analysis reveals a new breed of supply chain attack that manipulates software engineering practices themselves.
 arXiv  Detail & Related papers  (2025-04-24T12:06:11Z)
- Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
 We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.
Our framework incorporates two complementary strategies: internal TTC and external TTC.
We demonstrate our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
 arXiv  Detail & Related papers  (2025-03-31T07:31:32Z)
- SOK: Exploring Hallucinations and Security Risks in AI-Assisted Software Development with Insights for LLM Deployment [0.0]
 Large Language Models (LLMs) such as GitHub Copilot, ChatGPT, Cursor AI, and Codeium AI have revolutionized the coding landscape.
This paper provides a comprehensive analysis of the benefits and risks associated with AI-powered coding tools.
 arXiv  Detail & Related papers  (2025-01-31T06:00:27Z)
- Forecasting the risk of software choices: A model to foretell security vulnerabilities from library dependencies and source code evolution [4.538870924201896]
 We introduce a model capable of vulnerability forecasting at library level.
Our model can estimate the probability that a software project faces a CVE disclosure in a future time window.
 arXiv  Detail & Related papers  (2024-11-17T23:36:27Z)
- The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
 Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
 arXiv  Detail & Related papers  (2024-09-10T10:12:37Z)
- Trust, but Verify: Evaluating Developer Behavior in Mitigating Security Vulnerabilities in Open-Source Software Projects [0.11999555634662631]
 This study investigates vulnerabilities in dependencies of sampled open-source software (OSS) projects.
We have identified common issues in outdated or unmaintained dependencies that pose significant security risks.
Results suggest that reducing the number of direct dependencies and prioritizing well-established libraries with strong security records are effective strategies for enhancing the software security landscape.
 arXiv  Detail & Related papers  (2024-08-26T13:46:48Z)
- Agent-Driven Automatic Software Improvement [55.2480439325792]
 This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs).
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
 arXiv  Detail & Related papers  (2024-06-24T15:45:22Z)
- Risks and Opportunities of Open-Source Generative AI [64.86989162783648]
 Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education.
The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation.
This regulation is likely to put at risk the budding field of open-source generative AI.
 arXiv  Detail & Related papers  (2024-05-14T13:37:36Z)
- Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning [23.395624804517034]
 Large Language Models (LLMs) have become instrumental in advancing software engineering (SE) tasks.
The collaboration of these AI-based SE models hinges on maximising the sources of high-quality data.
Data, especially of high quality, often holds commercial or sensitive value, making it less accessible for open-source AI-based SE projects.
 arXiv  Detail & Related papers  (2024-04-09T10:47:02Z)
- Embedded Software Development with Digital Twins: Specific Requirements for Small and Medium-Sized Enterprises [55.57032418885258]
 Digital twins have the potential for cost-effective software development and maintenance strategies.
We interviewed SMEs about their current development processes.
First results show that real-time requirements prevent, to date, a Software-in-the-Loop development approach.
 arXiv  Detail & Related papers  (2023-09-17T08:56:36Z)
- State-Of-The-Practice in Quality Assurance in Java-Based Open Source Software Development [3.4800665691198565]
 We investigate whether and how quality assurance approaches are being used in conjunction in the development of 1,454 popular open source software projects on GitHub.
Our study indicates that typically projects do not follow all quality assurance practices together with high intensity.
In general, our study provides a deeper understanding of how existing quality assurance approaches are currently being used in Java-based open source software development.
 arXiv  Detail & Related papers  (2023-06-16T07:43:11Z)
- The GitHub Development Workflow Automation Ecosystems [47.818229204130596]
 Large-scale software development has become a highly collaborative endeavour.
This chapter explores the ecosystems of development bots and GitHub Actions.
It provides an extensive survey of the state-of-the-art in this domain.
 arXiv  Detail & Related papers  (2023-05-08T15:24:23Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
 Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
 arXiv  Detail & Related papers  (2023-02-08T11:54:07Z)
- "Project smells" -- Experiences in Analysing the Software Quality of ML Projects with mllint [6.0141405230309335]
 We introduce the novel concept of project smells which consider deficits in project management as a more holistic perspective on software quality.
An open-source static analysis tool mllint was also implemented to help detect and mitigate these.
Our findings indicate a need for context-aware static analysis tools that fit the needs of the project at its current stage of development.
 arXiv  Detail & Related papers  (2022-01-20T15:52:24Z)
- Uncertainty Quantification 360: A Holistic Toolkit for Quantifying and Communicating the Uncertainty of AI [49.64037266892634]
 We describe an open source Python toolkit named Uncertainty Quantification 360 (UQ360) for the uncertainty quantification of AI models.
The goal of this toolkit is twofold: first, to provide a broad range of capabilities to streamline as well as foster the common practices of quantifying, evaluating, improving, and communicating uncertainty in the AI application development lifecycle; second, to encourage further exploration of UQ's connections to other pillars of trustworthy AI.
 arXiv  Detail & Related papers  (2021-06-02T18:29:04Z)