Related papers: Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning

Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning

URL: http://arxiv.org/abs/2404.06201v1
Date: Tue, 9 Apr 2024 10:47:02 GMT
Title: Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning
Authors: Zhihao Lin, Wei Ma, Tao Lin, Yaowen Zheng, Jingquan Ge, Jun Wang, Jacques Klein, Tegawende Bissyande, Yang Liu, Li Li,
Abstract summary: Large Language Models (LLMs) have become instrumental in advancing software engineering (SE) tasks. The collaboration of these AI-based SE models hinges on maximising the sources of high-quality data. Data especially of high quality, often holds commercial or sensitive value, making it less accessible for open-source AI-based SE projects.
Score: 23.395624804517034
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have become instrumental in advancing software engineering (SE) tasks, showcasing their efficacy in code understanding and beyond. Like traditional SE tools, open-source collaboration is key in realising the excellent products. However, with AI models, the essential need is in data. The collaboration of these AI-based SE models hinges on maximising the sources of high-quality data. However, data especially of high quality, often holds commercial or sensitive value, making it less accessible for open-source AI-based SE projects. This reality presents a significant barrier to the development and enhancement of AI-based SE tools within the software engineering community. Therefore, researchers need to find solutions for enabling open-source AI-based SE models to tap into resources by different organisations. Addressing this challenge, our position paper investigates one solution to facilitate access to diverse organizational resources for open-source AI models, ensuring privacy and commercial sensitivities are respected. We introduce a governance framework centered on federated learning (FL), designed to foster the joint development and maintenance of open-source AI code models while safeguarding data privacy and security. Additionally, we present guidelines for developers on AI-based SE tool collaboration, covering data requirements, model architecture, updating strategies, and version control. Given the significant influence of data characteristics on FL, our research examines the effect of code data heterogeneity on FL performance.

Related papers

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training [67.895981259683]
General AI Agents are increasingly recognized as foundational frameworks for the next generation of artificial intelligence.<n>Current agent systems are either closed-source or heavily reliant on a variety of paid APIs and proprietary tools.<n>We present Cognitive Kernel-Pro, a fully open-source and (to the maximum extent) free multi-module agent framework.
arXiv Detail & Related papers (2025-08-01T08:11:31Z)
AI Flow: Perspectives, Scenarios, and Approaches [51.38621621775711]
We introduce AI Flow, a framework that integrates cutting-edge IT and CT advancements.<n>First, device-edge-cloud framework serves as the foundation, which integrates end devices, edge servers, and cloud clusters.<n>Second, we introduce the concept of familial models, which refers to a series of different-sized models with aligned hidden features.<n>Third, connectivity- and interaction-based intelligence emergence is a novel paradigm of AI Flow.
arXiv Detail & Related papers (2025-06-14T12:43:07Z)
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [59.52058740470727]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications.<n>Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems.<n>This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
Towards Human-Guided, Data-Centric LLM Co-Pilots [53.35493881390917]
CliMB-DC is a human-guided, data-centric framework for machine learning co-pilots. It combines advanced data-centric tools with LLM-driven reasoning to enable robust, context-aware data processing. We show how CliMB-DC can transform uncurated datasets into ML-ready formats.
arXiv Detail & Related papers (2025-01-17T17:51:22Z)
How Developers Interact with AI: A Taxonomy of Human-AI Collaboration in Software Engineering [8.65285948382426]
We propose a taxonomy of interaction types between developers and AI tools, identifying eleven distinct interaction types. Building on this taxonomy, we outline a research agenda focused on optimizing AI interactions, improving developer control, and addressing trust and usability challenges in AI-assisted development.
arXiv Detail & Related papers (2025-01-15T12:53:49Z)
Collaborative AI in Sentiment Analysis: System Architecture, Data Prediction and Deployment Strategies [3.3374611485861116]
Large language model (LLM) based artificial intelligence technologies have been a game-changer, particularly in sentiment analysis. However, integrating diverse AI models for processing complex multimodal data and the associated high costs of feature extraction presents significant challenges. This study introduces a collaborative AI framework designed to efficiently distribute and resolve tasks across various AI systems.
arXiv Detail & Related papers (2024-10-17T06:14:34Z)
Next-Gen Software Engineering: AI-Assisted Big Models [0.0]
This paper aims to facilitate a synthesis between models and AI in software engineering. The paper provides an overview of the current status of AI-assisted software engineering. A vision of AI-assisted Big Models in SE is put forth, with the aim of capitalising on the advantages inherent to both approaches.
arXiv Detail & Related papers (2024-09-26T16:49:57Z)
Generative AI like ChatGPT in Blockchain Federated Learning: use cases, opportunities and future [4.497001527881303]
This research explores potential integrations of generative AI in federated learning. generative adversarial networks (GANs) and variational autoencoders (VAEs) Generating synthetic data helps federated learning address challenges related to limited data availability.
arXiv Detail & Related papers (2024-07-25T19:43:49Z)
Is open source software culture enough to make AI a common ? [0.0]
Language models (LM) are increasingly deployed in the field of artificial intelligence (AI) The question arises as to whether they can be a common resource managed and maintained by a community of users. We highlight the potential benefits of treating the data and resources needed to create LMs as commons.
arXiv Detail & Related papers (2024-03-19T14:43:52Z)
Code Ownership in Open-Source AI Software Security [18.779538756226298]
We use code ownership metrics to investigate the correlation with latent vulnerabilities across five prominent open-source AI software projects. The findings suggest a positive relationship between high-level ownership (characterised by a limited number of minor contributors) and a decrease in vulnerabilities. With these novel code ownership metrics, we have implemented a Python-based command-line application to aid project curators and quality assurance professionals in evaluating and benchmarking their on-site projects.
arXiv Detail & Related papers (2023-12-18T00:37:29Z)
SoTaNa: The Open-Source Software Development Assistant [81.86136560157266]
SoTaNa is an open-source software development assistant. It generates high-quality instruction-based data for the domain of software engineering. It employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA.
arXiv Detail & Related papers (2023-08-25T14:56:21Z)
Federated Learning-Empowered AI-Generated Content in Wireless Networks [58.48381827268331]
Federated learning (FL) can be leveraged to improve learning efficiency and achieve privacy protection for AIGC. We present FL-based techniques for empowering AIGC, and aim to enable users to generate diverse, personalized, and high-quality content.
arXiv Detail & Related papers (2023-07-14T04:13:11Z)
OpenAGI: When LLM Meets Domain Experts [51.86179657467822]
Human Intelligence (HI) excels at combining basic skills to solve complex tasks. This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents. We introduce OpenAGI, an open-source platform designed for solving multi-step, real-world tasks.
arXiv Detail & Related papers (2023-04-10T03:55:35Z)
Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach. Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes. We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
Enabling Automated Machine Learning for Model-Driven AI Engineering [60.09869520679979]
We propose a novel approach to enable Model-Driven Software Engineering and Model-Driven AI Engineering. In particular, we support Automated ML, thus assisting software engineers without deep AI knowledge in developing AI-intensive systems.
arXiv Detail & Related papers (2022-03-06T10:12:56Z)
Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and Introspection [61.571331422347875]
We propose a data-driven approach to enhance models' signal-awareness. We combine the SE concept of code complexity with the AI technique of curriculum learning. We achieve up to 4.8x improvement in model signal awareness.
arXiv Detail & Related papers (2021-11-10T17:58:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.