Related papers: Towards Trustworthy AI Software Development Assistance

Towards Trustworthy AI Software Development Assistance

URL: http://arxiv.org/abs/2312.09126v2
Date: Tue, 23 Jan 2024 09:37:39 GMT
Title: Towards Trustworthy AI Software Development Assistance
Authors: Daniel Maninger, Krishna Narasimhan, Mira Mezini
Abstract summary: Current software development assistants tend to be unreliable, often producing incorrect, unsafe, or low-quality code. We seek to resolve these issues by introducing a holistic architecture for constructing, training, and using trustworthy AI software development assistants.
Score: 0.599251270168187
License: http://creativecommons.org/licenses/by/4.0/
Abstract: It is expected that in the near future, AI software development assistants will play an important role in the software industry. However, current software development assistants tend to be unreliable, often producing incorrect, unsafe, or low-quality code. We seek to resolve these issues by introducing a holistic architecture for constructing, training, and using trustworthy AI software development assistants. In the center of the architecture, there is a foundational LLM trained on datasets representative of real-world coding scenarios and complex software architectures, and fine-tuned on code quality criteria beyond correctness. The LLM will make use of graph-based code representations for advanced semantic comprehension. We envision a knowledge graph integrated into the system to provide up-to-date background knowledge and to enable the assistant to provide appropriate explanations. Finally, a modular framework for constrained decoding will ensure that certain guarantees (e.g., for correctness and security) hold for the generated code.

Related papers

Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality.<n>We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
Agentic AI Software Engineer: Programming with Trust [33.88230182444934]
Large Language Models (LLMs) have shown surprising proficiency in generating code snippets. We argue that successfully deploying AI software engineers requires a level of trust equal to or even greater than the trust established by human-driven software engineering practices.
arXiv Detail & Related papers (2025-02-19T14:28:42Z)
Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights [0.0]
This paper introduces a novel scoring mechanism called the SBC score. It is based on a reverse generation technique that leverages the natural language generation capabilities of Large Language Models. Unlike direct code analysis, our approach reconstructs system requirements from AI-generated code and compares them with the original specifications.
arXiv Detail & Related papers (2025-02-11T01:12:11Z)
Human-In-the-Loop Software Development Agents [12.830816751625829]
Large Language Models (LLMs) are introduced to automatically resolve software development tasks. We introduce a Human-in-the-loop LLM-based Agents framework (HULA) for software development. We design, implement, and deploy the HULA framework into Atlassian for internal uses.
arXiv Detail & Related papers (2024-11-19T23:22:33Z)
LLMs: A Game-Changer for Software Engineers? [0.0]
Large Language Models (LLMs) like GPT-3 and GPT-4 have emerged as groundbreaking innovations with capabilities that extend far beyond traditional AI applications. Their potential to revolutionize software development has captivated the software engineering (SE) community. This paper argues that LLMs are not just reshaping how software is developed but are redefining the role of developers.
arXiv Detail & Related papers (2024-11-01T17:14:37Z)
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions. RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z)
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs [9.649864680130781]
We present CodeMMLU, a benchmark designed to evaluate the depth of software and code understanding in CodeLLMs. CodeMMLU includes over 10,000 questions sourced from diverse domains, encompassing tasks such as code analysis, defect detection, and software engineering principles. Our evaluation reveals that even state-of-the-art models face significant challenges with CodeMMLU.
arXiv Detail & Related papers (2024-10-02T20:04:02Z)
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs) The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation. We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
How Far Have We Gone in Binary Code Understanding Using Large Language Models [51.527805834378974]
We propose a benchmark to evaluate the effectiveness of Large Language Models (LLMs) in binary code understanding. Our evaluations reveal that existing LLMs can understand binary code to a certain extent, thereby improving the efficiency of binary code analysis.
arXiv Detail & Related papers (2024-04-15T14:44:08Z)
Developer Experiences with a Contextualized AI Coding Assistant: Usability, Expectations, and Outcomes [11.520721038793285]
This study focuses on the initial experiences of 62 participants who used a contextualized coding AI assistant -- named StackSpot AI -- in a controlled setting. Assistants' use resulted in significant time savings, easier access to documentation, and the generation of accurate codes for internal APIs. challenges associated with the knowledge sources necessary to make the coding assistant access more contextual information as well as variable responses and limitations in handling complex codes were observed.
arXiv Detail & Related papers (2023-11-30T10:52:28Z)
Enabling Automated Machine Learning for Model-Driven AI Engineering [60.09869520679979]
We propose a novel approach to enable Model-Driven Software Engineering and Model-Driven AI Engineering. In particular, we support Automated ML, thus assisting software engineers without deep AI knowledge in developing AI-intensive systems.
arXiv Detail & Related papers (2022-03-06T10:12:56Z)
Empowered and Embedded: Ethics and Agile Processes [60.63670249088117]
We argue that ethical considerations need to be embedded into the (agile) software development process. We put emphasis on the possibility to implement ethical deliberations in already existing and well established agile software development processes.
arXiv Detail & Related papers (2021-07-15T11:14:03Z)
Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. We have developed a proven systems engineering approach for machine learning development and deployment. Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.