Trust Calibration in IDEs: Paving the Way for Widespread Adoption of AI Refactoring
- URL: http://arxiv.org/abs/2412.15948v1
- Date: Fri, 20 Dec 2024 14:44:11 GMT
- Title: Trust Calibration in IDEs: Paving the Way for Widespread Adoption of AI Refactoring
- Authors: Markus Borg,
- Abstract summary: Large Language Models (LLMs) offer a new approach to improving languages at unprecedented scale through AI-assisted.
LLMs come with inherent risks such as braking changes and the introduction of security vulnerabilities.
We advocate for encapsulating the interaction with the models in IDEs and validating attempts using trustworthy safeguards.
In this position paper, we position our future work based on established models from research on human factors in automation.
We outline action research within CodeScene on the development of 1) novel LLM safeguards and 2) user interaction that conveys an appropriate level of trust.
- Score: 5.342931064962865
- License:
- Abstract: In the software industry, the drive to add new features often overshadows the need to improve existing code. Large Language Models (LLMs) offer a new approach to improving codebases at an unprecedented scale through AI-assisted refactoring. However, LLMs come with inherent risks such as breaking changes and the introduction of security vulnerabilities. We advocate for encapsulating the interaction with the models in IDEs and validating refactoring attempts using trustworthy safeguards. However, equally important for the uptake of AI refactoring is research on trust development. In this position paper, we position our future work based on established models from research on human factors in automation. We outline action research within CodeScene on the development of 1) novel LLM safeguards and 2) user interaction that conveys an appropriate level of trust. The industry collaboration enables large-scale repository analysis and A/B testing to continuously guide the design of our research interventions.
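As a rough illustration of the safeguard idea described in the abstract, the sketch below shows a hypothetical IDE-side gate that only surfaces an LLM-proposed refactoring after it passes regression tests and a static-analysis check. All identifiers and tool choices (pytest, flake8, a scratch checkout of the repository) are assumptions made for illustration, not CodeScene's actual safeguards or API.

```python
# Minimal sketch of an IDE-side safeguard gate for AI refactoring.
# All names below are hypothetical, not CodeScene's API; the checks
# assume a scratch copy of the repository with pytest and flake8
# available on the PATH.
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RefactoringCandidate:
    file_path: Path        # file inside the scratch checkout
    original_code: str
    refactored_code: str


def run_safeguards(repo_root: Path, candidate: RefactoringCandidate) -> dict:
    """Apply the candidate in the scratch checkout, then run tests and a linter."""
    candidate.file_path.write_text(candidate.refactored_code)
    tests = subprocess.run(["pytest", "--quiet"], cwd=repo_root, capture_output=True)
    lint = subprocess.run(["flake8", str(candidate.file_path)], cwd=repo_root,
                          capture_output=True)
    return {"tests": tests.returncode == 0, "static_analysis": lint.returncode == 0}


def safeguard_gate(repo_root: Path, candidate: RefactoringCandidate) -> dict:
    """Only surface the refactoring in the IDE if every safeguard passes."""
    checks = run_safeguards(repo_root, candidate)
    accepted = all(checks.values())
    if not accepted:
        # Roll the scratch file back so the next candidate starts from the original.
        candidate.file_path.write_text(candidate.original_code)
    return {"accepted": accepted, "checks": checks}
```

In such a design, the per-check results would also feed the second research thread from the abstract: the IDE can show which safeguards passed or failed, giving the developer a calibrated trust signal rather than a bare accept/reject decision.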
Related papers
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents [51.9387884953294]
We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing large language models on AI research tasks.
This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement learning (RL) algorithms for training such agents.
We evaluate a number of frontier large language models (LLMs), such as Claude-3.5-Sonnet, Llama-3.1 405B, GPT-4o, o1-preview, and Gemini-1.5 Pro, on our benchmark.
arXiv Detail & Related papers (2025-02-20T12:28:23Z) - Adversarial Reasoning at Jailbreaking Time [49.70772424278124]
We develop an adversarial reasoning approach to automatic jailbreaking via test-time computation.
Our approach introduces a new paradigm in understanding LLM vulnerabilities, laying the foundation for the development of more robust and trustworthy AI systems.
arXiv Detail & Related papers (2025-02-03T18:59:01Z) - Helping LLMs Improve Code Generation Using Feedback from Testing and Static Analysis [3.892345568697058]
Large Language Models (LLMs) are one of the most promising developments in the field of artificial intelligence.
Developers routinely ask LLMs to generate code snippets, increasing productivity but also introducing ownership, privacy, correctness, and security issues.
Previous work highlighted how code generated by commercial LLMs is often not safe, containing vulnerabilities, bugs, and code smells.
arXiv Detail & Related papers (2024-12-19T13:34:14Z) - A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation [0.0]
27 recent papers have been reviewed and split into two groups.
The first group consists of new methods for bug detection and repair, which include locating semantic errors.
The second group dwells on code generation, providing an overview of both general-purpose LLMs fine-tuned for programming and task-specific models.
It also presents methods to improve code generation, such as identifier-aware training, fine-tuning at the instruction level, and incorporating semantic code structures.
arXiv Detail & Related papers (2024-11-12T06:47:54Z) - Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs [64.83462841029089]
We introduce an efficient merging-based alignment method called MergeAlign that interpolates the domain and alignment vectors, creating safer domain-specific models.
We apply MergeAlign on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks.
arXiv Detail & Related papers (2024-11-11T09:32:20Z) - AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing [6.334110674473677]
Existing approaches often rely on a single agent for code generation, which struggles to produce secure, vulnerability-free code.
We propose AutoSafeCoder, a multi-agent framework that leverages LLM-driven agents for code generation, vulnerability analysis, and security enhancement through continuous collaboration.
Our contribution focuses on ensuring the safety of multi-agent code generation by integrating dynamic and static testing in an iterative process during code generation.
arXiv Detail & Related papers (2024-09-16T21:15:56Z) - Hacking, The Lazy Way: LLM Augmented Pentesting [0.0]
"LLM Augmented Pentesting" is demonstrated through a tool named "Pentest Copilot"
Our research includes a "chain of thought" mechanism to streamline token usage and boost performance.
We propose a novel file analysis approach, enabling LLMs to understand files.
arXiv Detail & Related papers (2024-09-14T17:40:35Z) - Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval [20.959848710829878]
Large language models (LLMs) have brought significant advancements to code generation and code repair.
However, their training using unsanitized data from open-source repositories, like GitHub, raises the risk of inadvertently propagating security vulnerabilities.
We present a comprehensive study aimed at precisely evaluating and enhancing the security aspects of code LLMs.
arXiv Detail & Related papers (2024-07-02T16:13:21Z) - Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs).
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, so that they become better aligned with the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z) - INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness [110.6921470281479]
We introduce INDICT: a new framework that empowers large language models with Internal Dialogues of Critiques for both safety and helpfulness guidance.
The internal dialogue is a dual cooperative system between a safety-driven critic and a helpfulness-driven critic.
We observed that our approach provides advanced critiques covering both safety and helpfulness, significantly improving the quality of the output code.
arXiv Detail & Related papers (2024-06-23T15:55:07Z) - Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)