LLMs in Coding and their Impact on the Commercial Software Engineering Landscape
- URL: http://arxiv.org/abs/2506.16653v1
- Date: Thu, 19 Jun 2025 23:43:54 GMT
- Title: LLMs in Coding and their Impact on the Commercial Software Engineering Landscape
- Authors: Vladislav Belozerov, Peter J Barclay, Askhan Sami,
- Abstract summary: Large-language-model coding tools are now mainstream in software engineering.<n>But as these same tools move human effort up the development stack, they present fresh dangers.<n>We argue that firms must tag and review every AI-generated line of code.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-language-model coding tools are now mainstream in software engineering. But as these same tools move human effort up the development stack, they present fresh dangers: 10% of real prompts leak private data, 42% of generated snippets hide security flaws, and the models can even ``agree'' with wrong ideas, a trait called sycophancy. We argue that firms must tag and review every AI-generated line of code, keep prompts and outputs inside private or on-premises deployments, obey emerging safety regulations, and add tests that catch sycophantic answers -- so they can gain speed without losing security and accuracy.
Related papers
- Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model [60.60587869092729]
Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to real-world deployment.<n>We propose SecCoderX, an online reinforcement learning framework for functionality-preserving secure code generation.
arXiv Detail & Related papers (2026-02-07T07:42:07Z) - Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks [22.499464760561434]
Vibe coding is a new programming paradigm in which human engineers instruct large language model (LLM) agents to complete complex coding tasks with little supervision.<n>We present a benchmark consisting of 200 feature-request software engineering tasks from real-world open-source projects, which, when given to human programmers, led to vulnerable implementations.<n>Our findings raise serious concerns about the widespread adoption of vibe-coding, particularly in security-sensitive applications.
arXiv Detail & Related papers (2025-12-02T22:11:56Z) - Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench [9.229310642804036]
We present the first large-scale security analysis of LLM-generated patches using 20,000+ issues from the SWE-bench dataset.<n>We evaluate patches produced by a standalone LLM (Llama 3.3) and compare them to developer-written patches.<n>We also assess the security of patches generated by three top-performing agentic frameworks (OpenHands, AutoCodeRover, HoneyComb) on a subset of our data.
arXiv Detail & Related papers (2025-06-30T21:10:19Z) - Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality.<n>We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z) - SOK: Exploring Hallucinations and Security Risks in AI-Assisted Software Development with Insights for LLM Deployment [0.0]
Large Language Models (LLMs) such as GitHub Copilot, ChatGPT, Cursor AI, and Codeium AI have revolutionized the coding landscape.<n>This paper provides a comprehensive analysis of the benefits and risks associated with AI-powered coding tools.
arXiv Detail & Related papers (2025-01-31T06:00:27Z) - Helping LLMs Improve Code Generation Using Feedback from Testing and Static Analysis [3.892345568697058]
Large Language Models (LLMs) are one of the most promising developments in the field of artificial intelligence.<n>Developers routinely ask LLMs to generate code snippets, increasing productivity but also introducing ownership, privacy, correctness, and security issues.<n>Previous work highlighted how code generated by commercial LLMs is often not safe, containing vulnerabilities, bugs, and code smells.
arXiv Detail & Related papers (2024-12-19T13:34:14Z) - RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation.
RedCode-Exec provides challenging prompts that could lead to risky code execution.
RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions.
arXiv Detail & Related papers (2024-11-12T13:30:06Z) - How Well Do Large Language Models Serve as End-to-End Secure Code Agents for Python? [42.119319820752324]
We studied GPT-3.5 and GPT-4's capability to identify and repair vulnerabilities in the code generated by four popular LLMs.<n>By manually or automatically reviewing 4,900 pieces of code, our study reveals that large language models lack awareness of scenario-relevant security risks.<n>To address the limitation of a single round of repair, we developed a lightweight tool that prompts LLMs to construct safer source code.
arXiv Detail & Related papers (2024-08-20T02:42:29Z) - ShadowCode: Towards (Automatic) External Prompt Injection Attack against Code LLMs [56.46702494338318]
This paper introduces a new attack paradigm: (automatic) external prompt injection against code-oriented large language models.<n>We propose ShadowCode, a simple yet effective method that automatically generates induced perturbations based on code simulation.<n>We evaluate our method across 13 distinct malicious objectives, generating 31 threat cases spanning three popular programming languages.
arXiv Detail & Related papers (2024-07-12T10:59:32Z) - CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z) - Assured LLM-Based Software Engineering [51.003878077888686]
This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.
arXiv Detail & Related papers (2024-02-06T20:38:46Z) - LLM-Powered Code Vulnerability Repair with Reinforcement Learning and
Semantic Reward [3.729516018513228]
We introduce a multipurpose code vulnerability analysis system textttSecRepair, powered by a large language model, CodeGen2.
Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs.
We identify zero-day and N-day vulnerabilities in 6 Open Source IoT Operating Systems on GitHub.
arXiv Detail & Related papers (2024-01-07T02:46:39Z) - Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.