GoodVibe: Security-by-Vibe for LLM-Based Code Generation
- URL: http://arxiv.org/abs/2602.10778v1
- Date: Wed, 11 Feb 2026 12:10:14 GMT
- Title: GoodVibe: Security-by-Vibe for LLM-Based Code Generation
- Authors: Maximilian Thang, Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Jona te Lintelo, Stjepan Picek, Ahmad-Reza Sadeghi
- Abstract summary: We present GoodVibe, a neuron-level framework for improving the security of code language models by default. GoodVibe is based on the key insight that security-relevant reasoning is localized to a small subset of neurons. We evaluate GoodVibe on six LLMs across security-critical programming languages, including C++, Java, Swift, and Go.
- Score: 21.862657015152365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly used for code generation in fast, informal development workflows, often referred to as vibe coding, where speed and convenience are prioritized, and security requirements are rarely made explicit. In this setting, models frequently produce functionally correct but insecure code, creating a growing security risk. Existing approaches to improving code security rely on full-parameter fine-tuning or parameter-efficient adaptations, which are either costly and prone to catastrophic forgetting or operate at coarse granularity with limited interpretability and control. We present GoodVibe, a neuron-level framework for improving the security of code language models by default. GoodVibe is based on the key insight that security-relevant reasoning is localized to a small subset of neurons. We identify these neurons using gradient-based attribution from a supervised security task and perform neuron-selective fine-tuning that updates only this security-critical subspace. To further reduce training cost, we introduce activation-driven neuron clustering, enabling structured updates with minimal overhead. We evaluate GoodVibe on six LLMs across security-critical programming languages, including C++, Java, Swift, and Go. GoodVibe substantially improves the security of generated code while preserving general model utility, achieving up to a 2.5x improvement over base models, matching or exceeding full fine-tuning with over 4,700x fewer trainable parameters, and reducing training computation by more than 3.6x compared to the parameter-efficient baseline (LoRA). Our results demonstrate that neuron-level optimization offers an effective and scalable approach to securing code generation without sacrificing efficiency or generality.
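The abstract describes three mechanisms: gradient-based attribution to locate security-critical neurons, fine-tuning restricted to that subspace, and activation-driven clustering to reduce training cost. Below is a minimal PyTorch sketch of the first two steps, assuming a HuggingFace-style causal LM and a labeled security dataset; the attribution score (accumulated |gradient x weight| per feed-forward neuron), the keep fraction, and all function names are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def neuron_scores(model, batches):
    """Score each feed-forward neuron by accumulated |grad * weight|,
    summed over its incoming weights (one row of the MLP matrix).
    Assumes HF-style batches whose forward pass returns a .loss."""
    scores = {}
    model.train()
    for batch in batches:
        model.zero_grad()
        loss = model(**batch).loss          # supervised security objective
        loss.backward()
        for name, p in model.named_parameters():
            if "mlp" in name and p.dim() == 2 and p.grad is not None:
                s = (p.grad * p).abs().sum(dim=-1)  # one score per neuron
                scores[name] = scores.get(name, 0.0) + s.detach()
    return scores

def build_masks(scores, keep_frac=0.001):
    """Keep only the highest-scoring fraction of neurons per layer."""
    masks = {}
    for name, s in scores.items():
        k = max(1, int(keep_frac * s.numel()))
        mask = torch.zeros_like(s, dtype=torch.bool)
        mask[torch.topk(s, k).indices] = True
        masks[name] = mask
    return masks

def masked_step(model, optimizer, batch, masks):
    """One fine-tuning step that updates only the selected neurons."""
    optimizer.zero_grad()
    model(**batch).loss.backward()
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        if name in masks:   # zero gradients of unselected neuron rows
            p.grad.mul_(masks[name].to(p.grad.dtype).unsqueeze(-1))
        else:               # freeze all parameters outside the subspace
            p.grad.zero_()
    optimizer.step()
```

With a keep fraction this small, only a few thousand rows receive nonzero updates, which is consistent with the abstract's claim of matching full fine-tuning with orders of magnitude fewer trainable parameters; the paper's activation-driven clustering would replace the per-neuron top-k with cluster-level selection.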
Related papers
- Learning to Generate Secure Code via Token-Level Rewards [11.539519023515021]
Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. We propose Vul2Safe, a new secure code generation framework that leverages self-reflection to construct high-confidence repair pairs from real-world vulnerabilities. We also introduce SRCode, a novel training framework that pioneers the use of token-level rewards in reinforcement learning for code security.
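As a rough illustration of the token-level-reward idea this summary attributes to SRCode, here is a REINFORCE-style loss in which each generated token carries its own security reward, for instance a penalty on tokens inside a span flagged as vulnerable; the reward source and the reward-to-go credit assignment are assumptions, not the paper's method.

```python
import torch

def token_level_pg_loss(logprobs, token_rewards):
    """REINFORCE-style loss with per-token security rewards.
    logprobs:      [batch, seq] log-probabilities of the sampled tokens
    token_rewards: [batch, seq] per-token reward (e.g. negative inside
                   spans flagged as vulnerable, positive elsewhere)
    """
    # reward-to-go: credit each token with the sum of all later rewards
    returns = torch.flip(torch.cumsum(torch.flip(token_rewards, [1]), 1), [1])
    return -(returns.detach() * logprobs).mean()
```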
arXiv Detail & Related papers (2026-02-26T12:57:27Z) - SafeNeuron: Neuron-Level Safety Alignment for Large Language Models [71.50117566279185]
We propose SafeNeuron, a neuron-level safety alignment framework that improves robustness by redistributing safety representations across the network. In experiments, SafeNeuron significantly improves robustness against neuron pruning attacks, reduces the risk of open-source models being repurposed as red-team generators, and preserves general capabilities.
arXiv Detail & Related papers (2026-02-12T16:40:05Z) - SecCodePRM: A Process Reward Model for Code Security [18.20834502693226]
SecCodePRM is a process reward model that assigns a context-aware, step-level security score along a code trajectory. It has three applications: full-code vulnerability detection (VD), partial-code VD, and secure code generation (CG).
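Step-level scoring of a code trajectory could be exercised as follows: score every growing prefix of the code with the reward model. The `prm` interface (a sequence classifier returning one scalar security logit) and all names here are hypothetical.

```python
import torch

@torch.no_grad()
def step_scores(prm, tokenizer, code_steps):
    """Score each prefix of a code trajectory with a (hypothetical)
    process reward model that emits one scalar security logit."""
    scores, prefix = [], ""
    for step in code_steps:
        prefix += step
        inputs = tokenizer(prefix, return_tensors="pt")
        scores.append(prm(**inputs).logits.squeeze().item())
    return scores
```

Partial-code VD would read off the scores of an incomplete trajectory; full-code VD would use the final score.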
arXiv Detail & Related papers (2026-02-11T02:00:19Z) - Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model [60.60587869092729]
Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to real-world deployment. We propose SecCoderX, an online reinforcement learning framework for functionality-preserving secure code generation.
arXiv Detail & Related papers (2026-02-07T07:42:07Z) - Light Alignment Improves LLM Safety via Model Self-Reflection with a Single Neuron [28.570811283705556]
Safety of large language models (LLMs) has increasingly emerged as a fundamental aspect of their development. Existing safety alignment for LLMs is predominantly achieved through post-training methods. We propose a safety-aware decoding method that requires only low-cost training of an expert model and employs a single neuron as a gating mechanism.
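The single-neuron gate lends itself to a short decode-time sketch: the low-cost expert model exposes one scalar activation per step, and when it exceeds a threshold, the expert's logits take over. The gate layer, neuron index, threshold, and hard blending rule are all assumptions for illustration.

```python
import torch

@torch.no_grad()
def gated_next_token(base, expert, input_ids,
                     gate_layer=-1, gate_index=0, tau=0.0):
    """Pick the next token from base or expert logits, gated by a
    single neuron's activation in the expert model."""
    base_logits = base(input_ids).logits[:, -1]
    out = expert(input_ids, output_hidden_states=True)
    gate = out.hidden_states[gate_layer][:, -1, gate_index]  # one neuron
    use_expert = (gate > tau).float().unsqueeze(-1)          # hard 0/1 gate
    logits = use_expert * out.logits[:, -1] + (1 - use_expert) * base_logits
    return logits.argmax(dim=-1)
```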
arXiv Detail & Related papers (2026-02-02T12:21:54Z) - Rethinking Safety in LLM Fine-tuning: An Optimization Perspective [56.31306558218838]
We show that poor optimization choices, rather than inherent trade-offs, often cause safety problems, measured as harmful responses to adversarial prompts. We propose a simple exponential moving average (EMA) momentum technique in parameter space that preserves safety performance. Our experiments on the Llama families across multiple datasets demonstrate that safety problems can largely be avoided without specialized interventions.
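Parameter-space EMA is simple enough to show directly. A minimal sketch, assuming the averaged copy is the one evaluated or deployed; the decay value is illustrative.

```python
import copy
import torch

def make_ema(model):
    """Frozen copy of the model that tracks an EMA of its weights."""
    ema = copy.deepcopy(model)
    for p in ema.parameters():
        p.requires_grad_(False)
    return ema

@torch.no_grad()
def ema_update(ema, model, decay=0.999):
    """Call after each optimizer.step(): ema <- decay*ema + (1-decay)*model."""
    for pe, pm in zip(ema.parameters(), model.parameters()):
        pe.mul_(decay).add_(pm, alpha=1.0 - decay)
```

The intuition matching the summary: the averaged weights stay close to the safely aligned initialization, damping the drift that produces harmful responses.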
arXiv Detail & Related papers (2025-08-17T23:46:36Z) - Fine-Grained Safety Neurons with Training-Free Continual Projection to Reduce LLM Fine Tuning Risks [22.059668583508365]
We propose the Fine-Grained Safety Neurons (FGSN) with Training-Free Continual Projection method to reduce fine-tuning safety risks. FGSN inherently integrates the multi-scale interactions between safety layers and neurons, localizing sparser and more precise fine-grained safety neurons.
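Reading "training-free continual projection" literally, the core operation would be removing the component of each fine-tuning update that lies along the identified safety-neuron directions. A sketch of that linear-algebra step, with the basis construction left as an assumption:

```python
import torch

@torch.no_grad()
def project_out_safety(update, safety_dirs):
    """Remove the component of a flattened weight update that lies in
    the span of safety-neuron directions.
    update:      [d]    weight update for one layer
    safety_dirs: [d, k] columns spanning the protected safety subspace
    """
    Q, _ = torch.linalg.qr(safety_dirs)   # orthonormal basis of the span
    return update - Q @ (Q.T @ update)    # keep the orthogonal complement
```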
arXiv Detail & Related papers (2025-08-08T03:20:25Z) - LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning [61.594212398272184]
Low-Rank Extrapolation (LoX) improves robustness against benign and malicious fine-tuning attacks. LoX leads to 11% to 54% absolute reductions in attack success rates.
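Going only by the name and summary, low-rank extrapolation can be read as: take the weight delta introduced by safety alignment, keep its top singular directions, and push further along them. A hedged sketch; the rank and scale are made-up knobs, not the paper's settings.

```python
import torch

@torch.no_grad()
def lox_extrapolate(w_base, w_aligned, rank=8, alpha=0.5):
    """Extrapolate past the aligned weights along the top singular
    directions of the alignment delta (w_aligned - w_base)."""
    delta = w_aligned - w_base
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    low_rank = (U[:, :rank] * S[:rank]) @ Vh[:rank]  # rank-r approximation
    return w_aligned + alpha * low_rank
```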
arXiv Detail & Related papers (2025-06-18T16:30:02Z) - NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning [37.024666077902225]
A small amount of malicious data uploaded by users can subtly manipulate the fine-tuning process, resulting in an alignment-broken model. Existing methods to counteract fine-tuning attacks typically require substantial computational resources. We propose Neuron-Level Safety Realignment (NLSR).
arXiv Detail & Related papers (2024-12-17T02:59:04Z) - HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that LLM-generated code often contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code.
arXiv Detail & Related papers (2024-09-10T12:01:43Z) - RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content [62.685566387625975]
Current mitigation strategies, while effective, are not resilient under adversarial attacks.
This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently moderate harmful and unsafe inputs.
arXiv Detail & Related papers (2024-03-19T07:25:02Z)