Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences
- URL: http://arxiv.org/abs/2506.00419v1
- Date: Sat, 31 May 2025 06:48:12 GMT
- Title: Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences
- Authors: Mohammad Saqib, Saikat Chakraborty, Santu Karmaker, Niranjan Balasubramanian
- Abstract summary: We address two key challenges in improving secure code generation. First, obtaining high quality training data covering a broad set of security issues is critical. Second, aligning models to secure code requires focusing on localized regions of code.
- Score: 19.791588510105687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLM-generated code often contains security issues. We address two key challenges in improving secure code generation. First, obtaining high-quality training data covering a broad set of security issues is critical. To address this, we introduce a method for distilling a preference dataset of insecure and secure code pairs from frontier LLMs, along with security reasoning that explains the issues and the fix. The key idea here is to make use of security knowledge sources to devise a systematic prompting strategy that ensures broad coverage. Second, aligning models to secure code requires focusing on localized regions of code. Direct preference optimization methods, like SimPO, are not designed to handle these localized differences and turn out to be ineffective. We address this with a new localized preference optimization algorithm that masks the security-related tokens in both the winning (secure) and losing (insecure) responses. To prevent loss in code quality, we also add a regularizer. Evaluations show that both training on our dataset, DiSCo, and the new preference optimization algorithm, LPO, yield substantial reductions in code insecurity while also improving overall code quality. Code and dataset are available at https://github.com/StonyBrookNLP/disco-lpo.
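Purely as illustration, here is a minimal PyTorch sketch of the kind of localized, SimPO-style objective the abstract describes: length-normalized log-probabilities are averaged only over the security-relevant tokens of the secure (winning) and insecure (losing) responses, with an NLL regularizer on the full secure response to preserve code quality. The function names, hyperparameters, and exact loss form are assumptions, not the released implementation (see the linked repository for that).

```python
# Hedged sketch of a localized, SimPO-style preference loss: log-probabilities are
# averaged only over security-relevant tokens (given by a 0/1 mask), plus an NLL
# regularizer on the whole secure response. Exact LPO details may differ.
import torch
import torch.nn.functional as F

def masked_avg_logprob(logits, labels, mask):
    """Average token log-prob over positions where mask == 1.

    logits: (B, T, V); labels: (B, T) token ids already shifted to align with logits;
    mask: (B, T) floats."""
    logprobs = torch.log_softmax(logits, dim=-1)
    token_lp = logprobs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)  # (B, T)
    return (token_lp * mask).sum(-1) / mask.sum(-1).clamp(min=1.0)

def lpo_loss(logits_w, labels_w, mask_w,      # secure (winning) response
             logits_l, labels_l, mask_l,      # insecure (losing) response
             full_mask_w,                     # all response tokens of the winner
             beta=2.0, gamma=0.5, lam=0.1):   # illustrative hyperparameters
    # Localized, length-normalized rewards restricted to masked tokens.
    r_w = beta * masked_avg_logprob(logits_w, labels_w, mask_w)
    r_l = beta * masked_avg_logprob(logits_l, labels_l, mask_l)
    pref = -F.logsigmoid(r_w - r_l - gamma).mean()
    # Regularizer: ordinary NLL on the whole secure response to keep code quality.
    reg = -masked_avg_logprob(logits_w, labels_w, full_mask_w).mean()
    return pref + lam * reg
```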
Related papers
- Improving LLM Safety Alignment with Dual-Objective Optimization [65.41451412400609]
Existing training-time safety alignment techniques for large language models (LLMs) remain vulnerable to jailbreak attacks. We propose an improved safety alignment that disentangles DPO objectives into two components: (1) robust refusal training, which encourages refusal even when partial unsafe generations are produced, and (2) targeted unlearning of harmful knowledge.
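For intuition only, a hedged sketch of how the two objectives described in this summary might be combined into a single training loss: a refusal NLL term on refusal continuations that follow partially unsafe prefixes, plus a bounded term that lowers the likelihood of harmful continuations. The inputs, weighting, and clipping are assumptions; the paper's actual objectives may differ.

```python
# Illustrative dual-objective safety loss (assumed form, not the paper's exact one):
# (1) encourage refusals even after partially unsafe prefixes,
# (2) push down the likelihood of harmful continuations, with a bound to avoid divergence.
import torch

def dual_objective_loss(refusal_logp, harmful_logp, alpha=1.0, clip=5.0):
    """refusal_logp: (B,) log P(refusal | unsafe prefix)
       harmful_logp: (B,) log P(harmful continuation | prompt)"""
    refusal_term = -refusal_logp.mean()                         # standard NLL on refusals
    unlearn_term = torch.clamp(harmful_logp, min=-clip).mean()  # lower harmful likelihood, bounded
    return refusal_term + alpha * unlearn_term
```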
arXiv Detail & Related papers (2025-03-05T18:01:05Z)
- ProSec: Fortifying Code LLMs with Proactive Security Alignment [14.907702430331803]
Code-specific large language models (LLMs) may introduce vulnerabilities into real-world systems. We propose ProSec, a novel security alignment approach designed to align code LLMs with secure coding practices. Experiments show that models trained with ProSec are 25.2% to 91.4% more secure compared to previous work.
arXiv Detail & Related papers (2024-11-19T22:00:01Z)
- CodeCipher: Learning to Obfuscate Source Code Against LLMs [5.872773591957006]
We propose CodeCipher, a novel method that obfuscates private information in source code while preserving the original response from the LLM.
CodeCipher transforms the LLM's embedding matrix so that each row corresponds to a different word in the original matrix, forming a token-to-token confusion mapping for obfuscating source code.
Results show that our model successfully confuses the privacy in source code while preserving the original LLM's performance.
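A minimal PyTorch illustration of the row-permutation mechanism described above; the random permutation stands in for CodeCipher's learned token-to-token confusion mapping, and the vocabulary size, dimensions, and token ids are placeholders.

```python
# Illustration of a token-to-token confusion mapping: permute the rows of the
# embedding matrix, then remap source-code token ids with the inverse permutation
# so each obfuscated id still looks up the original vector. CodeCipher learns this
# mapping; the random permutation here is only a stand-in.
import torch

vocab_size, dim = 32000, 4096                      # illustrative sizes
E = torch.randn(vocab_size, dim)                   # original embedding matrix
perm = torch.randperm(vocab_size)                  # stand-in for the learned mapping
E_confused = E[perm]                               # row i of E_confused = row perm[i] of E

inv_perm = torch.empty_like(perm)
inv_perm[perm] = torch.arange(vocab_size)

code_token_ids = torch.tensor([17, 204, 9021])     # token ids of the original code
obfuscated_ids = inv_perm[code_token_ids]          # ids sent to the model instead

# The model equipped with E_confused sees the same embeddings as the original code:
assert torch.equal(E_confused[obfuscated_ids], E[code_token_ids])
```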
arXiv Detail & Related papers (2024-10-08T08:28:54Z)
- CodeDPO: Aligning Code Models with Self Generated and Verified Source Code [52.70310361822519]
We propose CodeDPO, a framework that integrates preference learning into code generation to improve two key code preference factors: code correctness and efficiency. CodeDPO employs a novel dataset construction method, utilizing a self-generation-and-validation mechanism that simultaneously generates and evaluates code and test cases.
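A hedged sketch of a self-generation-and-validation loop in the spirit of this summary: sample candidate programs and tests for a prompt, run every test against every program, score programs by the fraction of tests they pass, and form a preference pair from the best and worst candidates. The hand-written candidates below are placeholders, not CodeDPO's actual pipeline or scoring scheme.

```python
# Cross-validate sampled programs against sampled tests to build a preference pair.
from typing import List, Tuple

def run_test(program: str, test: str) -> bool:
    """Execute a candidate program plus one assert-style test in a scratch namespace."""
    namespace: dict = {}
    try:
        exec(program, namespace)   # note: sandbox this in any real use
        exec(test, namespace)
        return True
    except Exception:
        return False

def build_preference_pair(programs: List[str], tests: List[str]) -> Tuple[str, str]:
    scores = [sum(run_test(p, t) for t in tests) / max(len(tests), 1) for p in programs]
    ranked = sorted(zip(scores, programs), key=lambda x: x[0])
    return ranked[-1][1], ranked[0][1]   # (chosen, rejected)

# Example with hand-written stand-ins for sampled candidates:
progs = ["def add(a, b):\n    return a + b\n", "def add(a, b):\n    return a - b\n"]
tests = ["assert add(2, 3) == 5", "assert add(0, 0) == 0"]
chosen, rejected = build_preference_pair(progs, tests)
```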
arXiv Detail & Related papers (2024-10-08T01:36:15Z)
- HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that much LLM-generated code contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code.
arXiv Detail & Related papers (2024-09-10T12:01:43Z)
- An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation [17.69409515806874]
We present an exploratory study on whether fine-tuning pre-trained LLMs on datasets of vulnerability-fixing commits can promote secure code generation.
We crawled a fine-tuning dataset for secure code generation by collecting code fixes of confirmed vulnerabilities from open-source repositories.
Our exploration reveals that fine-tuning LLMs can improve secure code generation by 6.4% for C and 5.4% for C++.
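A hedged sketch of how (vulnerable, fixed) file pairs might be mined from a known vulnerability-fixing commit, roughly in the spirit of the data collection described above. The repository path and commit hash are placeholders, and the study's actual crawling and filtering pipeline is far more involved.

```python
# Extract pre-fix and post-fix versions of each file touched by a fix commit.
import subprocess
from typing import List, Tuple

def git(repo: str, *args: str) -> str:
    return subprocess.run(["git", "-C", repo, *args],
                          check=True, capture_output=True, text=True).stdout

def fix_commit_pairs(repo: str, commit: str) -> List[Tuple[str, str, str]]:
    """Return (path, vulnerable_code, fixed_code) for each file changed by `commit`."""
    changed = git(repo, "diff-tree", "--no-commit-id", "--name-only", "-r", commit).split()
    pairs = []
    for path in changed:
        try:
            before = git(repo, "show", f"{commit}^:{path}")   # parent (vulnerable) version
            after = git(repo, "show", f"{commit}:{path}")     # fixed version
            pairs.append((path, before, after))
        except subprocess.CalledProcessError:
            continue  # file was added or deleted in the fix commit
    return pairs
```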
arXiv Detail & Related papers (2024-08-17T02:51:27Z)
- Constrained Decoding for Secure Code Generation [9.007821185927277]
This paper introduces a new benchmark, CodeGuard+, to measure Code LLMs' ability to generate both secure and correct code.
We show that the state-of-the-art defense technique, prefix tuning, may not be as strong as previously believed, since it generates secure code but sacrifices functional correctness.
We propose new constrained decoding techniques to generate secure code.
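The paper's specific decoding constraints are not detailed in this summary; the sketch below only illustrates the general constrained-decoding idea of masking disallowed tokens (for example, ids of banned insecure API names) before sampling each step. The banned-token list is a placeholder.

```python
# Generic constrained-decoding step: zero out the probability of banned tokens
# before sampling the next token. Not the paper's exact technique.
import torch

def constrained_step(logits: torch.Tensor, banned_token_ids: list) -> int:
    """logits: (V,) next-token scores from the language model."""
    masked = logits.clone()
    masked[banned_token_ids] = float("-inf")   # rule out disallowed continuations
    probs = torch.softmax(masked, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```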
arXiv Detail & Related papers (2024-04-30T21:52:19Z)
- CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of large language models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
- Pre-trained Encoders in Self-Supervised Learning Improve Secure and Privacy-preserving Supervised Learning [63.45532264721498]
Self-supervised learning is an emerging technique to pre-train encoders using unlabeled data.
We perform the first systematic, principled measurement study to understand whether and when a pre-trained encoder can address the limitations of secure or privacy-preserving supervised learning algorithms.
arXiv Detail & Related papers (2022-12-06T21:35:35Z)
- Safeguarded Learned Convex Optimization [106.81731132086851]
Analytic optimization algorithms can be hand-designed to provably solve problems in an iterative fashion.
Data-driven algorithms can "learn to optimize" (L2O) with far fewer iterations and similar per-iteration cost to general-purpose optimization algorithms.
We present a Safe-L2O framework to fuse the advantages of these approaches.
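As a rough illustration of the safeguarding idea (not the paper's exact criterion), a learned update can be accepted only when it decreases the objective, with a plain gradient step as the provably convergent fallback.

```python
# Illustrative safeguarded step: accept the learned (L2O) update only if it makes
# progress on the objective; otherwise fall back to an analytic gradient step.
# The acceptance test here is a stand-in for Safe-L2O's actual safeguard.
import torch

def safeguarded_step(x, f, learned_update, lr=0.1):
    """x: tensor with requires_grad=True; f: objective; learned_update: x -> proposed x."""
    x_l2o = learned_update(x)
    if f(x_l2o) < f(x):                          # learned step makes progress: keep it
        return x_l2o.detach().requires_grad_(True)
    grad = torch.autograd.grad(f(x), x)[0]       # safeguard: analytic gradient step
    return (x - lr * grad).detach().requires_grad_(True)
```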
arXiv Detail & Related papers (2020-03-04T04:01:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.