Related papers: How Do Semantically Equivalent Code Transformations Impact Membership Inference on LLMs for Code?

URL: http://arxiv.org/abs/2512.15468v1
Date: Wed, 17 Dec 2025 14:12:54 GMT
Title: How Do Semantically Equivalent Code Transformations Impact Membership Inference on LLMs for Code?
Authors: Hua Yang, Alejandro Velasco, Thanh Le-Cong, Md Nazmul Haque, Bowen Xu, Denys Poshyvanyk
Abstract summary: We investigate whether semantically equivalent code transformation rules might be leveraged to evade MI detection. We find that model accuracy drops by only 1.5% in the worst case for each rule. Our results expose a critical loophole in license compliance enforcement for training large language models for code.
Score: 56.42119949944239
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The success of large language models for code relies on vast amounts of code data, including public open-source repositories, such as GitHub, and private, confidential code from companies. This raises concerns about intellectual property compliance and the potential unauthorized use of license-restricted code. While membership inference (MI) techniques have been proposed to detect such unauthorized usage, their effectiveness can be undermined by semantically equivalent code transformation techniques, which modify code syntax while preserving semantics. In this work, we systematically investigate whether semantically equivalent code transformation rules might be leveraged to evade MI detection. The results reveal that model accuracy drops by only 1.5% in the worst case for each rule, demonstrating that transformed datasets can effectively serve as substitutes for fine-tuning. Additionally, we find that one of the rules (RenameVariable) reduces MI success by 10.19%, highlighting its potential to obscure the presence of restricted code. To validate these findings, we conduct a causal analysis confirming that variable renaming has the strongest causal effect in disrupting MI detection. Notably, we find that combining multiple transformations does not further reduce MI effectiveness. Our results expose a critical loophole in license compliance enforcement for training large language models for code, showing that MI detection can be substantially weakened by transformation-based obfuscation techniques.
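
To make the idea of a semantically equivalent transformation concrete, the following is a minimal Python sketch of a variable-renaming rule. It is an illustration only, not the paper's RenameVariable implementation: the function name `rename_variables`, the `var_N` naming scheme, and the sample function `total` are all assumptions made for this example. The sketch ignores scoping, keyword arguments at call sites, and possible collisions with existing `var_N` names, and it relies on `ast.unparse` (Python 3.9+).

```python
import ast


def rename_variables(source: str) -> str:
    """Rename parameters and assigned variables to var_0, var_1, ... (hypothetical scheme)."""
    tree = ast.parse(source)

    # Pass 1: collect every parameter name and every name that is assigned to.
    mapping = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            name = node.arg
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name = node.id
        else:
            continue
        mapping.setdefault(name, f"var_{len(mapping)}")

    # Pass 2: rewrite every occurrence of a collected name (definitions and uses).
    # Builtins and called functions are untouched because they are never assigned here.
    for node in ast.walk(tree):
        if isinstance(node, ast.arg) and node.arg in mapping:
            node.arg = mapping[node.arg]
        elif isinstance(node, ast.Name) and node.id in mapping:
            node.id = mapping[node.id]

    return ast.unparse(tree)


# Hypothetical sample input: the surface form changes, the behavior does not.
original_src = (
    "def total(xs, scale):\n"
    "    acc = 0\n"
    "    for x in xs:\n"
    "        acc += x * scale\n"
    "    return acc\n"
)
transformed_src = rename_variables(original_src)

# Execute both versions and compare their results on the same input.
ns_original, ns_transformed = {}, {}
exec(original_src, ns_original)
exec(transformed_src, ns_transformed)
same_behavior = ns_original["total"]([1, 2, 3], 2) == ns_transformed["total"]([1, 2, 3], 2)
print(transformed_src)
print("same behavior:", same_behavior)
```

The transformed source differs token-by-token from the original wherever an identifier appears, which is the kind of surface change the abstract describes as weakening MI detection, while the function's input/output behavior is unchanged.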

Related papers

Uncovering Pretraining Code in LLMs: A Syntax-Aware Attribution Approach [20.775027150345107]
Open-source code, often protected by open source licenses, poses legal and ethical challenges when used in pretraining. We propose SynPrune, a syntax-pruned membership inference attack method tailored for code.
arXiv Detail & Related papers (2025-11-10T12:29:09Z)
Zero-Shot Detection of LLM-Generated Code via Approximated Task Conditioning [8.571111167616165]
Large Language Model (LLM)-generated code is a growing challenge with implications for security, intellectual property, and academic integrity. We investigate the role of conditional probability distributions in improving zero-shot LLM-generated code detection. We propose a novel zero-shot detection approach that approximates the original task used to generate a given code snippet.
arXiv Detail & Related papers (2025-06-06T13:23:37Z)
Simplicity by Obfuscation: Evaluating LLM-Driven Code Transformation with Semantic Elasticity [4.458584890504334]
Code obfuscation aims to prevent reverse engineering and intellectual property theft. The recent development of large language models paves the way for practical applications in different domains. This work performs an empirical study on the ability of LLMs to obfuscate Python source code.
arXiv Detail & Related papers (2025-04-18T18:29:23Z)
ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding [60.37988508851391]
Language models (LMs) have become a staple of the code-writing toolbox. Research exploring modifications to Code-LMs' pre-training objectives, geared towards improving data efficiency and better disentangling between syntax and semantics, has been noticeably sparse. In this work, we examine grounding on obfuscated code as a means of helping Code-LMs look beyond the surface-form syntax and enhance their pre-training sample efficiency.
arXiv Detail & Related papers (2025-03-27T23:08:53Z)
Memorize or Generalize? Evaluating LLM Code Generation with Code Rewriting [54.48306552577881]
We argue that large language models (LLMs) are mostly doing memorization (i.e., replicating or reusing large parts of their training data) versus generalization. Existing evaluations largely proxy neglecting surface/structural similarity, thereby conflating benign reuse of repeated code with harmful recall and memorization task correctness. We propose Memorization Risk Index (MRI), a normalized score that combines two signals: (i) how similar the model's answer for the rewritten task is to the original ground-truth solution, and (ii) how much performance drops from the original task to its rewritten counterpart.
arXiv Detail & Related papers (2025-03-04T05:39:24Z)
ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation [57.604506522287814]
Existing large language models (LLMs) only learn the contextual semantics of code during pre-training. We propose ExeCoder to utilize executability representations such as functional semantics, syntax structures, and variable dependencies. We show that ExeCoder achieves state-of-the-art performance in code translation, surpassing existing open-source code LLMs by over 10.88% to 38.78% and over 27.44% to 42.97% on two metrics.
arXiv Detail & Related papers (2025-01-30T16:18:52Z)
Fine-Tuning LLMs for Code Mutation: A New Era of Cyber Threats [0.9208007322096533]
This paper explores the application of Large Language Models in the context of code mutation. Traditionally, code mutation has been employed to increase software robustness in mission-critical applications. We propose a novel definition of code mutation training tailored for pre-trained LLM-based code synthesizers.
arXiv Detail & Related papers (2024-10-29T17:43:06Z)
Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code. We find that code prompting exhibits a high-performance boost for multiple LLMs. Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for the detection of LLMs-generated codes. We find that existing training-based or zero-shot text detectors are ineffective in detecting code. Our method exhibits robustness against revision attacks and generalizes well to Java codes.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.