Related papers: On Adversarial Robustness of Synthetic Code Generation

On Adversarial Robustness of Synthetic Code Generation

URL: http://arxiv.org/abs/2106.11629v1
Date: Tue, 22 Jun 2021 09:37:48 GMT
Title: On Adversarial Robustness of Synthetic Code Generation
Authors: Mrinal Anand, Pratik Kayal and Mayank Singh
Abstract summary: This paper showcases the existence of significant dataset bias through different classes of adversarial examples. We propose several dataset augmentation techniques to reduce bias and showcase their efficacy.
Score: 1.2559148369195197
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automatic code synthesis from natural language descriptions is a challenging task. We witness massive progress in developing code generation systems for domain-specific languages (DSLs) employing sequence-to-sequence deep learning techniques in the recent past. In this paper, we specifically experiment with \textsc{AlgoLisp} DSL-based generative models and showcase the existence of significant dataset bias through different classes of adversarial examples. We also experiment with two variants of Transformer-based models that outperform all existing \textsc{AlgoLisp} DSL-based code generation baselines. Consistent with the current state-of-the-art systems, our proposed models, too, achieve poor performance under adversarial settings. Therefore, we propose several dataset augmentation techniques to reduce bias and showcase their efficacy using robust experimentation.

Related papers

Large EEG-U-Transformer for Time-Step Level Detection Without Pre-Training [1.3254304182988286]
We propose a simple U-shaped model to efficiently learn representations by capturing both local and global features.<n>Compared to other window-level classification models, our method directly outputs predictions at the time-step level.<n>Our model won 1st place in the 2025 "seizure detection challenge" organized in the International Conference on Artificial Intelligence in Epilepsy and Other Neurological Disorders.
arXiv Detail & Related papers (2025-04-01T01:33:42Z)
Multi-Scales Data Augmentation Approach In Natural Language Inference For Artifacts Mitigation And Pre-Trained Model Optimization [0.0]
We provide a variety of techniques for analyzing and locating dataset artifacts inside the crowdsourced Stanford Natural Language Inference corpus. To mitigate dataset artifacts, we employ a unique multi-scale data augmentation technique with two distinct frameworks. Our combination method enhances our model's resistance to perturbation testing, enabling it to continuously outperform the pre-trained baseline.
arXiv Detail & Related papers (2022-12-16T23:37:44Z)
Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of the systematic generalization ability in standard sequence-to-sequence models. We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples. We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
A Simple, Yet Effective Approach to Finding Biases in Code Generation [16.094062131137722]
This work shows that current code generation systems exhibit undesired biases inherited from their large language model backbones. We propose the "block of influence" concept, which enables a modular decomposition and analysis of the coding challenges.
arXiv Detail & Related papers (2022-10-31T15:06:15Z)
DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models. We propose to generate a symbolic and ordered sequence from the relation matrix which is deterministic and easier for model to learn. Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z)
Multi-Modal Experience Inspired AI Creation [33.34566822058209]
We study how to generate texts based on sequential multi-modal information. We firstly design a multi-channel sequence-to-sequence architecture equipped with a multi-modal attention network. We then propose a curriculum negative sampling strategy tailored for the sequential inputs.
arXiv Detail & Related papers (2022-09-02T11:50:41Z)
Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models. Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z)
$\textit{latent}$-GLAT: Glancing at Latent Variables for Parallel Text Generation [65.29170569821093]
parallel text generation has received widespread attention due to its success in generation efficiency. In this paper, we propose $textitlatent$-GLAT, which employs the discrete latent variables to capture word categorical information. Experiment results show that our method outperforms strong baselines without the help of an autoregressive model.
arXiv Detail & Related papers (2022-04-05T07:34:12Z)
Multiscale Generative Models: Improving Performance of a Generative Model Using Feedback from Other Dependent Generative Models [10.053377705165786]
We take a first step towards building interacting generative models (GANs) that reflects the interaction in real world. We build and analyze a hierarchical set-up where a higher-level GAN is conditioned on the output of multiple lower-level GANs. We present a technique of using feedback from the higher-level GAN to improve performance of lower-level GANs.
arXiv Detail & Related papers (2022-01-24T13:05:56Z)
SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation. Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
Recent Developments Combining Ensemble Smoother and Deep Generative Networks for Facies History Matching [58.720142291102135]
This research project focuses on the use of autoencoders networks to construct a continuous parameterization for facies models. We benchmark seven different formulations, including VAE, generative adversarial network (GAN), Wasserstein GAN, variational auto-encoding GAN, principal component analysis (PCA) with cycle GAN, PCA with transfer style network and VAE with style loss.
arXiv Detail & Related papers (2020-05-08T21:32:42Z)
POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation. The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner. The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.