Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models
- URL: http://arxiv.org/abs/2205.14374v1
- Date: Sat, 28 May 2022 09:04:57 GMT
- Title: Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models
- Authors: Md Rafiqul Islam Rabin, Aftab Hussain, Mohammad Amin Alipour
- Abstract summary: We show that a syntax-guided program reduction technique is faster and provides smaller sets of key tokens in reduced programs.
We also show that these key tokens can be used to generate adversarial examples for up to 65% of the input programs.
- Score: 1.1924369482115011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural code intelligence (CI) models are opaque black-boxes and offer little
insight into the features they use in making predictions. This opacity may lead
to distrust in their predictions and hamper their wider adoption in
safety-critical applications. Recently, input program reduction techniques have
been proposed to identify key features in the input programs and thereby improve
the transparency of CI models. However, this approach is syntax-unaware and does
not consider the grammar of the programming language. In this paper, we apply a
syntax-guided program reduction technique that considers the grammar of the
input programs during reduction. Our experiments on multiple models across
different types of input programs show that the syntax-guided program reduction
technique is faster and yields smaller sets of key tokens in reduced
programs. We also show that these key tokens can be used to generate
adversarial examples for up to 65% of the input programs.
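To make the reduction loop concrete, here is a minimal sketch of syntax-guided reduction in the spirit of the technique the paper applies: parse the program, repeatedly try to delete grammar-level subtrees (here, whole statements), and keep a deletion only when the model's prediction is unchanged. The `predict` oracle below is a hypothetical stand-in for a CI model, and the sketch reduces Python with the stdlib `ast` module (Python 3.9+ for `ast.unparse`) purely for self-containment; it is not the paper's tool.

```python
import ast

def predict(source: str) -> str:
    """Hypothetical CI-model oracle; stands in for e.g. a method-name
    prediction model. Replace with a real model call."""
    return "loop" if "for" in source else "other"

def reduce_program(source: str) -> str:
    """Delete whole statements (grammar-level units) while the model's
    prediction on the reduced program stays the same."""
    target = predict(source)
    tree = ast.parse(source)
    changed = True
    while changed:
        changed = False
        for node in ast.walk(tree):
            body = getattr(node, "body", None)
            if not isinstance(body, list) or len(body) <= 1:
                continue  # keep bodies non-empty so the program stays valid
            for i in range(len(body)):
                saved = body[i]
                del body[i]  # candidate: drop one statement subtree
                try:
                    if predict(ast.unparse(tree)) == target:
                        changed = True  # keep the deletion, rescan the tree
                        break
                except Exception:
                    pass
                body.insert(i, saved)  # label changed or unparse failed: undo
            if changed:
                break
    return ast.unparse(tree)

src = "def f(n):\n    total = 0\n    x = 1\n    for i in range(n):\n        total += i\n    return total"
print(reduce_program(src))  # keeps the loop the toy 'model' relies on
```

The tokens that survive reduction are the candidate key tokens; per the abstract, perturbing such tokens yields adversarial examples for up to 65% of the input programs.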
Related papers
- Extracting Label-specific Key Input Features for Neural Code Intelligence Models [0.0]
Code intelligence (CI) models are often black-box and do not offer insights into the input features they learn for making correct predictions.
In this paper, we apply a syntax-guided program reduction technique that follows the syntax of input programs during reduction.
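For contrast, the syntax-unaware reduction that both abstracts refer to works at the token level in the ddmin (delta debugging) style, with no guarantee that intermediate candidates even parse. A minimal sketch, reusing the hypothetical `predict` oracle from the sketch above:

```python
def reduce_tokens(tokens, predict, target):
    """ddmin-style reduction: drop a chunk of tokens whenever the model's
    prediction is preserved; refine the chunk size when stuck."""
    n = 2  # current number of partitions
    while len(tokens) >= 2:
        chunk = max(1, len(tokens) // n)
        removed = False
        for i in range(0, len(tokens), chunk):
            candidate = tokens[:i] + tokens[i + chunk:]
            if candidate and predict(" ".join(candidate)) == target:
                tokens = candidate         # complement kept the label
                n = max(n - 1, 2)
                removed = True
                break
        if not removed:
            if chunk == 1:
                break                      # single-token granularity reached
            n = min(len(tokens), n * 2)    # refine the partition
    return tokens

# e.g. reduce_tokens("def f ( n ) : return n + 1".split(), predict, "other")
```

Because candidates are arbitrary token subsets, many oracle queries are spent on syntactically invalid programs, which is the inefficiency the syntax-guided approach avoids.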
arXiv Detail & Related papers (2022-02-14T03:36:35Z)
- Encoding Program as Image: Evaluating Visual Representation of Source Code [2.1016374925364616]
We investigate Code2Snapshot, a novel representation of the source code based on the snapshots of input programs.
We compare its performance with state-of-the-art representations that utilize the rich syntactic and semantic features of input programs.
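A minimal sketch of the snapshot idea, assuming Pillow; the rendering choices (canvas size, default font) are illustrative assumptions, not the paper's pipeline:

```python
from PIL import Image, ImageDraw

def code_snapshot(source: str, size=(224, 224)) -> Image.Image:
    """Render source code as a grayscale image ("snapshot") that a vision
    model can consume in place of a token sequence."""
    img = Image.new("L", size, color=255)                  # white canvas
    ImageDraw.Draw(img).multiline_text((4, 4), source, fill=0)
    return img

code_snapshot("def add(a, b):\n    return a + b").save("snapshot.png")
```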
arXiv Detail & Related papers (2021-11-01T17:07:02Z)
- Tea: Program Repair Using Neural Network Based on Program Information Attention Matrix [14.596847020236657]
We propose a unified representation that captures the syntax, data flow, and control flow aspects of software programs.
We then devise a method that uses this representation to guide a transformer model from NLP toward better understanding and fixing buggy programs.
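A minimal sketch of what a program-information attention matrix could look like, assuming NumPy; the relations encoded here (shared identifiers as a data-flow stand-in, adjacency as a syntax stand-in) are illustrative assumptions, and such a matrix would typically be added as a bias to a transformer's attention logits:

```python
import numpy as np

def program_info_matrix(tokens):
    """Token-pair relation matrix to be used as an attention bias."""
    n = len(tokens)
    m = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        for j in range(n):
            if abs(i - j) == 1:
                m[i, j] = 1.0  # syntax stand-in: adjacent tokens
            if tokens[i] == tokens[j] and tokens[i].isidentifier():
                m[i, j] = 1.0  # data-flow stand-in: same identifier reused
    return m

print(program_info_matrix("x = x + 1".split()))
```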
arXiv Detail & Related papers (2021-07-17T15:49:22Z)
- Enforcing Consistency in Weakly Supervised Semantic Parsing [68.2211621631765]
We explore the use of consistency between the output programs for related inputs to reduce the impact of spurious programs.
We find that a more consistent formalism leads to improved model performance even without consistency-based training.
arXiv Detail & Related papers (2021-07-13T03:48:04Z)
- Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages [97.58968222942173]
We take the first step to synthesize C programs from input-output examples.
In particular, we propose LaSynth, which learns a latent representation to approximate the execution of partially generated programs.
We show that training on these synthesized programs further improves the prediction performance for both Karel and C program synthesis.
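A minimal sketch of the latent-execution idea, assuming PyTorch; the module and its name (`LatentExecutor`) are hypothetical stand-ins for the summary's description of learning a latent state that approximates executing a partially generated program:

```python
import torch
import torch.nn as nn

class LatentExecutor(nn.Module):
    """Summarize a partial program into a latent state that stands in for
    its (unfinished) execution, then predict the next token from it."""
    def __init__(self, vocab_size: int, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.next_token = nn.Linear(hidden, vocab_size)

    def forward(self, partial_program: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(self.embed(partial_program))
        latent_state = h[-1]          # approximate execution state
        return self.next_token(latent_state)

model = LatentExecutor(vocab_size=100)
logits = model(torch.randint(0, 100, (1, 7)))  # a 7-token partial program
```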
arXiv Detail & Related papers (2021-06-29T02:21:32Z)
- Improving Compositionality of Neural Networks by Decoding Representations to Inputs [83.97012077202882]
We bridge the benefits of traditional and deep learning programs by jointly training a generative model to constrain neural network activations to "decode" back to inputs.
We demonstrate applications of decodable representations to out-of-distribution detection, adversarial examples, calibration, and fairness.
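A minimal sketch of the joint training objective, assuming PyTorch; the architecture and equal loss weighting are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Classifier whose hidden activations are constrained to "decode" back to
# the input; the reconstruction term is the decodability constraint.
encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
classifier = nn.Linear(64, 10)
decoder = nn.Linear(64, 784)

x = torch.randn(32, 784)              # a batch of flattened inputs
y = torch.randint(0, 10, (32,))
z = encoder(x)
loss = F.cross_entropy(classifier(z), y) + F.mse_loss(decoder(z), x)
loss.backward()
# At test time, a high reconstruction error from `decoder` can flag
# out-of-distribution or adversarial inputs.
```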
arXiv Detail & Related papers (2021-06-01T20:07:16Z)
- How could Neural Networks understand Programs? [67.4217527949013]
It is difficult to build a model that better understands programs by either directly applying off-the-shelf NLP pre-training techniques to the source code or adding features to the model heuristically.
We propose a novel program semantics learning paradigm in which the model learns from information composed of (1) representations that align well with the fundamental operations in operational semantics, and (2) the information of environment transitions.
arXiv Detail & Related papers (2021-05-10T12:21:42Z)
- Representing Partial Programs with Blended Abstract Semantics [62.20775388513027]
We introduce a technique for representing partially written programs in a program synthesis engine.
We learn an approximate execution model implemented as a modular neural network.
We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs.
arXiv Detail & Related papers (2020-12-23T20:40:18Z)
- Latent Programmer: Discrete Latent Codes for Program Synthesis [56.37993487589351]
In many sequence learning tasks, such as program synthesis and document summarization, a key problem is searching over a large space of possible output sequences.
We propose to learn representations of the outputs that are specifically meant for search: rich enough to specify the desired output but compact enough to make search more efficient.
We introduce the Latent Programmer, a program synthesis method that first predicts a discrete latent code from input/output examples, and then generates the program in the target language.
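A minimal two-stage sketch of that idea, assuming PyTorch; the modules and sizes are illustrative stand-ins, and training of the discrete latent code (the paper's focus) is omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_VOCAB, PROG_VOCAB, HID = 16, 100, 64

encode_io = nn.Sequential(nn.Linear(10, HID), nn.ReLU())  # embeds I/O pairs
latent_head = nn.Linear(HID, LATENT_VOCAB)                # stage 1: code
program_head = nn.Linear(HID + LATENT_VOCAB, PROG_VOCAB)  # stage 2: program

io = torch.randn(1, 10)                  # one encoded input/output example
h = encode_io(io)
code = latent_head(h).argmax(-1)         # discrete latent "plan" token
code_1h = F.one_hot(code, LATENT_VOCAB).float()
logits = program_head(torch.cat([h, code_1h], dim=-1))  # program-token logits
# Search can branch over a few latent codes first, then decode programs,
# narrowing the output space before token-level search.
```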
arXiv Detail & Related papers (2020-12-01T10:11:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.