Extracting Label-specific Key Input Features for Neural Code
Intelligence Models
- URL: http://arxiv.org/abs/2202.06474v1
- Date: Mon, 14 Feb 2022 03:36:35 GMT
- Title: Extracting Label-specific Key Input Features for Neural Code
Intelligence Models
- Authors: Md Rafiqul Islam Rabin
- Abstract summary: Code intelligence (CI) models are often black boxes and do not offer insights into the input features that they learn for making correct predictions.
In this paper, we apply a syntax-guided program reduction technique that follows the syntax of input programs during reduction.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code intelligence (CI) models are often black boxes and offer no
insight into the input features they learn for making correct predictions.
This opacity may lead to distrust in their predictions and hamper their wider
adoption in safety-critical applications. Recently, program reduction has been
widely used to identify key input features in order to explain the predictions
of CI models. The approach removes irrelevant parts from an input program and
keeps the minimal snippet that a CI model needs to maintain its prediction.
However, state-of-the-art approaches mainly use syntax-unaware program
reduction, which does not follow the syntax of programs and therefore adds
significant overhead both to reducing input programs and to explaining models.
In this paper, we apply a syntax-guided program reduction technique that
follows the syntax of input programs during reduction. Our experiments on
multiple models across different types of input programs show that the
syntax-guided program reduction technique significantly outperforms the
syntax-unaware technique in reducing the size of input programs. Extracting
key input features from the reduced programs reveals that syntax-guided
reduced programs contain more label-specific key input features and are more
vulnerable to adversarial transformation when the key tokens in a program are
renamed. These label-specific key input features may help explain the
reasoning behind a model's predictions from different perspectives and
increase trust in the correct classifications given by CI models.
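To make the baseline concrete, here is a minimal sketch of syntax-unaware
program reduction in the style of delta debugging (ddmin): it greedily deletes
chunks of tokens as long as a hypothetical oracle, predicts_target, reports
that the CI model still produces the original label. The names and signatures
here are illustrative assumptions, not the paper's implementation.

```python
# Minimal ddmin-style (syntax-unaware) reduction sketch.
# `tokens` is the input program as a token list; `predicts_target(tokens)`
# is an assumed oracle that returns True while the CI model still emits
# the original label for the (possibly syntactically invalid) candidate.
def ddmin(tokens, predicts_target):
    n = 2  # current granularity: number of chunks to split into
    while len(tokens) >= 2:
        chunk = max(1, len(tokens) // n)
        reduced = False
        for start in range(0, len(tokens), chunk):
            # Try the complement: drop one chunk, keep everything else.
            candidate = tokens[:start] + tokens[start + chunk:]
            if candidate and predicts_target(candidate):
                tokens = candidate       # the chunk was irrelevant
                n = max(n - 1, 2)        # coarsen slightly after progress
                reduced = True
                break
        if not reduced:
            if n >= len(tokens):         # each single token already tried
                break
            n = min(n * 2, len(tokens))  # refine granularity and retry
    return tokens  # a minimal snippet with respect to the oracle
```

Because ddmin cuts at arbitrary token boundaries, many candidates are
syntactically invalid and waste oracle queries. A syntax-guided reducer in the
style of Perses instead removes whole subtrees of the parse tree, so every
candidate stays syntactically valid, which is why it can reduce programs
faster and further.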
Related papers
- Generative Input: Towards Next-Generation Input Methods Paradigm [49.98958865125018]
We propose a novel Generative Input paradigm named GeneInput.
It uses prompts to handle all input scenarios and other intelligent auxiliary input functions, optimizing the model with user feedback to deliver personalized results.
The results demonstrate that we achieve state-of-the-art performance for the first time on the Full-mode Key-sequence to Characters (FK2C) task.
arXiv Detail & Related papers (2023-11-02T12:01:29Z)
- Improving Input-label Mapping with Demonstration Replay for In-context Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models.
We propose a novel ICL method called Repeated Demonstration with Sliding Causal Attention (RdSca).
We show that our method significantly improves the input-label mapping in ICL demonstrations.
arXiv Detail & Related papers (2023-10-30T14:29:41Z)
- PERFOGRAPH: A Numerical Aware Program Graph Representation for Performance Optimization and Program Analysis [12.778336318809092]
A key challenge in adopting the latest machine learning methods is the representation of programming languages.
To overcome the limitations and challenges of current program representations, we propose a graph-based program representation called PERFOGRAPH.
PERFOGRAPH can capture numerical information and aggregate data structures by introducing new nodes and edges.
arXiv Detail & Related papers (2023-05-31T21:59:50Z)
- Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models [1.1924369482115011]
We show that a syntax-guided program reduction technique is faster and yields smaller sets of key tokens in reduced programs.
We also show that the key tokens could be used to generate adversarial examples for up to 65% of the input programs (see the renaming sketch after this list).
arXiv Detail & Related papers (2022-05-28T09:04:57Z)
- Tea: Program Repair Using Neural Network Based on Program Information Attention Matrix [14.596847020236657]
We propose a unified representation to capture the syntax, data flow, and control flow aspects of software programs.
We then devise a method to use such a representation to guide the transformer model from NLP in better understanding and fixing buggy programs.
arXiv Detail & Related papers (2021-07-17T15:49:22Z)
- Enforcing Consistency in Weakly Supervised Semantic Parsing [68.2211621631765]
We explore the use of consistency between the output programs for related inputs to reduce the impact of spurious programs.
We find that a more consistent formalism leads to improved model performance even without consistency-based training.
arXiv Detail & Related papers (2021-07-13T03:48:04Z)
- Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages [97.58968222942173]
We take the first step to synthesize C programs from input-output examples.
In particular, we propose LaSynth, which learns a latent representation to approximate the execution of partially generated programs.
We show that training on these synthesized programs further improves the prediction performance for both Karel and C program synthesis.
arXiv Detail & Related papers (2021-06-29T02:21:32Z)
- Improving Compositionality of Neural Networks by Decoding Representations to Inputs [83.97012077202882]
We bridge the benefits of traditional and deep learning programs by jointly training a generative model to constrain neural network activations to "decode" back to inputs.
We demonstrate applications of decodable representations to out-of-distribution detection, adversarial examples, calibration, and fairness.
arXiv Detail & Related papers (2021-06-01T20:07:16Z)
- Representing Partial Programs with Blended Abstract Semantics [62.20775388513027]
We introduce a technique for representing partially written programs in a program synthesis engine.
We learn an approximate execution model implemented as a modular neural network.
We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs.
arXiv Detail & Related papers (2020-12-23T20:40:18Z)
- Incremental maintenance of overgrounded logic programs with tailored simplifications [0.966840768820136]
We introduce a new strategy for generating series of monotonically growing propositional programs.
With respect to earlier approaches, our tailored simplification technique reduces the size of instantiated programs.
arXiv Detail & Related papers (2020-08-06T21:50:11Z)
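The abstract above, and the syntax-guided reduction paper in this list, probe
a model by renaming the key tokens that survive reduction and checking whether
the prediction flips. Below is a minimal sketch of that idea under assumed
names: predict is a hypothetical model oracle, and the regex-based rename is a
stand-in for a proper AST-based renamer.

```python
import re

def rename_identifier(source, old_name, new_name):
    """Rename one identifier using word-boundary matching.
    (A real tool would rename via the AST to avoid touching strings,
    comments, or unrelated substrings.)"""
    return re.sub(rf"\b{re.escape(old_name)}\b", new_name, source)

def adversarial_by_renaming(source, key_tokens, predict):
    """Return the first single-token renaming that flips the model's label."""
    original_label = predict(source)
    for i, token in enumerate(key_tokens):
        mutated = rename_identifier(source, token, f"var_{i}")
        if predict(mutated) != original_label:
            return mutated, token   # renaming this key token flips the label
    return None, None               # robust to these single-token renamings
```

Since renaming an identifier preserves the program's semantics, any label flip
it causes exposes the model's reliance on that token rather than on what the
program actually does.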