Extracting Label-specific Key Input Features for Neural Code
Intelligence Models
- URL: http://arxiv.org/abs/2202.06474v1
- Date: Mon, 14 Feb 2022 03:36:35 GMT
- Title: Extracting Label-specific Key Input Features for Neural Code
Intelligence Models
- Authors: Md Rafiqul Islam Rabin
- Abstract summary: Code intelligence (CI) models are often black boxes and do not offer insights into the input features that they learn for making correct predictions.
In this paper, we apply a syntax-guided program reduction technique that follows the syntax of input programs during reduction.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code intelligence (CI) models are often black boxes and offer no
insight into the input features they learn for making correct predictions.
This opacity may lead to distrust in their predictions and hamper their wider
adoption in safety-critical applications. Recently, program reduction has been
widely used to identify key input features in order to explain the predictions
of CI models. The approach removes irrelevant parts from an input program and
keeps the minimal snippet that a CI model needs to maintain its prediction.
However, state-of-the-art approaches mainly use syntax-unaware program
reduction, which does not follow the syntax of programs and therefore adds
significant overhead both to reducing input programs and to explaining models.
In this paper, we apply a syntax-guided program reduction technique that
follows the syntax of input programs during reduction. Our experiments on
multiple models across different types of input programs show that the
syntax-guided program reduction technique significantly outperforms the
syntax-unaware technique in reducing the size of input programs. Extracting
key input features from the reduced programs reveals that syntax-guided
reduced programs contain more label-specific key input features and are more
vulnerable to adversarial transformation when the key tokens in a program are
renamed. These label-specific key input features may help explain the
reasoning behind a model's predictions from different perspectives and
increase trust in the correct classifications given by CI models.
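To make the baseline concrete, here is a minimal sketch of syntax-unaware
program reduction in the style of delta debugging (ddmin): it greedily deletes
chunks of tokens as long as a hypothetical oracle, predicts_target, reports
that the CI model still produces the original label. The names and signatures
here are illustrative assumptions, not the paper's implementation.

```python
# Minimal ddmin-style (syntax-unaware) reduction sketch.
# `tokens` is the input program as a token list; `predicts_target(tokens)`
# is an assumed oracle that returns True while the CI model still emits
# the original label for the (possibly syntactically invalid) candidate.
def ddmin(tokens, predicts_target):
    n = 2  # current granularity: number of chunks to split into
    while len(tokens) >= 2:
        chunk = max(1, len(tokens) // n)
        reduced = False
        for start in range(0, len(tokens), chunk):
            # Try the complement: drop one chunk, keep everything else.
            candidate = tokens[:start] + tokens[start + chunk:]
            if candidate and predicts_target(candidate):
                tokens = candidate       # the chunk was irrelevant
                n = max(n - 1, 2)        # coarsen slightly after progress
                reduced = True
                break
        if not reduced:
            if n >= len(tokens):         # each single token already tried
                break
            n = min(n * 2, len(tokens))  # refine granularity and retry
    return tokens  # a minimal snippet with respect to the oracle
```

Because ddmin cuts at arbitrary token boundaries, many candidates are
syntactically invalid and waste oracle queries. A syntax-guided reducer in the
style of Perses instead removes whole subtrees of the parse tree, so every
candidate stays syntactically valid, which is why it can reduce programs
faster and further.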
Related papers
- Generative Input: Towards Next-Generation Input Methods Paradigm [49.98958865125018]
We propose a novel Generative Input paradigm named GeneInput.
It uses prompts to handle all input scenarios and other intelligent auxiliary input functions, optimizing the model with user feedback to deliver personalized results.
The results demonstrate that we achieve state-of-the-art performance for the first time on the Full-mode Key-sequence to Characters (FK2C) task.
arXiv Detail & Related papers (2023-11-02T12:01:29Z)
- Improving Input-label Mapping with Demonstration Replay for In-context Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models.
We propose a novel ICL method called Repeated Demonstration with Sliding Causal Attention (RdSca).
We show that our method significantly improves the input-label mapping in ICL demonstrations.
arXiv Detail & Related papers (2023-10-30T14:29:41Z)
- PERFOGRAPH: A Numerical Aware Program Graph Representation for Performance Optimization and Program Analysis [12.778336318809092]
A key challenge in adopting the latest machine learning methods is the representation of programming languages.
To overcome the limitations and challenges of current program representations, we propose a graph-based program representation called PERFOGRAPH.
PERFOGRAPH can capture numerical information and aggregate data structures by introducing new nodes and edges.
arXiv Detail & Related papers (2023-05-31T21:59:50Z)
- Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models [1.1924369482115011]
We show that a syntax-guided program reduction technique is faster and yields smaller sets of key tokens in reduced programs.
We also show that the key tokens could be used to generate adversarial examples for up to 65% of the input programs (see the renaming sketch after this list).
arXiv Detail & Related papers (2022-05-28T09:04:57Z)
- Tea: Program Repair Using Neural Network Based on Program Information Attention Matrix [14.596847020236657]
We propose a unified representation to capture the syntax, data flow, and control flow aspects of software programs.
We then devise a method to use such a representation to guide the transformer model from NLP in better understanding and fixing buggy programs.
arXiv Detail & Related papers (2021-07-17T15:49:22Z)
- Enforcing Consistency in Weakly Supervised Semantic Parsing [68.2211621631765]
We explore the use of consistency between the output programs for related inputs to reduce the impact of spurious programs.
We find that a more consistent formalism leads to improved model performance even without consistency-based training.
arXiv Detail & Related papers (2021-07-13T03:48:04Z)
- Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages [97.58968222942173]
We take the first step to synthesize C programs from input-output examples.
In particular, we propose LaSynth, which learns a latent representation to approximate the execution of partially generated programs.
We show that training on these synthesized programs further improves the prediction performance for both Karel and C program synthesis.
arXiv Detail & Related papers (2021-06-29T02:21:32Z)
- Improving Compositionality of Neural Networks by Decoding Representations to Inputs [83.97012077202882]
We bridge the benefits of traditional and deep learning programs by jointly training a generative model to constrain neural network activations to "decode" back to inputs.
We demonstrate applications of decodable representations to out-of-distribution detection, adversarial examples, calibration, and fairness.
arXiv Detail & Related papers (2021-06-01T20:07:16Z)
- Representing Partial Programs with Blended Abstract Semantics [62.20775388513027]
We introduce a technique for representing partially written programs in a program synthesis engine.
We learn an approximate execution model implemented as a modular neural network.
We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs.
arXiv Detail & Related papers (2020-12-23T20:40:18Z)
- Incremental maintenance of overgrounded logic programs with tailored simplifications [0.966840768820136]
We introduce a new strategy for generating series of monotonically growing propositional programs.
With respect to earlier approaches, our tailored simplification technique reduces the size of instantiated programs.
arXiv Detail & Related papers (2020-08-06T21:50:11Z)
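The abstract above, and the syntax-guided reduction paper in this list, probe
a model by renaming the key tokens that survive reduction and checking whether
the prediction flips. Below is a minimal sketch of that idea under assumed
names: predict is a hypothetical model oracle, and the regex-based rename is a
stand-in for a proper AST-based renamer.

```python
import re

def rename_identifier(source, old_name, new_name):
    """Rename one identifier using word-boundary matching.
    (A real tool would rename via the AST to avoid touching strings,
    comments, or unrelated substrings.)"""
    return re.sub(rf"\b{re.escape(old_name)}\b", new_name, source)

def adversarial_by_renaming(source, key_tokens, predict):
    """Return the first single-token renaming that flips the model's label."""
    original_label = predict(source)
    for i, token in enumerate(key_tokens):
        mutated = rename_identifier(source, token, f"var_{i}")
        if predict(mutated) != original_label:
            return mutated, token   # renaming this key token flips the label
    return None, None               # robust to these single-token renamings
```

Since renaming an identifier preserves the program's semantics, any label flip
it causes exposes the model's reliance on that token rather than on what the
program actually does.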