Related papers: Encoding Program as Image: Evaluating Visual Representation of Source Code

URL: http://arxiv.org/abs/2111.01097v1
Date: Mon, 1 Nov 2021 17:07:02 GMT
Title: Encoding Program as Image: Evaluating Visual Representation of Source Code
Authors: Md Rafiqul Islam Rabin, Mohammad Amin Alipour
Abstract summary: We investigate Code2Snapshot, a novel representation of the source code based on the snapshots of input programs. We compare its performance with state-of-the-art representations that utilize the rich syntactic and semantic features of input programs.
Score: 2.1016374925364616
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: There are several approaches to encode source code in the input vectors of neural models. These approaches attempt to include various syntactic and semantic features of input programs in their encoding. In this paper, we investigate Code2Snapshot, a novel representation of the source code that is based on the snapshots of input programs. We evaluate several variations of this representation and compare its performance with state-of-the-art representations that utilize the rich syntactic and semantic features of input programs. Our preliminary study on the utility of Code2Snapshot in the code summarization task suggests that simple snapshots of input programs have comparable performance to the state-of-the-art representations. Interestingly, obscuring the input programs has an insignificant impact on the Code2Snapshot performance, suggesting that, for some tasks, neural models may provide high performance by relying merely on the structure of input programs.
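
The abstract's last observation, that obscuring a program barely changes its snapshot, can be illustrated with a minimal sketch. This is not the Code2Snapshot implementation: the helpers `redact` and `snapshot_grid`, the 8x24 grid size, and the use of a binary character grid in place of a rendered image are all assumptions made here for illustration only.

```python
def redact(source: str) -> str:
    """Hypothetical obscuring step: replace each alphanumeric character
    with 'x', keeping whitespace and punctuation where they are."""
    return "".join("x" if ch.isalnum() else ch for ch in source)


def snapshot_grid(source: str, height: int = 8, width: int = 24) -> list[list[int]]:
    """Hypothetical stand-in for a snapshot: a fixed-size binary grid with
    1 wherever a visible character sits and 0 for blank space.
    Lines and columns beyond the grid size are cropped."""
    grid = [[0] * width for _ in range(height)]
    lines = source.expandtabs(4).splitlines()[:height]
    for row, line in enumerate(lines):
        for col, ch in enumerate(line[:width]):
            if not ch.isspace():
                grid[row][col] = 1
    return grid


example = "int add(int a, int b) {\n    return a + b;\n}"
original = snapshot_grid(example)
obscured = snapshot_grid(redact(example))

print(redact(example))
# The layout (indentation, line lengths, token boundaries) is unchanged,
# so the two grids are identical even though every name is hidden.
print(original == obscured)
```

In this toy setting the grid depends only on where characters are placed, not on which characters they are, which is one way to read the claim that a model may rely on program structure alone. An actual snapshot would be a rendered image, where glyph shapes also carry information.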

Related papers

SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects. We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
Code Representation Pre-training with Complements from Program Executions [29.148208436656216]
We propose FuzzPretrain to explore the dynamic information of programs revealed by their test cases and embed it into the feature representations of code as complements. FuzzPretrain yielded more than 6%/9% mAP improvements on code search over its counterparts trained with only source code or AST.
arXiv Detail & Related papers (2023-09-04T01:57:22Z)
Which Features are Learned by CodeBert: An Empirical Study of the BERT-based Source Code Representation Learning [9.469346910848733]
We show that current methods cannot effectively understand the logic of source code. The representation of source code relies heavily on the programmer-defined variable and function names.
arXiv Detail & Related papers (2023-01-20T05:39:26Z)
Generalized Decoding for Pixel, Image, and Language [197.85760901840177]
We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly. X-Decoder is the first work that provides a unified way to support all types of image segmentation and a variety of vision-language (VL) tasks.
arXiv Detail & Related papers (2022-12-21T18:58:41Z)
Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models [1.1924369482115011]
We show that a syntax-guided program reduction technique is faster and provides smaller sets of key tokens in reduced programs. We also show that the key tokens could be used in generating adversarial examples for up to 65% of the input programs.
arXiv Detail & Related papers (2022-05-28T09:04:57Z)
Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages [97.58968222942173]
We take the first step to synthesize C programs from input-output examples. In particular, we propose LaSynth, which learns the latent representation to approximate the execution of partially generated programs. We show that training on these synthesized programs further improves the prediction performance for both Karel and C program synthesis.
arXiv Detail & Related papers (2021-06-29T02:21:32Z)
Representing Partial Programs with Blended Abstract Semantics [62.20775388513027]
We introduce a technique for representing partially written programs in a program synthesis engine. We learn an approximate execution model implemented as a modular neural network. We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs.
arXiv Detail & Related papers (2020-12-23T20:40:18Z)
Latent Programmer: Discrete Latent Codes for Program Synthesis [56.37993487589351]
In many sequence learning tasks, such as program synthesis and document summarization, a key problem is searching over a large space of possible output sequences. We propose to learn representations of the outputs that are specifically meant for search: rich enough to specify the desired output but compact enough to make search more efficient. We introduce the Latent Programmer, a program synthesis method that first predicts a discrete latent code from input/output examples, and then generates the program in the target language.
arXiv Detail & Related papers (2020-12-01T10:11:35Z)
Towards Demystifying Dimensions of Source Code Embeddings [5.211235558099913]
We present our preliminary results towards better understanding the contents of code2vec neural source code embeddings. Our results suggest that the handcrafted features can perform very close to the high-dimensional code2vec embeddings. We also find that the code2vec embeddings are more resilient to the removal of dimensions with low information gains than the handcrafted features.
arXiv Detail & Related papers (2020-08-29T21:59:11Z)
A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens. We show that although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.