CodeLens: An Interactive Tool for Visualizing Code Representations
- URL: http://arxiv.org/abs/2307.14902v1
- Date: Thu, 27 Jul 2023 14:46:09 GMT
- Title: CodeLens: An Interactive Tool for Visualizing Code Representations
- Authors: Yuejun Guo, Seifeddine Bettaieb, Qiang Hu, Yves Le Traon, and Qiang Tang
- Abstract summary: Representing source code in a generic input format is crucial to automate software engineering tasks.
Visualizing code representations can further enable human experts to gain an intuitive insight into the code.
We introduce a tool, CodeLens, which provides a visual interaction environment that supports various representation methods.
- Score: 12.59741038895472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Representing source code in a generic input format is crucial to automate
software engineering tasks, e.g., applying machine learning algorithms to
extract information. Visualizing code representations can further enable human
experts to gain an intuitive insight into the code. Unfortunately, as of today,
there is no universal tool that can simultaneously visualise different types of
code representations. In this paper, we introduce a tool, CodeLens, which
provides a visual interaction environment that supports various representation
methods and helps developers understand and explore them. CodeLens is designed
to support multiple programming languages, such as Java, Python, and
JavaScript, and four types of code representations, including sequence of
tokens, abstract syntax tree (AST), data flow graph (DFG), and control flow
graph (CFG). By using CodeLens, developers can quickly visualize the specific
code representation and also obtain the represented inputs for models of code.
The Web-based interface of CodeLens is available at http://www.codelens.org.
The demonstration video can be found at http://www.codelens.org/demo.
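
To make the representations concrete, the following is a minimal, illustrative sketch (not CodeLens itself) of what two of the four representations named above -- the token sequence and the AST -- look like for a small Python function, produced with Python's standard tokenize and ast modules; the snippet is a made-up example.

```python
# Illustrative only: token sequence and AST for a tiny Python function,
# built with the standard library rather than with CodeLens.
import ast
import io
import tokenize

source = "def add(a, b):\n    return a + b\n"

# 1. Sequence of tokens: (token type, token string) pairs.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]
print(tokens)

# 2. Abstract syntax tree (AST): a nested structure of syntax nodes.
tree = ast.parse(source)
print(ast.dump(tree, indent=2))
```

Data flow and control flow graphs (DFG, CFG) require additional analysis passes on top of such a tree, which is what tools like CodeLens provide out of the box.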
Related papers
- CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation [60.799992690487336]
We propose Syntax Graph Retrieval Augmented Code Generation (CodeGRAG) to enhance the performance of LLMs in single-round code generation tasks.
CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gains for cross-lingual code generation.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [59.32609948217718]
We present CodeIP, a new watermarking technique for code generation with Large Language Models (LLMs).
CodeIP enables the insertion of multi-bit information while preserving the semantics of the generated code.
arXiv Detail & Related papers (2024-04-24T04:25:04Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z) - COMEX: A Tool for Generating Customized Source Code Representations [7.151800146054561]
COMEX is a framework that allows researchers and developers to create and combine multiple code-views.
It can analyze both method-level snippets and program-level snippets by using both intra-procedural and inter-procedural analyses.
It is built on tree-sitter - a widely used incremental analysis tool that supports over 40 languages.
arXiv Detail & Related papers (2023-07-10T16:46:34Z)
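
COMEX above is built on tree-sitter; purely as a hedged illustration of what parsing with tree-sitter looks like (not COMEX's own code-view API), the sketch below uses the py-tree-sitter binding. The binding interface has changed across releases; this follows the older build_library style and assumes a local checkout of the tree-sitter-python grammar, so the paths shown are placeholders.

```python
# Hedged sketch: parsing a snippet with py-tree-sitter (the parser COMEX
# builds on), not COMEX's own API. Follows the pre-0.22 binding interface
# and assumes the tree-sitter-python grammar is checked out locally.
from tree_sitter import Language, Parser

Language.build_library("build/langs.so", ["vendor/tree-sitter-python"])  # placeholder paths
PY_LANGUAGE = Language("build/langs.so", "python")

parser = Parser()
parser.set_language(PY_LANGUAGE)

source = b"def add(a, b):\n    return a + b\n"
tree = parser.parse(source)

# Print the concrete syntax tree, one node type per line.
def show(node, depth=0):
    print("  " * depth + node.type)
    for child in node.children:
        show(child, depth + 1)

show(tree.root_node)
```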
- Code Execution with Pre-trained Language Models [88.04688617516827]
Most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures.
We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution.
We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension.
arXiv Detail & Related papers (2023-05-08T10:00:05Z)
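
CodeExecutor above learns from execution traces. Purely as an illustration of what a line-level trace looks like (not the paper's dataset pipeline), the sketch below collects one with Python's built-in sys.settrace hook; the traced function is a made-up example.

```python
# Illustrative only: a simple line-level execution trace collected with
# Python's built-in sys.settrace hook; not CodeExecutor's data construction.
import sys

trace = []

def tracer(frame, event, arg):
    # Record (function name, line number) for every executed line.
    if event == "line":
        trace.append((frame.f_code.co_name, frame.f_lineno))
    return tracer

def sample(n):
    total = 0
    for i in range(n):
        total += i
    return total

sys.settrace(tracer)
sample(3)
sys.settrace(None)
print(trace)  # sequence of (function, line) pairs visited during the call
```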
- Adding Context to Source Code Representations for Deep Learning [13.676416860721877]
We argue that it is beneficial for deep learning models to have access to additional contextual information about the code being analysed.
We present preliminary evidence that encoding context from the call hierarchy along with information from the code itself can improve the performance of a state-of-the-art deep learning model.
arXiv Detail & Related papers (2022-07-30T12:47:32Z)
- CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training [26.695345034376388]
We propose to integrate different views with the natural-language description of source code into a unified framework with Multi-View contrastive Pre-training.
Specifically, we first extract multiple code views using compiler tools, and learn the complementary information among them under a contrastive learning framework.
Experiments on three downstream tasks over five datasets demonstrate the superiority of CODE-MVP when compared with several state-of-the-art baselines.
arXiv Detail & Related papers (2022-05-04T12:40:58Z)
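
CODE-MVP above relies on a contrastive objective over multiple views of the same code. As a generic illustration of such an objective (not the CODE-MVP training code), the sketch below computes an InfoNCE-style loss in NumPy over two batches of placeholder embeddings.

```python
# Generic InfoNCE-style contrastive loss between two views of the same code
# snippets, sketched in NumPy. Illustrative only; embeddings and temperature
# are placeholders, not CODE-MVP's actual training setup.
import numpy as np

def info_nce(view_a, view_b, temperature=0.07):
    """view_a, view_b: (batch, dim) embeddings of two views of the same snippets."""
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # pairwise cosine similarities
    # Matching views sit on the diagonal; each row is a softmax over candidates.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
print(info_nce(rng.normal(size=(4, 8)), rng.normal(size=(4, 8))))
```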
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming language.
We propose a one-to-one mapping method to transform the AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
- Project CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks [11.10732802304274]
Project CodeNet consists of 14M code samples and about 500M lines of code in 55 different programming languages.
Project CodeNet is not only unique in its scale, but also in the diversity of coding tasks it can help benchmark.
arXiv Detail & Related papers (2021-05-25T00:13:29Z)