CodeLens: An Interactive Tool for Visualizing Code Representations
- URL: http://arxiv.org/abs/2307.14902v1
- Date: Thu, 27 Jul 2023 14:46:09 GMT
- Title: CodeLens: An Interactive Tool for Visualizing Code Representations
- Authors: Yuejun Guo, Seifeddine Bettaieb, Qiang Hu, Yves Le Traon, and Qiang Tang
- Abstract summary: Representing source code in a generic input format is crucial to automate software engineering tasks.
Visualizing code representations can further enable human experts to gain an intuitive insight into the code.
We introduce a tool, CodeLens, which provides a visual interaction environment that supports various representation methods.
- Score: 12.59741038895472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Representing source code in a generic input format is crucial to automate
software engineering tasks, e.g., applying machine learning algorithms to
extract information. Visualizing code representations can further enable human
experts to gain an intuitive insight into the code. Unfortunately, as of today,
there is no universal tool that can simultaneously visualize different types of
code representations. In this paper, we introduce a tool, CodeLens, which
provides a visual interaction environment that supports various representation
methods and helps developers understand and explore them. CodeLens is designed
to support multiple programming languages, such as Java, Python, and
JavaScript, and four types of code representations, including sequence of
tokens, abstract syntax tree (AST), data flow graph (DFG), and control flow
graph (CFG). By using CodeLens, developers can quickly visualize the specific
code representation and also obtain the represented inputs for models of code.
The Web-based interface of CodeLens is available at http://www.codelens.org.
The demonstration video can be found at http://www.codelens.org/demo.
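As a rough, standard-library-only illustration of two of the representations discussed above (the sequence of tokens and the AST), the sketch below derives both for a small Python snippet. It is an independent example, not part of CodeLens.

```python
# Illustrative sketch (not CodeLens code): derive a token sequence and an AST
# for a small Python snippet using only the standard library.
import ast
import io
import tokenize

source = "def add(a, b):\n    return a + b\n"

# 1. Sequence of tokens: (token type name, token text) pairs.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()
]
print(tokens)

# 2. Abstract syntax tree: a nested dump of the parsed module.
tree = ast.parse(source)
print(ast.dump(tree, indent=2))
```

The data flow and control flow representations require further static analysis, which is exactly the kind of work a tool like CodeLens packages behind its interface.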
Related papers
- Building A Coding Assistant via the Retrieval-Augmented Language Model [24.654428111628242]
We propose a retrieval-augmented language model (CONAN) to build a code assistant by mimicking the knowledge-seeking behaviors of humans during coding.
It consists of a code structure aware retriever (CONAN-R) and a dual-view code representation-based retrieval-augmented generation model (CONAN-G).
arXiv Detail & Related papers (2024-10-21T17:34:39Z)
- Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages [1.559169421643164]
Node-based programming languages are increasingly popular in media arts coding domains.
Using LLM-based code generation to further lower the barrier to creative output is an exciting opportunity.
The best strategy for code generation for visual node-based programming languages remains an open question.
arXiv Detail & Related papers (2024-09-01T22:11:23Z)
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks from their control flow and data flow to bridge the gap between programming languages and natural language.
Experiments and ablations on four datasets covering both C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
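As a rough hint at the kind of graphical view built from control flow and data flow, the hedged sketch below extracts simple def-use (data-flow) edges from a Python function's AST; CodeGRAG's actual graph construction is more involved.

```python
# Hypothetical sketch of a simple data-flow view: connect each variable read
# on the right-hand side of an assignment to the variable it defines. This
# only hints at the graphical views CodeGRAG builds; it is not the paper's
# construction.
import ast

source = """
def f(x):
    y = x + 1
    z = y * y
    return z
"""

edges = []
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.Assign):
        defined = [t.id for t in node.targets if isinstance(t, ast.Name)]
        used = [n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)]
        edges += [(u, d) for d in defined for u in used]

print(edges)  # e.g. [('x', 'y'), ('y', 'z'), ('y', 'z')]
```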
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
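The sparse-attention idea can be pictured as a boolean mask combining a local sliding window with a few globally attended positions (standing in for identifier tokens). The sketch below is a generic illustration with assumed positions, not SparseCoder's actual attention pattern.

```python
# Generic sparse-attention mask sketch (not SparseCoder's exact pattern):
# each token attends to a local window plus a few "global" positions, which
# here stand in for identifier tokens that stay visible everywhere.
import numpy as np

seq_len = 12
window = 2                        # local attention radius (assumed)
global_positions = [0, 5]         # hypothetical identifier-token positions

mask = np.zeros((seq_len, seq_len), dtype=bool)
for i in range(seq_len):
    lo, hi = max(0, i - window), min(seq_len, i + window + 1)
    mask[i, lo:hi] = True         # local sliding window
mask[:, global_positions] = True  # every token attends to global tokens
mask[global_positions, :] = True  # global tokens attend to every token

print(mask.astype(int))
```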
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- Retrieval-Augmented Code Generation for Universal Information Extraction [66.68673051922497]
Information Extraction aims to extract structural knowledge from natural language texts.
We propose Code4UIE, a universal retrieval-augmented code generation framework based on Large Language Models (LLMs).
Code4UIE adopts Python classes to define task-specific schemas of various structural knowledge in a universal way.
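The idea of defining extraction schemas as Python classes can be made concrete with a small, hypothetical example; the classes below are assumptions for illustration and are not taken from Code4UIE.

```python
# Hypothetical Python-class schema for relation extraction, in the spirit of
# Code4UIE's universal schema definitions (all names are assumed).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Person:
    name: str

@dataclass
class Organization:
    name: str

@dataclass
class WorksFor:
    employee: Person
    employer: Organization

@dataclass
class ExtractionResult:
    relations: List[WorksFor] = field(default_factory=list)

# An LLM prompted with the schema and a sentence could emit code like this:
result = ExtractionResult(
    relations=[WorksFor(Person("Ada Lovelace"), Organization("Analytical Engines Ltd"))]
)
print(result)
```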
arXiv Detail & Related papers (2023-11-06T09:03:21Z)
- COMEX: A Tool for Generating Customized Source Code Representations [7.151800146054561]
COMEX is a framework that allows researchers and developers to create and combine multiple code-views.
It can analyze both method-level and program-level snippets by using both intra-procedural and inter-procedural analyses.
It is built on tree-sitter, a widely used incremental parsing tool that supports over 40 languages.
arXiv Detail & Related papers (2023-07-10T16:46:34Z)
- Code Execution with Pre-trained Language Models [88.04688617516827]
Most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures.
We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution.
We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension.
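To make concrete what an execution trace is, the sketch below records the sequence of executed line numbers for a tiny Python function using sys.settrace; it illustrates the kind of signal CodeExecutor is pre-trained on, not the paper's mutation-based data pipeline.

```python
# Minimal sketch of collecting an execution trace (executed line numbers)
# for a small function; illustrative only, not CodeExecutor's pipeline.
import sys

def sample(n):
    total = 0
    for i in range(n):
        total += i
    return total

trace = []

def tracer(frame, event, arg):
    if event == "line" and frame.f_code.co_name == "sample":
        trace.append(frame.f_lineno)
    return tracer

sys.settrace(tracer)
sample(3)
sys.settrace(None)
print(trace)  # sequence of executed line numbers inside sample()
```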
arXiv Detail & Related papers (2023-05-08T10:00:05Z)
- Adding Context to Source Code Representations for Deep Learning [13.676416860721877]
We argue that it is beneficial for deep learning models to have access to additional contextual information about the code being analysed.
We present preliminary evidence that encoding context from the call hierarchy along with information from the code itself can improve the performance of a state-of-the-art deep learning model.
arXiv Detail & Related papers (2022-07-30T12:47:32Z)
- CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training [26.695345034376388]
We propose to integrate different views with the natural-language description of source code into a unified framework with Multi-View contrastive Pre-training.
Specifically, we first extract multiple code views using compiler tools, and learn the complementary information among them under a contrastive learning framework.
Experiments on three downstream tasks over five datasets demonstrate the superiority of CODE-MVP when compared with several state-of-the-art baselines.
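A contrastive objective over two views of the same program can be sketched generically as an InfoNCE-style loss; the code below assumes PyTorch and is not CODE-MVP's exact formulation.

```python
# Generic InfoNCE-style contrastive loss between two views of the same code
# (e.g., a token view and an AST view); a sketch, not CODE-MVP's objective.
import torch
import torch.nn.functional as F

def contrastive_loss(view_a: torch.Tensor, view_b: torch.Tensor, temperature: float = 0.07):
    """view_a, view_b: [batch, dim] embeddings of two views of the same programs."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature      # pairwise similarities
    targets = torch.arange(a.size(0))     # matching views share the same index
    return F.cross_entropy(logits, targets)

loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```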
arXiv Detail & Related papers (2022-05-04T12:40:58Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming language.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
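One way to picture a lossless mapping from an AST to a sequence is a pre-order traversal with explicit bracket tokens, as in the sketch below; this is a generic illustration built on Python's ast module, not UniXcoder's specific mapping.

```python
# Generic sketch of flattening an AST into a token sequence with explicit
# brackets so the tree can be reconstructed; not UniXcoder's exact mapping.
import ast

def flatten(node):
    tokens = ["<" + type(node).__name__ + ">"]
    for child in ast.iter_child_nodes(node):
        tokens += flatten(child)
    tokens.append("</" + type(node).__name__ + ">")
    return tokens

tree = ast.parse("def add(a, b):\n    return a + b\n")
print(" ".join(flatten(tree)))
```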
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.