HAConvGNN: Hierarchical Attention Based Convolutional Graph Neural
Network for Code Documentation Generation in Jupyter Notebooks
- URL: http://arxiv.org/abs/2104.01002v1
- Date: Wed, 31 Mar 2021 22:36:41 GMT
- Title: HAConvGNN: Hierarchical Attention Based Convolutional Graph Neural
Network for Code Documentation Generation in Jupyter Notebooks
- Authors: Xuye Liu, Dakuo Wang, April Wang, Lingfei Wu
- Abstract summary: We propose a hierarchical attention-based ConvGNN component to augment the Seq2Seq network.
We build a dataset with publicly available Kaggle notebooks and evaluate our model (HAConvGNN) against baseline models.
- Score: 33.37494243822309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many data scientists use Jupyter notebooks to experiment with code,
visualize results, and document rationales or interpretations. The code
documentation generation (CDG) task in notebooks is related to but different
from the code summarization task in software engineering, as one documentation
unit (markdown cell) may consist of text (an informative summary or indicative
rationale) for multiple code cells. Our work aims to solve the CDG task by
encoding the multiple code cells as separate AST graph structures, for which we
propose a hierarchical attention-based ConvGNN component to augment the Seq2Seq network.
We build a dataset with publicly available Kaggle notebooks and evaluate our
model (HAConvGNN) against baseline models (e.g., Code2Seq or Graph2Seq).
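The sketch below illustrates the core idea as described in the abstract, not the authors' released implementation: each code cell's AST is encoded as its own graph, and a two-level (node-level, then cell-level) attention summarizes the cells into a context vector that can augment a Seq2Seq decoder. The single-step graph convolution and layer sizes are simplifying assumptions.

```python
# Minimal sketch of hierarchical attention over per-cell AST graph encodings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CellGraphEncoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, node_feats, adj):
        # one simple graph-convolution step: aggregate neighbours, then transform
        return F.relu(self.lin(adj @ node_feats))            # (num_nodes, dim)

class HierarchicalAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.node_attn = nn.Linear(dim, 1)
        self.cell_attn = nn.Linear(dim, 1)

    def forward(self, cell_node_states):
        # cell_node_states: list of (num_nodes_i, dim) tensors, one per code cell
        cell_vecs = []
        for h in cell_node_states:
            a = torch.softmax(self.node_attn(h), dim=0)      # node-level attention
            cell_vecs.append((a * h).sum(0))
        cells = torch.stack(cell_vecs)                       # (num_cells, dim)
        b = torch.softmax(self.cell_attn(cells), dim=0)      # cell-level attention
        return (b * cells).sum(0)                            # context vector
        # this context vector can then be concatenated with the Seq2Seq
        # decoder state at each generation step
```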
Related papers
- Contextualized Data-Wrangling Code Generation in Computational Notebooks [131.26365849822932]
We propose an automated approach, CoCoMine, to mine data-wrangling code generation examples with clear multi-modal contextual dependency.
We construct CoCoNote, a dataset containing 58,221 examples for Contextualized Data-wrangling Code generation in Notebooks.
Experiment results demonstrate the significance of incorporating data context in data-wrangling code generation.
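A hypothetical sketch of the mining idea (CoCoMine's actual heuristics and context selection are not reproduced here): pair each data-wrangling code cell with its preceding markdown and code cells as multi-modal context; the keyword filter below is an illustrative stand-in.

```python
# Toy miner: collect (context cells, target code cell) pairs from a notebook.
import json

def mine_examples(notebook_path, context_size=3,
                  keywords=("pd.", "df", "merge", "groupby")):
    with open(notebook_path, encoding="utf-8") as f:
        cells = json.load(f)["cells"]
    examples = []
    for i, cell in enumerate(cells):
        src = "".join(cell.get("source", []))
        if cell["cell_type"] == "code" and any(k in src for k in keywords):
            context = [("".join(c.get("source", [])), c["cell_type"])
                       for c in cells[max(0, i - context_size):i]]
            examples.append({"context": context, "target_code": src})
    return examples
```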
arXiv Detail & Related papers (2024-09-20T14:49:51Z)
- Typhon: Automatic Recommendation of Relevant Code Cells in Jupyter Notebooks [0.3122672716129843]
This paper proposes Typhon, an approach to automatically recommend relevant code cells in Jupyter notebooks.
Typhon tokenizes developers' markdown description cells and looks for the most similar code cells from the database.
We evaluated the Typhon tool on Jupyter notebooks from Kaggle competitions and found that the approach can recommend code cells with moderate accuracy.
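An illustrative sketch of the retrieval step under simple assumptions (Typhon's actual tokenization and similarity measure may differ): rank database code cells against a markdown description by TF-IDF cosine similarity.

```python
# Rank candidate code cells by similarity to a markdown description.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend(markdown_text, code_cells, top_k=3):
    vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]+")
    matrix = vectorizer.fit_transform(code_cells + [markdown_text])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = scores.argsort()[::-1][:top_k]
    return [(code_cells[i], float(scores[i])) for i in ranked]

print(recommend("plot a histogram of ages",
                ["df['age'].hist()", "model.fit(X, y)", "plt.hist(data['age'])"]))
```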
arXiv Detail & Related papers (2024-05-15T03:59:59Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks [0.965964228590342]
We propose Doc2Graph, a task-agnostic document understanding framework based on a GNN model.
We evaluate our approach on two challenging datasets for key information extraction in form understanding, invoice layout analysis and table detection.
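A minimal sketch of the document-as-graph idea, not the Doc2Graph model itself: text regions become nodes, nearby regions are connected, and a graph layer classifies each node into an entity type; the feature dimensions and toy adjacency are assumptions.

```python
# Classify document regions (nodes) connected by spatial proximity (edges).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DocGraphClassifier(nn.Module):
    def __init__(self, in_dim, hidden, num_classes):
        super().__init__()
        self.gc = nn.Linear(in_dim, hidden)
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, node_feats, adj):
        h = F.relu(self.gc(adj @ node_feats))   # aggregate neighbouring regions
        return self.out(h)                      # per-node entity logits

# toy usage: 4 regions, 8-dim features, 4 entity classes
feats = torch.randn(4, 8)
adj = torch.eye(4) + torch.tensor([[0, 1, 0, 0], [1, 0, 1, 0],
                                   [0, 1, 0, 1], [0, 0, 1, 0]], dtype=torch.float)
logits = DocGraphClassifier(8, 16, 4)(feats, adj)
```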
arXiv Detail & Related papers (2022-08-23T19:48:10Z)
- Graph Spring Network and Informative Anchor Selection for Session-based Recommendation [2.6524289609910654]
Session-based recommendation (SBR) aims at predicting the next item for an ongoing anonymous session.
The major challenge of SBR is capturing richer relations between items and learning ID-based item embeddings that reflect those relations.
We propose a new graph neural network, called Graph Spring Network (GSN), for learning ID-based item embedding on an item graph.
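A rough sketch of the setting only (the actual GSN update and its informative anchor selection are not shown here): build an item-transition graph from sessions and refine ID-based item embeddings by propagating them over that graph.

```python
# Build an item co-occurrence graph from sessions and propagate embeddings.
import torch
import torch.nn.functional as F

def build_item_graph(sessions, num_items):
    adj = torch.zeros(num_items, num_items)
    for s in sessions:
        for a, b in zip(s, s[1:]):      # consecutive items in a session
            adj[a, b] += 1.0
            adj[b, a] += 1.0
    return adj / adj.sum(dim=1, keepdim=True).clamp(min=1.0)

num_items, dim = 5, 8
embeddings = torch.nn.Embedding(num_items, dim)
adj = build_item_graph([[0, 1, 2], [1, 3, 4, 1]], num_items)
refined = F.normalize(adj @ embeddings.weight + embeddings.weight, dim=-1)
```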
arXiv Detail & Related papers (2022-02-19T02:47:44Z)
- Text Classification for Task-based Source Code Related Questions [0.0]
StackOverflow provides solutions as small code snippets that give a complete answer to whatever task the developer wants to code.
We develop a two-fold deep learning model: Seq2Seq and a binary classifier that takes in the intent (which is in natural language) and code snippets in Python.
We find that the hidden state layer's embeddings perform slightly better than regular standard embeddings from a constructed vocabulary.
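A hedged sketch of the binary-classifier half of such a pipeline: encode the natural-language intent and a candidate Python snippet, then predict whether they match. The GRU encoders and sizes are illustrative choices, not the paper's configuration.

```python
# Predict whether a code snippet answers a natural-language intent.
import torch
import torch.nn as nn

class IntentCodeMatcher(nn.Module):
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.intent_enc = nn.GRU(dim, dim, batch_first=True)
        self.code_enc = nn.GRU(dim, dim, batch_first=True)
        self.classifier = nn.Linear(2 * dim, 1)

    def forward(self, intent_ids, code_ids):
        _, h_i = self.intent_enc(self.embed(intent_ids))   # final hidden states
        _, h_c = self.code_enc(self.embed(code_ids))
        pair = torch.cat([h_i[-1], h_c[-1]], dim=-1)
        return torch.sigmoid(self.classifier(pair))        # match probability

model = IntentCodeMatcher(vocab_size=1000)
prob = model(torch.randint(0, 1000, (2, 12)), torch.randint(0, 1000, (2, 20)))
```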
arXiv Detail & Related papers (2021-10-31T20:10:21Z)
- Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose an end-to-end deep graph matching and searching model based on graph neural networks.
We first represent both natural language query texts and programming language code snippets with the unified graph-structured data.
In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
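A simplified sketch of cross-graph matching between a query-text graph and a code graph (not the full DGMS architecture): attend across graphs and pool node states into a single similarity score.

```python
# Cross-graph attention followed by pooling into a similarity score.
import torch
import torch.nn.functional as F

def cross_graph_similarity(query_nodes, code_nodes):
    # query_nodes: (m, d), code_nodes: (n, d)
    attn = torch.softmax(query_nodes @ code_nodes.T, dim=-1)   # (m, n)
    matched = attn @ code_nodes          # query nodes re-expressed by code nodes
    q = F.normalize(query_nodes.mean(0), dim=-1)
    c = F.normalize(matched.mean(0), dim=-1)
    return torch.dot(q, c)               # cosine-style similarity

score = cross_graph_similarity(torch.randn(6, 32), torch.randn(10, 32))
```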
arXiv Detail & Related papers (2020-10-24T14:16:50Z)
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
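For illustration only, a toy extraction of "where-the-value-comes-from" edges (each variable use linked to its most recent assignment) from a Python snippet, the kind of data-flow signal described above; GraphCodeBERT's real data-flow graph construction is more complete and language-independent.

```python
# Toy data-flow extraction: link variable reads to their latest assignment line.
import ast

def value_flow_edges(code):
    tree = ast.parse(code)
    # visit Name nodes in source order (line, column)
    names = sorted((n for n in ast.walk(tree) if isinstance(n, ast.Name)),
                   key=lambda n: (n.lineno, n.col_offset))
    last_def, edges = {}, []
    for n in names:
        if isinstance(n.ctx, ast.Store):
            last_def[n.id] = n.lineno
        elif isinstance(n.ctx, ast.Load) and n.id in last_def:
            edges.append((n.id, last_def[n.id], n.lineno))
    return edges  # (variable, defined-at-line, used-at-line)

print(value_flow_edges("x = 1\ny = x + 2\nz = y * x\n"))
```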
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
- Learning to map source code to software vulnerability using code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective.
We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
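A hedged sketch of a code-as-graph encoding: turn a function's AST into labeled nodes and parent-child edges that a graph classifier could consume; the paper's exact graph construction and model are not reproduced here.

```python
# Convert source code into a (node labels, edge list) graph via its AST.
import ast

def code_to_graph(source):
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    index = {id(n): i for i, n in enumerate(nodes)}
    labels = [type(n).__name__ for n in nodes]          # node label = AST type
    edges = [(index[id(p)], index[id(c)])               # parent -> child edges
             for p in nodes for c in ast.iter_child_nodes(p)]
    return labels, edges

labels, edges = code_to_graph("def copy(buf, data):\n    buf[:len(data)] = data\n")
```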
arXiv Detail & Related papers (2020-06-15T16:05:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.