Typhon: Automatic Recommendation of Relevant Code Cells in Jupyter Notebooks
- URL: http://arxiv.org/abs/2405.09075v1
- Date: Wed, 15 May 2024 03:59:59 GMT
- Title: Typhon: Automatic Recommendation of Relevant Code Cells in Jupyter Notebooks
- Authors: Chaiyong Ragkhitwetsagul, Veerakit Prasertpol, Natanon Ritta, Paphon Sae-Wong, Thanapon Noraset, Morakot Choetkiertikul,
- Abstract summary: This paper proposes Typhon, an approach to automatically recommend relevant code cells in Jupyter notebooks.
Typhon tokenizes developers' markdown description cells and looks for the most similar code cells from the database.
We evaluated the Typhon tool on Jupyter notebooks from Kaggle competitions and found that the approach can recommend code cells with moderate accuracy.
- Score: 0.3122672716129843
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: At present, code recommendation tools have gained greater importance to many software developers in various areas of expertise. Having code recommendation tools has enabled better productivity and performance in developing the code in software and made it easier for developers to find code examples and learn from them. This paper proposes Typhon, an approach to automatically recommend relevant code cells in Jupyter notebooks. Typhon tokenizes developers' markdown description cells and looks for the most similar code cells from the database using text similarities such as the BM25 ranking function or CodeBERT, a machine-learning approach. Then, the algorithm computes the similarity distance between the tokenized query and markdown cells to return the most relevant code cells to the developers. We evaluated the Typhon tool on Jupyter notebooks from Kaggle competitions and found that the approach can recommend code cells with moderate accuracy. The approach and results in this paper can lead to further improvements in code cell recommendations in Jupyter notebooks.
Related papers
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - A Flexible Cell Classification for ML Projects in Jupyter Notebooks [0.21485350418225244]
This paper presents a more flexible approach to cell classification based on a hybrid classification approach that combines a rule-based and a decision tree classifier.
We implemented the new flexible cell classification approach in a tool called JupyLabel.
arXiv Detail & Related papers (2024-03-12T11:50:47Z) - Unlocking Insights: Semantic Search in Jupyter Notebooks [1.320904960556043]
We investigate the application of large language models to enhance semantic search capabilities.
Our objective is to retrieve generated outputs, such as figures or tables, associated functions and methods, and other pertinent information.
We demonstrate a semantic search framework that achieves a comprehensive semantic understanding of the entire notebook's contents.
arXiv Detail & Related papers (2024-02-20T18:49:41Z) - Leveraging Generative AI: Improving Software Metadata Classification
with Generated Code-Comment Pairs [0.0]
In software development, code comments play a crucial role in enhancing code comprehension and collaboration.
This research paper addresses the challenge of objectively classifying code comments as "Useful" or "Not Useful"
We propose a novel solution that harnesses contextualized embeddings, particularly BERT, to automate this classification process.
arXiv Detail & Related papers (2023-10-14T12:09:43Z) - Generation-Augmented Query Expansion For Code Retrieval [51.20943646688115]
We propose a generation-augmented query expansion framework.
Inspired by the human retrieval process - sketching an answer before searching.
We achieve new state-of-the-art results on the CodeSearchNet benchmark.
arXiv Detail & Related papers (2022-12-20T23:49:37Z) - CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstring for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z) - Enhancing Semantic Code Search with Multimodal Contrastive Learning and
Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z) - CodeRetriever: Unimodal and Bimodal Contrastive Learning [128.06072658302165]
We propose the CodeRetriever model, which combines the unimodal and bimodal contrastive learning to train function-level code semantic representations.
For unimodal contrastive learning, we design a semantic-guided method to build positive code pairs based on the documentation and function name.
For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build text-code pairs.
arXiv Detail & Related papers (2022-01-26T10:54:30Z) - DeSkew-LSH based Code-to-Code Recommendation Engine [3.7011129410662558]
We present emphSenatus, a new code-to-code recommendation engine for machine learning on source code.
At the core of Senatus is emphDe-Skew LSH, a new locality sensitive hashing algorithm that indexes the data for fast (sub-linear time) retrieval.
We show Senatus improves performance by 6.7% F1 and query time 16x is faster compared to Facebook Aroma on the task of code-to-code recommendation.
arXiv Detail & Related papers (2021-11-05T16:56:28Z) - HAConvGNN: Hierarchical Attention Based Convolutional Graph Neural
Network for Code Documentation Generation in Jupyter Notebooks [33.37494243822309]
We propose a hierarchical attention-based ConvGNN component to augment the Seq2Seq network.
We build a dataset with publicly available Kaggle notebooks and evaluate our model (HAConvGNN) against baseline models.
arXiv Detail & Related papers (2021-03-31T22:36:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.