Related papers: A Systematic Literature Review of Software Engineering Research on Jupyter Notebook

URL: http://arxiv.org/abs/2504.16180v1
Date: Tue, 22 Apr 2025 18:12:04 GMT
Title: A Systematic Literature Review of Software Engineering Research on Jupyter Notebook
Authors: Md Saeed Siddik, Hao Li, Cor-Paul Bezemer
Abstract summary: The purpose of this study is to analyze trends, gaps, and methodologies used in software engineering research on Jupyter notebooks. The most popular venues for publishing software engineering research on Jupyter notebooks are related to human-computer interaction.
Score: 8.539234346904905
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Context: Jupyter Notebook has emerged as a versatile tool that transforms how researchers, developers, and data scientists conduct and communicate their work. As the adoption of Jupyter notebooks continues to rise, so does the interest from the software engineering research community in improving the software engineering practices for Jupyter notebooks. Objective: The purpose of this study is to analyze trends, gaps, and methodologies used in software engineering research on Jupyter notebooks. Method: We selected 146 relevant publications from the DBLP Computer Science Bibliography up to the end of 2024, following established systematic literature review guidelines. We explored publication trends, categorized them based on software engineering topics, and reported findings based on those topics. Results: The most popular venues for publishing software engineering research on Jupyter notebooks are related to human-computer interaction instead of traditional software engineering venues. Researchers have addressed a wide range of software engineering topics on notebooks, such as code reuse, readability, and execution environment. Although reusability is one of the research topics for Jupyter notebooks, only 64 of the 146 studies can be reused based on their provided URLs. Additionally, most replication packages are not hosted on permanent repositories for long-term availability and adherence to open science principles. Conclusion: Solutions specific to notebooks for software engineering issues, including testing, refactoring, and documentation, are underexplored. Future research opportunities exist in automatic testing frameworks, refactoring clones between notebooks, and generating group documentation for coherent code cells.

Related papers

Exploring the Jupyter Ecosystem: An Empirical Study of Bugs and Vulnerabilities [3.4769545753909608]
This paper aims to provide a large-scale empirical study of bugs and vulnerabilities in the Notebook ecosystem. We collected and analyzed a large dataset of Notebooks from two major platforms.
arXiv Detail & Related papers (2025-07-24T22:09:21Z)
Observing Fine-Grained Changes in Jupyter Notebooks During Development Time [12.75622665542759]
We introduce a toolset for collecting code changes in Jupyter notebooks during development time. We then use it to collect more than 100 hours of work related to a data analysis task and a machine learning task. In our analysis of the collected data, we classified the changes made to the cells between executions and found that a significant number of these changes were code modifications.
arXiv Detail & Related papers (2025-07-21T17:41:51Z)
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning [57.09163579304332]
We introduce PaperCoder, a framework that transforms machine learning papers into functional code repositories. PaperCoder operates in three stages; in the planning stage, it designs the system architecture with diagrams, identifies file dependencies, and generates configuration files. We then evaluate PaperCoder on generating code implementations from machine learning papers using both model-based and human evaluations.
arXiv Detail & Related papers (2025-04-24T01:57:01Z)
Teaching Empirical Research Methods in Software Engineering: An Editorial Introduction [2.518416353853374]
Empirical Software Engineering has received much attention in recent years and has become a de facto standard for scientific practice in Software Engineering. While extensive guidelines are nowadays available for designing, conducting, reporting, and reviewing empirical studies, similar attention has not yet been paid to teaching empirical software engineering.
arXiv Detail & Related papers (2025-01-13T10:42:43Z)
Benchmarking Predictive Coding Networks -- Made Simple [48.652114040426625]
We tackle the problems of efficiency and scalability for predictive coding networks (PCNs) in machine learning. We propose a library, called PCX, that focuses on performance and simplicity, and use it to implement a large set of standard benchmarks. We perform extensive tests on such benchmarks using both existing algorithms for PCNs and adaptations of other methods popular in the bio-plausible deep learning community.
arXiv Detail & Related papers (2024-07-01T10:33:44Z)
MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows. MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years. We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
Investigating Reproducibility in Deep Learning-Based Software Fault Prediction [16.25827159504845]
With the rapid adoption of increasingly complex machine learning models, it becomes more and more difficult for scholars to reproduce the results that are reported in the literature. This is particularly the case when the applied deep learning models and the evaluation methodology are not properly documented and when code and data are not shared. We have conducted a systematic review of the current literature and examined the level of reproducibility of 56 research articles that were published between 2019 and 2022 in top-tier software engineering conferences.
arXiv Detail & Related papers (2024-02-08T13:00:18Z)
SuperNOVA: Design Strategies and Opportunities for Interactive Visualization in Computational Notebooks [34.04783941358773]
We analyze 163 interactive visualization tools for notebooks. We identify key design implications and trade-offs. We develop SuperNOVA, an open-source interactive browser to help researchers explore existing notebook visualization tools.
arXiv Detail & Related papers (2023-05-04T17:57:54Z)
Literature Review: Computer Vision Applications in Transportation Logistics and Warehousing [58.720142291102135]
Computer vision applications in transportation logistics and warehousing have a huge potential for process automation. We present a structured literature review on research in the field to help leverage this potential.
arXiv Detail & Related papers (2023-04-12T17:33:41Z)
Mining the Characteristics of Jupyter Notebooks in Data Science Projects [1.655246222110267]
The computational notebook (e.g., Jupyter Notebook) is a well-known data science tool adopted in practice. This research aims to understand the characteristics of high-voted Jupyter Notebooks on Kaggle and the popular Jupyter Notebooks for data science projects on GitHub.
arXiv Detail & Related papers (2023-04-11T16:30:53Z)
Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a profound comprehension of the major developments in the field of Table Detection. We provide an analysis of both classic and new applications in the field. The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z)
Pynblint: a Static Analyzer for Python Jupyter Notebooks [10.190501703364234]
Pynblint is a static analyzer for Jupyter notebooks written in Python. It checks compliance of notebooks (and surrounding repositories) with a set of empirically validated best practices.
arXiv Detail & Related papers (2022-05-24T09:56:03Z)
pymdp: A Python library for active inference in discrete state spaces [52.85819390191516]
pymdp is an open-source package for simulating active inference in Python. We provide the first open-source package for simulating active inference with POMDPs.
arXiv Detail & Related papers (2022-01-11T12:18:44Z)
Machine Learning for Software Engineering: A Systematic Mapping [73.30245214374027]
The software development industry is rapidly adopting machine learning for transitioning modern day software systems towards highly intelligent and self-learning systems. No comprehensive study exists that explores the current state-of-the-art on the adoption of machine learning across software engineering life cycle stages. This study introduces a machine learning for software engineering (MLSE) taxonomy classifying the state-of-the-art machine learning techniques according to their applicability to various software engineering life cycle stages.
arXiv Detail & Related papers (2020-05-27T11:56:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.