Pynblint: a Static Analyzer for Python Jupyter Notebooks
- URL: http://arxiv.org/abs/2205.11934v1
- Date: Tue, 24 May 2022 09:56:03 GMT
- Title: Pynblint: a Static Analyzer for Python Jupyter Notebooks
- Authors: Luigi Quaranta, Fabio Calefato, Filippo Lanubile
- Abstract summary: Pynblint is a static analyzer for Jupyter notebooks written in Python.
It checks compliance of notebooks (and surrounding repositories) with a set of empirically validated best practices.
- Score: 10.190501703364234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Jupyter Notebook is the tool of choice of many data scientists in the early
stages of ML workflows. The notebook format, however, has been criticized for
inducing bad programming practices; indeed, researchers have already shown that
open-source repositories are inundated by poor-quality notebooks. Low-quality
output from the prototypical stages of ML workflows constitutes a clear
bottleneck towards the productization of ML models. To foster the creation of
better notebooks, we developed Pynblint, a static analyzer for Jupyter
notebooks written in Python. The tool checks the compliance of notebooks (and
surrounding repositories) with a set of empirically validated best practices
and provides targeted recommendations when violations are detected.
Related papers
- Untangling Knots: Leveraging LLM for Error Resolution in Computational Notebooks [4.318590074766604]
We propose a potential solution for resolving errors in computational notebooks via an iterative LLM-based agent.
We discuss the questions raised by this approach and share a novel dataset of computational notebooks containing bugs.
arXiv Detail & Related papers (2024-03-26T18:53:17Z) - DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open source Python library that allows researchers to implement powerful large language models.
DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv Detail & Related papers (2024-02-16T00:10:26Z) - Jup2Kub: algorithms and a system to translate a Jupyter Notebook
pipeline to a fault tolerant distributed Kubernetes deployment [0.9790236766474201]
Scientific facilitate computational, data manipulation, and sometimes visualization steps for scientific data analysis.
Jupyter notebooks struggle to scale with larger data sets, lack failure tolerance, and depend heavily on the stability of underlying tools and packages.
Jup2Kup translates from Jupyter notebooks into a distributed, high-performance environment, enhancing fault tolerance.
arXiv Detail & Related papers (2023-11-21T02:54:06Z) - Julearn: an easy-to-use library for leakage-free evaluation and
inspection of ML models [0.23301643766310373]
We present the rationale behind julearn's design, its core features, and showcase three examples of previously-published research projects.
Julearn aims to simplify the entry into the machine learning world by providing an easy-to-use environment with built in guards against some of the most common ML pitfalls.
arXiv Detail & Related papers (2023-10-19T08:21:12Z) - PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user.
We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z) - Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z) - PyGOD: A Python Library for Graph Outlier Detection [56.33769221859135]
PyGOD is an open-source library for detecting outliers in graph data.
It supports a wide array of leading graph-based methods for outlier detection.
PyGOD is released under a BSD 2-Clause license at https://pygod.org and at the Python Package Index (PyPI)
arXiv Detail & Related papers (2022-04-26T06:15:21Z) - Missing Data Infill with Automunge [0.0]
Missing data is a fundamental obstacle in the practice of data science.
This paper surveys a few conventions for imputation as available in the Automunge open source python library platform.
arXiv Detail & Related papers (2022-02-19T00:49:30Z) - Scikit-dimension: a Python package for intrinsic dimension estimation [58.8599521537]
This technical note introduces textttscikit-dimension, an open-source Python package for intrinsic dimension estimation.
textttscikit-dimension package provides a uniform implementation of most of the known ID estimators based on scikit-learn application programming interface.
We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
arXiv Detail & Related papers (2021-09-06T16:46:38Z) - FedML: A Research Library and Benchmark for Federated Machine Learning [55.09054608875831]
Federated learning (FL) is a rapidly growing research field in machine learning.
Existing FL libraries cannot adequately support diverse algorithmic development.
We introduce FedML, an open research library and benchmark to facilitate FL algorithm development and fair performance comparison.
arXiv Detail & Related papers (2020-07-27T13:02:08Z) - ReproduceMeGit: A Visualization Tool for Analyzing Reproducibility of
Jupyter Notebooks [0.0]
We present ReproduceMeGit, a visualization tool for analyzing the GitHub of Jupyter Notebooks.
The tool provides information on the number of notebooks that were successfully reproducible, those that resulted in exceptions, those with different results from the original notebooks, etc.
arXiv Detail & Related papers (2020-06-22T10:05:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.