Related papers: Pynblint: a Static Analyzer for Python Jupyter Notebooks

Related papers

Practical programming research of Linear DML model based on the simplest Python code: From the standpoint of novice researchers [0.0]
This paper presents linear DML models for causal inference using the simplest Python code on a Jupyter notebook based on an Anaconda platform. The results show that current Library API technology is not yet sufficient to enable novice Python users to build qualified and high-quality DML models.
arXiv Detail & Related papers (2025-02-22T10:07:54Z)
Suggesting Code Edits in Interactive Machine Learning Notebooks Using Large Language Models [3.2433570328895196]
We present the first dataset of 48,398 Jupyter notebook edits derived from 20,095 revisions of 792 machine learning repositories on GitHub. Our dataset captures granular details of cell-level and line-level modifications, offering a foundation for understanding real-world maintenance patterns in machine learning.
arXiv Detail & Related papers (2025-01-16T18:55:38Z)
PyPulse: A Python Library for Biosignal Imputation [58.35269251730328]
We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings. PyPulse's framework provides a modular and extendable framework with high ease-of-use for a broad userbase, including non-machine-learning bioresearchers. We released PyPulse under the MIT License on Github and PyPI.
arXiv Detail & Related papers (2024-12-09T11:00:55Z)
Why do Machine Learning Notebooks Crash? An Empirical Study on Public Python Jupyter Notebooks [1.8292110434077904]
We collect 64,031 notebooks containing 92,542 crashes from GitHub and Kaggle. We analyze a sample of 746 crashes across various aspects, including crash types and root causes. We find that over 40% of crashes stem from API misuse and notebook-specific issues.
arXiv Detail & Related papers (2024-11-25T09:33:08Z)
Untangling Knots: Leveraging LLM for Error Resolution in Computational Notebooks [4.318590074766604]
We propose a potential solution for resolving errors in computational notebooks via an iterative LLM-based agent. We discuss the questions raised by this approach and share a novel dataset of computational notebooks containing bugs.
arXiv Detail & Related papers (2024-03-26T18:53:17Z)
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open source Python library that allows researchers to implement powerful large language models. DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv Detail & Related papers (2024-02-16T00:10:26Z)
Jup2Kub: algorithms and a system to translate a Jupyter Notebook pipeline to a fault tolerant distributed Kubernetes deployment [0.9790236766474201]
Scientific facilitate computational, data manipulation, and sometimes visualization steps for scientific data analysis. Jupyter notebooks struggle to scale with larger data sets, lack failure tolerance, and depend heavily on the stability of underlying tools and packages. Jup2Kup translates from Jupyter notebooks into a distributed, high-performance environment, enhancing fault tolerance.
arXiv Detail & Related papers (2023-11-21T02:54:06Z)
Julearn: an easy-to-use library for leakage-free evaluation and inspection of ML models [0.23301643766310373]
We present the rationale behind julearn's design, its core features, and showcase three examples of previously-published research projects. Julearn aims to simplify the entry into the machine learning world by providing an easy-to-use environment with built in guards against some of the most common ML pitfalls.
arXiv Detail & Related papers (2023-10-19T08:21:12Z)
PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user. We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z)
Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it. Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
PyGOD: A Python Library for Graph Outlier Detection [56.33769221859135]
PyGOD is an open-source library for detecting outliers in graph data. It supports a wide array of leading graph-based methods for outlier detection. PyGOD is released under a BSD 2-Clause license at https://pygod.org and at the Python Package Index (PyPI)
arXiv Detail & Related papers (2022-04-26T06:15:21Z)
Missing Data Infill with Automunge [0.0]
Missing data is a fundamental obstacle in the practice of data science. This paper surveys a few conventions for imputation as available in the Automunge open source python library platform.
arXiv Detail & Related papers (2022-02-19T00:49:30Z)
Scikit-dimension: a Python package for intrinsic dimension estimation [58.8599521537]
This technical note introduces textttscikit-dimension, an open-source Python package for intrinsic dimension estimation. textttscikit-dimension package provides a uniform implementation of most of the known ID estimators based on scikit-learn application programming interface. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
arXiv Detail & Related papers (2021-09-06T16:46:38Z)
FedML: A Research Library and Benchmark for Federated Machine Learning [55.09054608875831]
Federated learning (FL) is a rapidly growing research field in machine learning. Existing FL libraries cannot adequately support diverse algorithmic development. We introduce FedML, an open research library and benchmark to facilitate FL algorithm development and fair performance comparison.
arXiv Detail & Related papers (2020-07-27T13:02:08Z)
ReproduceMeGit: A Visualization Tool for Analyzing Reproducibility of Jupyter Notebooks [0.0]
We present ReproduceMeGit, a visualization tool for analyzing the GitHub of Jupyter Notebooks. The tool provides information on the number of notebooks that were successfully reproducible, those that resulted in exceptions, those with different results from the original notebooks, etc.
arXiv Detail & Related papers (2020-06-22T10:05:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.