Eliciting Best Practices for Collaboration with Computational Notebooks
- URL: http://arxiv.org/abs/2202.07233v1
- Date: Tue, 15 Feb 2022 07:39:37 GMT
- Title: Eliciting Best Practices for Collaboration with Computational Notebooks
- Authors: Luigi Quaranta and Fabio Calefato and Filippo Lanubile
- Abstract summary: We elicit a catalog of best practices for collaborative data science with computational notebooks.
We conduct interviews with professional data scientists to assess their awareness of these best practices.
Findings reveal that experts are mostly aware of the best practices and tend to adopt them in their daily work.
- Score: 10.190501703364234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the widespread adoption of computational notebooks, little is known
about best practices for their usage in collaborative contexts. In this paper,
we fill this gap by eliciting a catalog of best practices for collaborative
data science with computational notebooks. With this aim, we first look for
best practices through a multivocal literature review. Then, we conduct
interviews with professional data scientists to assess their awareness of these
best practices. Finally, we assess the adoption of best practices through the
analysis of 1,380 Jupyter notebooks retrieved from the Kaggle platform.
Findings reveal that experts are mostly aware of the best practices and tend to
adopt them in their daily work. Nonetheless, they do not consistently follow
all the recommendations as, depending on specific contexts, some are deemed
unfeasible or counterproductive due to the lack of proper tool support. As
such, we envision the design of notebook solutions that spare data scientists
from having to prioritize exploration and rapid prototyping over writing
high-quality code.
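To make concrete the kind of practice such a catalog covers, below is a minimal sketch of one widely recommended pattern for collaborative notebooks: moving reusable logic out of copy-pasted cells and into a small, version-controlled module that the notebook imports. The module name (preprocessing.py), its functions, and the data columns are hypothetical illustrations, not items taken from the paper's catalog.

# preprocessing.py -- hypothetical helper module kept under version control
# next to the notebook, so the shared logic can be reviewed and unit-tested.
import numpy as np
import pandas as pd

def load_ratings(path: str) -> pd.DataFrame:
    """Load the raw ratings CSV and drop rows with missing values."""
    return pd.read_csv(path).dropna()

def add_log_price(df: pd.DataFrame) -> pd.DataFrame:
    """Add a log-transformed price column reused across several analyses."""
    out = df.copy()
    out["log_price"] = np.log1p(out["price"])
    return out

# First notebook cell: import the shared helpers so every collaborator runs
# the same preprocessing code instead of diverging copy-pasted cells.
#     from preprocessing import load_ratings, add_log_price
#     ratings = add_log_price(load_ratings("data/ratings.csv"))

Keeping exploratory work in the notebook while factoring stable code into importable modules is one way to reconcile rapid prototyping with code quality, which is the tension the authors highlight.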
Related papers
- Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application [54.984348122105516]
Large Language Models (LLMs) pretrained on massive text corpora present a promising avenue for enhancing recommender systems.
We propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge.
arXiv Detail & Related papers (2024-05-07T04:00:30Z)
- Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models [95.96734086126469]
Large language models (LLMs) can serve as assistants that help users accomplish their jobs and also support the development of advanced applications.
For the wide application of LLMs, inference efficiency is an essential concern and has been widely studied in existing work.
We perform a detailed coarse-to-fine analysis of the inference performance of various code libraries.
arXiv Detail & Related papers (2024-04-17T15:57:50Z)
- A Systematic Mapping Study and Practitioner Insights on the Use of Software Engineering Practices to Develop MVPs [1.6432083797787214]
We identified 33 papers published between 2013 and 2020 and observed some trends related to MVP ideation and evaluation practices.
There is an emphasis on end-user validations based on practices such as usability tests, A/B testing, and usage data analysis.
There is still limited research related to MVP technical feasibility assessment and effort estimation.
arXiv Detail & Related papers (2023-05-15T02:00:47Z)
- Pre-trained Embeddings for Entity Resolution: An Experimental Analysis [Experiment, Analysis & Benchmark] [65.11858854040544]
We perform a thorough experimental analysis of 12 popular language models over 17 established benchmark datasets.
First, we assess their vectorization overhead for converting all input entities into dense embedding vectors.
Second, we investigate their blocking performance, perform a detailed scalability analysis, and compare them with the state-of-the-art deep learning-based blocking method.
Third, we conclude with their relative performance for both supervised and unsupervised matching.
arXiv Detail & Related papers (2023-04-24T08:53:54Z)
- Tag-Aware Document Representation for Research Paper Recommendation [68.8204255655161]
We propose a hybrid approach that leverages deep semantic representation of research papers based on social tags assigned by users.
The proposed model is effective in recommending research papers even when the rating data is very sparse.
arXiv Detail & Related papers (2022-09-08T09:13:07Z)
- Efficient Real-world Testing of Causal Decision Making via Bayesian Experimental Design for Contextual Optimisation [12.37745209793872]
We introduce a model-agnostic framework for gathering data to evaluate and improve contextual decision making.
Our method enables data-efficient evaluation of the regret of past treatment assignments.
arXiv Detail & Related papers (2022-07-12T01:20:11Z)
- Benchopt: Reproducible, efficient and collaborative optimization benchmarks [67.29240500171532]
Benchopt is a framework to automate, reproduce and publish optimization benchmarks in machine learning.
Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments.
arXiv Detail & Related papers (2022-06-27T16:19:24Z)
- A Field Guide to Federated Optimization [161.3779046812383]
Federated learning and analytics are distributed approaches for collaboratively learning models (or statistics) from decentralized data.
This paper provides recommendations and guidelines on formulating, designing, evaluating, and analyzing federated optimization algorithms.
arXiv Detail & Related papers (2021-07-14T18:09:08Z)
- Benchmarking in Optimization: Best Practice and Open Issues [9.710173903804373]
This survey compiles ideas and recommendations from more than a dozen researchers with different backgrounds and from different institutes around the world.
The article discusses eight essential topics in benchmarking: clearly stated goals, well-specified problems, suitable algorithms, adequate performance measures, thoughtful analysis, effective and efficient designs, comprehensible presentations, and guaranteed reproducibility.
arXiv Detail & Related papers (2020-07-07T14:20:26Z)