Eliciting Best Practices for Collaboration with Computational Notebooks
- URL: http://arxiv.org/abs/2202.07233v1
- Date: Tue, 15 Feb 2022 07:39:37 GMT
- Title: Eliciting Best Practices for Collaboration with Computational Notebooks
- Authors: Luigi Quaranta and Fabio Calefato and Filippo Lanubile
- Abstract summary: We elicit a catalog of best practices for collaborative data science with computational notebooks.
We conduct interviews with professional data scientists to assess their awareness of these best practices.
Findings reveal that experts are mostly aware of the best practices and tend to adopt them in their daily work.
- Score: 10.190501703364234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the widespread adoption of computational notebooks, little is known
about best practices for their usage in collaborative contexts. In this paper,
we fill this gap by eliciting a catalog of best practices for collaborative
data science with computational notebooks. With this aim, we first look for
best practices through a multivocal literature review. Then, we conduct
interviews with professional data scientists to assess their awareness of these
best practices. Finally, we assess the adoption of best practices through the
analysis of 1,380 Jupyter notebooks retrieved from the Kaggle platform.
Findings reveal that experts are mostly aware of the best practices and tend to
adopt them in their daily work. Nonetheless, they do not consistently follow
all the recommendations as, depending on specific contexts, some are deemed
unfeasible or counterproductive due to the lack of proper tool support. As
such, we envision the design of notebook solutions that spare data scientists
from having to prioritize exploration and rapid prototyping over writing
high-quality code.
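To make concrete the kind of practice such a catalog covers, below is a minimal sketch of one widely recommended pattern for collaborative notebooks: moving reusable logic out of copy-pasted cells and into a small, version-controlled module that the notebook imports. The module name (preprocessing.py), its functions, and the data columns are hypothetical illustrations, not items taken from the paper's catalog.

# preprocessing.py -- hypothetical helper module kept under version control
# next to the notebook, so the shared logic can be reviewed and unit-tested.
import numpy as np
import pandas as pd

def load_ratings(path: str) -> pd.DataFrame:
    """Load the raw ratings CSV and drop rows with missing values."""
    return pd.read_csv(path).dropna()

def add_log_price(df: pd.DataFrame) -> pd.DataFrame:
    """Add a log-transformed price column reused across several analyses."""
    out = df.copy()
    out["log_price"] = np.log1p(out["price"])
    return out

# First notebook cell: import the shared helpers so every collaborator runs
# the same preprocessing code instead of diverging copy-pasted cells.
#     from preprocessing import load_ratings, add_log_price
#     ratings = add_log_price(load_ratings("data/ratings.csv"))

Keeping exploratory work in the notebook while factoring stable code into importable modules is one way to reconcile rapid prototyping with code quality, which is the tension the authors highlight.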
Related papers
- Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application [54.984348122105516]
Large Language Models (LLMs) pretrained on massive text corpora present a promising avenue for enhancing recommender systems.
We propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge.
arXiv Detail & Related papers (2024-05-07T04:00:30Z)
- Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models [95.96734086126469]
Large language models (LLMs) can serve as assistants that help users accomplish their jobs and also support the development of advanced applications.
For the wide application of LLMs, inference efficiency is an essential concern and has been widely studied in existing work.
We perform a detailed coarse-to-fine analysis of the inference performance of various code libraries.
arXiv Detail & Related papers (2024-04-17T15:57:50Z)
- A Systematic Mapping Study and Practitioner Insights on the Use of Software Engineering Practices to Develop MVPs [1.6432083797787214]
We identified 33 papers published between 2013 and 2020 and observed some trends related to MVP ideation and evaluation practices.
There is an emphasis on end-user validations based on practices such as usability tests, A/B testing, and usage data analysis.
There is still limited research related to MVP technical feasibility assessment and effort estimation.
arXiv Detail & Related papers (2023-05-15T02:00:47Z)
- Pre-trained Embeddings for Entity Resolution: An Experimental Analysis [Experiment, Analysis & Benchmark] [65.11858854040544]
We perform a thorough experimental analysis of 12 popular language models over 17 established benchmark datasets.
First, we assess their vectorization overhead for converting all input entities into dense embedding vectors.
Second, we investigate their blocking performance, perform a detailed scalability analysis, and compare them with the state-of-the-art deep learning-based blocking method.
Third, we conclude with their relative performance for both supervised and unsupervised matching.
arXiv Detail & Related papers (2023-04-24T08:53:54Z)
- Tag-Aware Document Representation for Research Paper Recommendation [68.8204255655161]
We propose a hybrid approach that leverages deep semantic representation of research papers based on social tags assigned by users.
The proposed model is effective in recommending research papers even when the rating data is very sparse.
arXiv Detail & Related papers (2022-09-08T09:13:07Z)
- Efficient Real-world Testing of Causal Decision Making via Bayesian Experimental Design for Contextual Optimisation [12.37745209793872]
We introduce a model-agnostic framework for gathering data to evaluate and improve contextual decision making.
Our method enables data-efficient evaluation of the regret of past treatment assignments.
arXiv Detail & Related papers (2022-07-12T01:20:11Z)
- Benchopt: Reproducible, efficient and collaborative optimization benchmarks [67.29240500171532]
Benchopt is a framework to automate, reproduce and publish optimization benchmarks in machine learning.
Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments.
arXiv Detail & Related papers (2022-06-27T16:19:24Z)
- A Field Guide to Federated Optimization [161.3779046812383]
Federated learning and analytics are distributed approaches for collaboratively learning models (or statistics) from decentralized data.
This paper provides recommendations and guidelines on formulating, designing, evaluating, and analyzing federated optimization algorithms.
arXiv Detail & Related papers (2021-07-14T18:09:08Z)
- Benchmarking in Optimization: Best Practice and Open Issues [9.710173903804373]
This survey compiles ideas and recommendations from more than a dozen researchers with different backgrounds and from different institutes around the world.
The article discusses eight essential topics in benchmarking: clearly stated goals, well-specified problems, suitable algorithms, adequate performance measures, thoughtful analysis, effective and efficient designs, comprehensible presentations, and guaranteed reproducibility.
arXiv Detail & Related papers (2020-07-07T14:20:26Z)