How do Data Science Workers Collaborate? Roles, Workflows, and Tools
- URL: http://arxiv.org/abs/2001.06684v3
- Date: Thu, 16 Apr 2020 16:38:43 GMT
- Title: How do Data Science Workers Collaborate? Roles, Workflows, and Tools
- Authors: Amy X. Zhang, Michael Muller, Dakuo Wang
- Abstract summary: We conducted an online survey with 183 participants who work in various aspects of data science.
We found that data science teams are extremely collaborative and work with a variety of stakeholders and tools.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today, the prominence of data science within organizations has given rise to
teams of data science workers collaborating on extracting insights from data,
as opposed to individual data scientists working alone. However, we still lack
a deep understanding of how data science workers collaborate in practice. In
this work, we conducted an online survey with 183 participants who work in
various aspects of data science. We focused on their reported interactions with
each other (e.g., managers with engineers) and with different tools (e.g.,
Jupyter Notebook). We found that data science teams are extremely collaborative
and work with a variety of stakeholders and tools during the six common steps
of a data science workflow (e.g., clean data and train model). We also found
that the collaborative practices workers employ, such as documentation, vary
according to the kinds of tools they use. Based on these findings, we discuss
design implications for supporting data science team collaborations and future
research directions.
Related papers
- Introduction to the Usage of Open Data from the Large Hadron Collider for Computer Scientists in the Context of Machine Learning
We convert open data from the Large Hadron Collider to pandas DataFrames, a well-known format in computer science.
This paper aims to serve as a starting point for future interdisciplinary collaborations between computer scientists and physicists.
arXiv: 2025-01-12
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv: 2024-06-10
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
We introduce DataDreamer, an open-source Python library that allows researchers to implement powerful large language model workflows.
DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv: 2024-02-16
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate datasets through a responsible rubric.
arXiv: 2023-10-24
- Why is AI not a Panacea for Data Workers? An Interview Study on Human-AI Collaboration in Data Storytelling
We interviewed eighteen data workers from both industry and academia to learn where and how they would like to collaborate with AI.
Surprisingly, though the participants showed excitement about collaborating with AI, many of them also expressed reluctance and pointed out nuanced reasons.
arXiv: 2023-04-17
- TAPS Responsibility Matrix: A tool for responsible data science by design
We describe the Transparency, Accountability, Privacy, and Societal Responsibility Matrix (TAPS-RM) as a framework to explore social, legal, and ethical aspects of data science projects.
We map the TAPS-RM model to well-known open data initiatives.
We conclude that TAPS-RM is a tool to reflect on responsibilities at a data science project level and can be used to advance responsible data science by design.
arXiv: 2023-02-02
- How Data Scientists Review the Scholarly Literature
We examine the literature review practices of data scientists.
Data science represents a field seeing an exponential rise in papers.
No prior work has examined the specific practices and challenges faced by these scientists.
arXiv: 2023-01-10
- Modeling Information Change in Science Communication with Semantically Matched Paraphrases
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
arXiv: 2022-10-24
- Data+Shift: Supporting visual investigation of data distribution shifts by data scientists
Data+Shift is a visual analytics tool to support data scientists in the task of investigating the underlying factors of shift in data features.
We validated our approach with a think-aloud experiment where a data scientist used the tool for a fraud detection use case.
arXiv: 2022-04-29
- Human-Machine Collaboration for Democratizing Data Science
VisualSynth relies on the user providing colored sketches, i.e., coloring parts of the spreadsheet, to partially specify data science tasks.
It performs various data analysis tasks, ranging from data wrangling and data selection to clustering, constraint learning, predictive modeling, and auto-completion.
arXiv: 2020-04-23