Trust in Data Science: Collaboration, Translation, and Accountability in
Corporate Data Science Projects
- URL: http://arxiv.org/abs/2002.03389v1
- Date: Sun, 9 Feb 2020 15:50:50 GMT
- Authors: Samir Passi, Steven J. Jackson
- Abstract summary: We describe four common tensions in applied data science work: (un)equivocal numbers, (counter)intuitive knowledge, (in)credible data, and (in)scrutable models.
We show how organizational actors establish and re-negotiate trust under messy and uncertain analytic conditions through practices of skepticism, assessment, and credibility.
We conclude by discussing the implications of our findings for data science research and practice, both within and beyond CSCW.
- Score: 6.730787776951012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The trustworthiness of data science systems in applied and real-world
settings emerges from the resolution of specific tensions through situated,
pragmatic, and ongoing forms of work. Drawing on research in CSCW, critical
data studies, and history and sociology of science, and six months of immersive
ethnographic fieldwork with a corporate data science team, we describe four
common tensions in applied data science work: (un)equivocal numbers,
(counter)intuitive knowledge, (in)credible data, and (in)scrutable models. We
show how organizational actors establish and re-negotiate trust under messy and
uncertain analytic conditions through practices of skepticism, assessment, and
credibility. Highlighting the collaborative and heterogeneous nature of
real-world data science, we show how the management of trust in applied
corporate data science settings depends not only on pre-processing and
quantification, but also on negotiation and translation. We conclude by
discussing the implications of our findings for data science research and
practice, both within and beyond CSCW.
Related papers
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
arXiv Detail & Related papers (2024-10-28T15:56:49Z)
- DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? [58.330879414174476]
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks.
This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions.
Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG).
arXiv Detail & Related papers (2024-09-12T02:08:00Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- Data Science for Social Good [2.8621556092850065]
We present a framework for "data science for social good" (DSSG) research.
We perform an analysis of the literature to empirically demonstrate the paucity of work on DSSG in information systems.
We hope that this article and the special issue will spur future DSSG research.
arXiv Detail & Related papers (2023-11-02T15:40:20Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models.
It focuses on preventing bias and discrimination, ensuring fidelity to the source data, and assessing utility, robustness, and privacy preservation.
We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z)
- TAPS Responsibility Matrix: A tool for responsible data science by design [2.2973034509761816]
We describe the Transparency, Accountability, Privacy, and Societal Responsibility Matrix (TAPS-RM) as a framework to explore the social, legal, and ethical aspects of data science projects.
We map the developed model of TAPS-RM with well-known initiatives for open data.
We conclude that TAPS-RM is a tool to reflect on responsibilities at a data science project level and can be used to advance responsible data science by design.
arXiv Detail & Related papers (2023-02-02T12:09:14Z)
- Modeling Information Change in Science Communication with Semantically Matched Paraphrases [50.67030449927206]
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
arXiv Detail & Related papers (2022-10-24T07:44:38Z)
- SciTweets -- A Dataset and Annotation Framework for Detecting Scientific Online Discourse [2.3371548697609303]
Scientific topics, claims and resources are increasingly debated as part of online discourse.
This has led to both significant societal impact and increased interest in scientific online discourse from various disciplines.
Research across disciplines currently suffers from a lack of robust definitions of the various forms of science-relatedness.
arXiv Detail & Related papers (2022-06-15T08:14:55Z)
- Model Positionality and Computational Reflexivity: Promoting Reflexivity in Data Science [10.794642538442107]
We describe how the concepts of positionality and reflexivity can be adapted to provide a framework for understanding data science work.
We describe the challenges of adapting these concepts for data science work and offer annotator fingerprinting and position mining as promising solutions.
arXiv Detail & Related papers (2022-03-08T16:02:03Z)
- A survey study of success factors in data science projects [0.0]
The agile data science lifecycle is the most widely used framework, yet only 25% of survey participants report following a data science project methodology.
Professionals who adhere to a project methodology place greater emphasis on the project's potential risks and pitfalls.
arXiv Detail & Related papers (2022-01-17T09:50:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.