TAPS Responsibility Matrix: A tool for responsible data science by
design
- URL: http://arxiv.org/abs/2302.01041v1
- Date: Thu, 2 Feb 2023 12:09:14 GMT
- Authors: Visara Urovi, Remzi Celebi, Chang Sun, Linda Rieswijk, Michael Erard,
Arif Yilmaz, Kody Moodley, Parveen Kumar and Michel Dumontier
- Abstract summary: We describe the Transparency, Accountability, Privacy, and Societal Responsibility Matrix (TAPS-RM) as a framework to explore social, legal, and ethical aspects of data science projects.
We map the developed model of TAPS-RM with well-known initiatives for open data.
We conclude that TAPS-RM is a tool to reflect on responsibilities at a data science project level and can be used to advance responsible data science by design.
- Score: 2.2973034509761816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data science is an interdisciplinary research area where scientists are
typically working with data coming from different fields. When using and
analyzing data, the scientists implicitly agree to follow standards,
procedures, and rules set in these fields. However, guidance on the
responsibilities of the data scientists and the other involved actors in a data
science project is typically missing. While literature shows that novel
frameworks and tools are being proposed in support of open-science, data reuse,
and research data management, there are currently no frameworks that can fully
express responsibilities of a data science project. In this paper, we describe
the Transparency, Accountability, Privacy, and Societal Responsibility Matrix
(TAPS-RM) as a framework to explore social, legal, and ethical aspects of data
science projects. TAPS-RM acts as a tool to provide users with a holistic view
of their project beyond key outcomes and clarifies the responsibilities of
actors. We map the developed model of TAPS-RM with well-known initiatives for
open data (such as FACT, FAIR and Datasheets for datasets). We conclude that
TAPS-RM is a tool to reflect on responsibilities at a data science project
level and can be used to advance responsible data science by design.
Related papers
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
arXiv Detail & Related papers (2024-10-28T15:56:49Z)
- DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? [58.330879414174476]
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks.
This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions.
Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG).
arXiv Detail & Related papers (2024-09-12T02:08:00Z)
- The Future of Data Science Education [0.11566458078238004]
The School of Data Science at the University of Virginia has developed a novel model for the definition of Data Science.
This paper will present the core features of the model and explain how it unifies various concepts going far beyond the analytics component of AI.
arXiv Detail & Related papers (2024-07-16T15:11:54Z)
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- The Problem of Zombie Datasets: A Framework For Deprecating Datasets [55.878249096379804]
We examine the public afterlives of several prominent datasets, including ImageNet, 80 Million Tiny Images, MS-Celeb-1M, Duke MTMC, Brainwash, and HRT Transgender.
We propose a dataset deprecation framework that includes considerations of risk, mitigation of impact, appeal mechanisms, timeline, post-deprecation protocol, and publication checks.
arXiv Detail & Related papers (2021-10-18T20:13:51Z)
- Data Science Methodologies: Current Challenges and Future Approaches [0.0]
A lack of vision and clear objectives, a biased emphasis on technical issues, and a low level of maturity for ad-hoc projects are among the current challenges.
Few methodologies offer a complete guideline across team, project and data & information management.
We propose a conceptual framework containing general characteristics that a methodology for managing data science projects with a holistic point of view should have.
arXiv Detail & Related papers (2021-06-14T10:34:50Z)
- Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure [9.825840279544465]
The datasets that empower machine learning are often used, shared, and re-used with little visibility into the processes of deliberation that led to their creation.
This paper introduces a rigorous framework for dataset development transparency which supports decision-making and accountability.
arXiv Detail & Related papers (2020-10-23T01:57:42Z)
- Computational Skills by Stealth in Secondary School Data Science [16.960800464621993]
We discuss a proposal for the stealth development of computational skills in students' first exposure to data science.
The intent of this approach is to support students, regardless of interest and self-efficacy in coding, in becoming data-driven learners.
arXiv Detail & Related papers (2020-10-08T09:11:51Z)