Related papers: TAPS Responsibility Matrix: A tool for responsible data science by design

URL: http://arxiv.org/abs/2302.01041v1
Date: Thu, 2 Feb 2023 12:09:14 GMT
Title: TAPS Responsibility Matrix: A tool for responsible data science by design
Authors: Visara Urovi, Remzi Celebi, Chang Sun, Linda Rieswijk, Michael Erard, Arif Yilmaz, Kody Moodley, Parveen Kumar and Michel Dumontier
Abstract summary: We describe the Transparency, Accountability, Privacy, and Societal Responsibility Matrix (TAPS-RM) as a framework to explore social, legal, and ethical aspects of data science projects. We map the developed model of TAPS-RM with well-known initiatives for open data. We conclude that TAPS-RM is a tool to reflect on responsibilities at a data science project level and can be used to advance responsible data science by design.
Score: 2.2973034509761816
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Data science is an interdisciplinary research area where scientists are typically working with data coming from different fields. When using and analyzing data, the scientists implicitly agree to follow standards, procedures, and rules set in these fields. However, guidance on the responsibilities of the data scientists and the other involved actors in a data science project is typically missing. While literature shows that novel frameworks and tools are being proposed in support of open science, data reuse, and research data management, there are currently no frameworks that can fully express responsibilities of a data science project. In this paper, we describe the Transparency, Accountability, Privacy, and Societal Responsibility Matrix (TAPS-RM) as a framework to explore social, legal, and ethical aspects of data science projects. TAPS-RM acts as a tool to provide users with a holistic view of their project beyond key outcomes and clarifies the responsibilities of actors. We map the developed model of TAPS-RM with well-known initiatives for open data (such as FACT, FAIR and Datasheets for datasets). We conclude that TAPS-RM is a tool to reflect on responsibilities at a data science project level and can be used to advance responsible data science by design.

Related papers

ScIRGen: Synthesize Realistic and Large-Scale RAG Dataset for Scientific Research [15.983924435685553]
We develop ScIRGen, a dataset generation framework for scientific QA & retrieval. We use it to create a large-scale scientific retrieval-augmented generation (RAG) dataset with realistic queries, datasets and papers.
arXiv Detail & Related papers (2025-06-09T11:47:13Z)
Foundation Models for Spatio-Temporal Data Science: A Tutorial and Survey [69.0648659029394]
Spatio-Temporal (ST) data science is fundamental to understanding complex systems in domains such as urban computing, climate science, and intelligent transportation. Researchers have begun exploring the concept of Spatio-Temporal Foundation Models (STFMs) to enhance adaptability and generalization across diverse ST tasks. STFMs empower the entire workflow of ST data science, ranging from data sensing, management, to mining, thereby offering a more holistic and scalable approach.
arXiv Detail & Related papers (2025-03-12T09:42:18Z)
Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation [58.064940977804596]
A plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently. Ethical concerns regarding shortcomings of these tools and potential for misuse take a particularly prominent place in our discussion.
arXiv Detail & Related papers (2025-02-07T18:26:45Z)
SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles. Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
arXiv Detail & Related papers (2024-10-28T15:56:49Z)
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? [58.330879414174476]
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks. This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions. Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG).
arXiv Detail & Related papers (2024-09-12T02:08:00Z)
The Future of Data Science Education [0.11566458078238004]
The School of Data Science at the University of Virginia has developed a novel model for the definition of Data Science. This paper will present the core features of the model and explain how it unifies various concepts going far beyond the analytics component of AI.
arXiv Detail & Related papers (2024-07-16T15:11:54Z)
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks. SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows. MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years. We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies. Machine and deep learning algorithms depend heavily on the data used during their development. We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
The Problem of Zombie Datasets: A Framework For Deprecating Datasets [55.878249096379804]
We examine the public afterlives of several prominent datasets, including ImageNet, 80 Million Tiny Images, MS-Celeb-1M, Duke MTMC, Brainwash, and HRT Transgender. We propose a dataset deprecation framework that includes considerations of risk, mitigation of impact, appeal mechanisms, timeline, post-deprecation protocol, and publication checks.
arXiv Detail & Related papers (2021-10-18T20:13:51Z)
Data Science Methodologies: Current Challenges and Future Approaches [0.0]
Lack of vision and clear objectives, a biased emphasis on technical issues, and a low level of maturity for ad-hoc projects are among the current challenges. Few methodologies offer a complete guideline across team, project and data & information management. We propose a conceptual framework containing general characteristics that a methodology for managing data science projects with a holistic point of view should have.
arXiv Detail & Related papers (2021-06-14T10:34:50Z)
Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure [9.825840279544465]
Datasets which empower machine learning are often used, shared and re-used with little visibility into the processes of deliberation which led to their creation. This paper introduces a rigorous framework for dataset development transparency which supports decision-making and accountability.
arXiv Detail & Related papers (2020-10-23T01:57:42Z)
Computational Skills by Stealth in Secondary School Data Science [16.960800464621993]
We discuss a proposal for the stealth development of computational skills in students' first exposure to data science. The intent of this approach is to support students, regardless of interest and self-efficacy in coding, in becoming data-driven learners.
arXiv Detail & Related papers (2020-10-08T09:11:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.