Biases in Data Science Lifecycle
- URL: http://arxiv.org/abs/2009.09795v2
- Date: Tue, 27 Oct 2020 12:31:24 GMT
- Title: Biases in Data Science Lifecycle
- Authors: Dinh-An Ho and Oya Beyan
- Abstract summary: The aim of this study is to provide a practical guideline to data scientists and increase their awareness.
In this work, we reviewed different sources of biases and grouped them under different stages of the data science lifecycle.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In recent years, data science has become an indispensable part of our
society. Over time, we have become reliant on this technology because of its
opportunity to gain value and new insights from data in any field - business,
socializing, research and society. At the same time, it raises questions about
how justified we are in placing our trust in these technologies. There is a
risk that such powers may lead to biased, inappropriate or unintended actions.
Therefore, ethical issues that might arise from data science practices should
be carefully considered, and these potential problems should be identified
during the data science lifecycle and mitigated if
possible. However, a typical data scientist does not have enough knowledge to
identify these challenges, and it is not always possible to include an ethics
expert during data science production. The aim of this study is to provide a
practical guideline to data scientists and increase their awareness. In this
work, we reviewed different sources of biases and grouped them under different
stages of the data science lifecycle. The work is still in progress. The aim
of early publishing is to collect community feedback and improve the curated
knowledge base for bias types and solutions.
Related papers
- Eagle: Ethical Dataset Given from Real Interactions [74.7319697510621]
We create datasets extracted from real interactions between ChatGPT and users that exhibit social biases, toxicity, and immoral problems.
Our experiments show that Eagle captures complementary aspects, not covered by existing datasets proposed for evaluation and mitigation of such ethical challenges.
arXiv Detail & Related papers (2024-02-22T03:46:02Z)
- Data Science for Social Good [2.8621556092850065]
We present a framework for "data science for social good" (DSSG) research.
We perform an analysis of the literature to empirically demonstrate the paucity of work on DSSG in information systems.
We hope that this article and the special issue will spur future DSSG research.
arXiv Detail & Related papers (2023-11-02T15:40:20Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- A data science axiology: the nature, value, and risks of data science [0.0]
Data science is a research paradigm with an unfathomed scope, scale, complexity, and power for knowledge discovery.
This paper presents an axiology of data science, its purpose, nature, importance, risks, and value for problem solving.
arXiv Detail & Related papers (2023-07-19T21:12:04Z)
- How Data Scientists Review the Scholarly Literature [4.406926847270567]
We examine the literature review practices of data scientists.
Data science represents a field seeing an exponential rise in papers.
No prior work has examined the specific practices and challenges faced by these scientists.
arXiv Detail & Related papers (2023-01-10T03:53:05Z)
- A Non-Expert's Introduction to Data Ethics for Mathematicians [0.0]
I begin with some background information and societal context for data ethics.
I briefly highlight a few efforts -- at my home institution and elsewhere -- on data ethics, society, and social good.
I then discuss open data in research, research replicability and some other ethical issues in research.
I then discuss ethical principles, institutional review boards, and a few other considerations in the scientific use of human data.
arXiv Detail & Related papers (2022-01-18T23:31:06Z)
- An Ethical Highlighter for People-Centric Dataset Creation [62.886916477131486]
We propose an analytical framework to guide ethical evaluation of existing datasets and to serve future dataset creators in avoiding missteps.
Our work is informed by a review and analysis of prior works and highlights where such ethical challenges arise.
arXiv Detail & Related papers (2020-11-27T07:18:44Z)
- Data Science: A Comprehensive Overview [42.98602883069444]
The twenty-first century has ushered in the age of big data and data economy, in which data DNA has become an intrinsic constituent of all data-based organisms.
An appropriate understanding of data DNA and its organisms relies on the new field of data science and its keystone, analytics.
This article is the first in the field to draw a comprehensive big picture, in addition to offering rich observations, lessons and thinking about data science and analytics.
arXiv Detail & Related papers (2020-07-01T02:33:58Z)
- Data Science: Nature and Pitfalls [42.98602883069444]
A critical matter for the healthy development of data science in its early stages is to deeply understand the nature of data and data science.
These important issues motivate the discussions in this article.
arXiv Detail & Related papers (2020-06-28T02:06:54Z)
- A Data Scientist's Guide to Streamflow Prediction [55.22219308265945]
We focus on the element of hydrologic rainfall--runoff models and their application to forecast floods and predict streamflow.
This guide aims to help interested data scientists gain an understanding of the problem, the hydrologic concepts involved, and the details that come up along the way.
arXiv Detail & Related papers (2020-06-05T08:04:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.