An Ethical Highlighter for People-Centric Dataset Creation
- URL: http://arxiv.org/abs/2011.13583v1
- Date: Fri, 27 Nov 2020 07:18:44 GMT
- Title: An Ethical Highlighter for People-Centric Dataset Creation
- Authors: Margot Hanley, Apoorv Khandelwal, Hadar Averbuch-Elor, Noah Snavely and Helen Nissenbaum
- Abstract summary: We propose an analytical framework to guide ethical evaluation of existing datasets and to serve future dataset creators in avoiding missteps.
Our work is informed by a review and analysis of prior works and highlights where such ethical challenges arise.
- Score: 62.886916477131486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Important ethical concerns arising from computer vision datasets of people
have been receiving significant attention, and a number of datasets have been
withdrawn as a result. To meet the academic need for people-centric datasets,
we propose an analytical framework to guide ethical evaluation of existing
datasets and to serve future dataset creators in avoiding missteps. Our work is
informed by a review and analysis of prior works and highlights where such
ethical challenges arise.
Related papers
- Building Better Datasets: Seven Recommendations for Responsible Design from Dataset Creators [0.5755004576310334]
We interviewed 18 leading dataset creators about the current state of the field.
We shed light on the challenges and considerations faced by dataset creators.
We share seven central recommendations for improving responsible dataset creation.
arXiv Detail & Related papers (2024-08-30T20:52:19Z)
- Data Augmentation in Human-Centric Vision [54.97327269866757]
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks.
It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection.
Our work categorizes data augmentation methods into two main types: data generation and data perturbation.
arXiv Detail & Related papers (2024-03-13T16:05:18Z)
- Benchmarking Data Science Agents [11.582116078653968]
Large Language Models (LLMs) have emerged as promising aids as data science agents, assisting humans in data analysis and processing.
Yet their practical efficacy remains constrained by the varied demands of real-world applications and complicated analytical processes.
We introduce DSEval -- a novel evaluation paradigm, as well as a series of innovative benchmarks tailored for assessing the performance of these agents.
arXiv Detail & Related papers (2024-02-27T03:03:06Z)
- When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective [64.73162159837956]
Evaluating the value of a hypothetical target policy with only a logged dataset is important but challenging.
We propose DataCOPE, a data-centric framework for evaluating a target policy given a dataset.
Our empirical analysis of DataCOPE in the logged contextual bandit settings using healthcare datasets confirms its ability to evaluate both machine-learning and human expert policies.
arXiv Detail & Related papers (2023-11-23T17:13:37Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- EALM: Introducing Multidimensional Ethical Alignment in Conversational Information Retrieval [43.72331337131317]
We introduce a workflow that integrates ethical alignment with an initial ethical judgment stage for efficient data screening.
We present the QA-ETHICS dataset adapted from the ETHICS benchmark, which serves as an evaluation tool by unifying scenarios and label meanings.
In addition, we suggest a new approach that achieves top performance in both binary and multi-label ethical judgment tasks.
arXiv Detail & Related papers (2023-10-02T08:22:34Z)
- Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
- A Survey of Dataset Refinement for Problems in Computer Vision Datasets [11.45536223418548]
Large-scale datasets have played a crucial role in the advancement of computer vision.
However, they often suffer from problems such as class imbalance, noisy labels, dataset bias, and high resource costs.
Various data-centric solutions have been proposed to solve the dataset problems.
They improve the quality of datasets by re-organizing them, which we call dataset refinement.
arXiv Detail & Related papers (2022-10-21T03:58:43Z)
- Bringing the People Back In: Contesting Benchmark Machine Learning Datasets [11.00769651520502]
We outline a research program - a genealogy of machine learning data - for investigating how and why these datasets have been created.
We describe the ways in which benchmark datasets in machine learning operate as infrastructure and pose four research questions for these datasets.
arXiv Detail & Related papers (2020-07-14T23:22:13Z)
- REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfaces potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.