A Philosophy of Data
- URL: http://arxiv.org/abs/2004.09990v2
- Date: Wed, 20 May 2020 12:36:57 GMT
- Title: A Philosophy of Data
- Authors: Alexander M. Mussgnug
- Abstract summary: We work from the fundamental properties necessary for statistical computation to a definition of statistical data.
We argue that the need for useful data to be commensurable rules out an understanding of properties as fundamentally unique or equal.
With our increasing reliance on data and data technologies, these two characteristics of data affect our collective conception of reality.
- Score: 91.3755431537592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We argue that while the discourse on data ethics is of critical importance,
it is missing one fundamental point: If more and more efforts in business,
government, science, and our daily lives are data-driven, we should pay more
attention to what exactly we are driven by. Therefore, we need more debate on
what fundamental properties constitute data. In the first section of the paper,
we work from the fundamental properties necessary for statistical computation
to a definition of statistical data. We define a statistical datum as the
coming together of substantive and numerical properties and differentiate
between qualitative and quantitative data. Subsequently, we qualify our
definition by arguing that for data to be practically useful, it needs to be
commensurable in a manner that reveals meaningful differences that allow for
the generation of relevant insights through statistical methodologies. In the
second section, we focus on what our conception of data can contribute to the
discourse on data ethics and beyond. First, we hold that the need for useful
data to be commensurable rules out an understanding of properties as
fundamentally unique or equal. Second, we argue that practical concerns lead us
to increasingly standardize how we operationalize a substantive property; in
other words, how we formalize the relationship between the substantive and
numerical properties of data. Thereby, we also standardize the interpretation
of a property. With our increasing reliance on data and data technologies,
these two characteristics of data affect our collective conception of reality.
Statistical data's exclusion of the fundamentally unique and equal influences
our perspective on the world, and the standardization of substantive properties
can be viewed as profound ontological practice, entrenching ever more pervasive
interpretations of phenomena in our everyday lives.
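The paper's core definition, that a statistical datum couples a substantive property with a numerical one through a standardized operationalization, can be sketched informally. The class names and the example encoding below are illustrative assumptions, not taken from the paper:

```python
from dataclasses import dataclass
from typing import Callable

# A statistical datum pairs a substantive property (what is measured)
# with a numerical property (the value assigned to it).
@dataclass(frozen=True)
class Datum:
    substantive: str   # e.g. "educational attainment"
    numerical: float   # e.g. 16.0 (years of schooling)

# An operationalization formalizes the mapping from an observation to
# the numerical property. Standardizing this mapping is what makes
# data from different sources commensurable.
def make_operationalization(prop: str, encode: Callable[[str], float]):
    def operationalize(observation: str) -> Datum:
        return Datum(substantive=prop, numerical=encode(observation))
    return operationalize

# Hypothetical standardized encoding of a qualitative observation.
DEGREE_YEARS = {"high school": 12.0, "bachelor": 16.0, "master": 18.0}
education = make_operationalization(
    "educational attainment", lambda obs: DEGREE_YEARS[obs]
)

d1 = education("bachelor")
d2 = education("master")
# Commensurability: both data share one operationalization, so their
# difference is meaningful for statistical computation.
print(d2.numerical - d1.numerical)  # 2.0
```

On this reading, fixing `DEGREE_YEARS` once for all annotators is the "profound ontological practice" the abstract describes: the standardized encoding, not the phenomenon itself, determines which differences the resulting data can express.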
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z) - Inherent Inconsistencies of Feature Importance [6.02357145653815]
Feature importance is a method that assigns each feature a score reflecting its contribution to prediction outcomes.
This paper presents an axiomatic framework designed to establish coherent relationships among the different contexts of feature importance scores.
arXiv Detail & Related papers (2022-06-16T14:21:51Z) - Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z) - Faking feature importance: A cautionary tale on the use of
differentially-private synthetic data [3.631918877491949]
This paper presents an empirical analysis of the agreement between the feature importance obtained from raw and from synthetic data.
We apply various utility measures to quantify the agreement in feature importance as this varies with the level of privacy.
This work has important implications for developing synthetic versions of highly sensitive data sets in fields such as finance and healthcare.
arXiv Detail & Related papers (2022-03-02T19:11:43Z) - Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z) - Domain Adaptative Causality Encoder [52.779274858332656]
We leverage the characteristics of dependency trees and adversarial learning to address the tasks of adaptive causality identification and localisation.
We present a new causality dataset, namely MedCaus, which integrates all types of causality in the text.
arXiv Detail & Related papers (2020-11-27T04:14:55Z) - Between Subjectivity and Imposition: Power Dynamics in Data Annotation
for Computer Vision [1.933681537640272]
This paper investigates practices of image data annotation as performed in industrial contexts.
We define data annotation as a sense-making practice, where annotators assign meaning to data through the use of labels.
arXiv Detail & Related papers (2020-07-29T15:02:56Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z) - Jointly Predicting Job Performance, Personality, Cognitive Ability,
Affect, and Well-Being [42.67003631848889]
We create a benchmark for predictive analysis of individuals from a perspective that integrates physical and physiological behavior, psychological states and traits, and job performance.
We design data mining techniques as benchmarks and use real, noisy, and incomplete data derived from wearable sensors to predict 19 constructs based on 12 standardized, well-validated tests.
arXiv Detail & Related papers (2020-06-10T14:30:29Z) - Really Useful Synthetic Data -- A Framework to Evaluate the Quality of
Differentially Private Synthetic Data [2.538209532048867]
Recent advances in generating synthetic data with principled privacy protections are a crucial step toward sharing statistical information in a privacy-preserving way.
To further optimise the inherent trade-off between data privacy and data quality, it is necessary to think closely about the latter.
We develop a framework to evaluate the quality of differentially private synthetic data from an applied researcher's perspective.
arXiv Detail & Related papers (2020-04-16T16:24:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.