Between Subjectivity and Imposition: Power Dynamics in Data Annotation
for Computer Vision
- URL: http://arxiv.org/abs/2007.14886v2
- Date: Thu, 30 Jul 2020 11:03:00 GMT
- Title: Between Subjectivity and Imposition: Power Dynamics in Data Annotation
for Computer Vision
- Authors: Milagros Miceli and Martin Schuessler and Tianling Yang
- Abstract summary: This paper investigates practices of image data annotation as performed in industrial contexts.
We define data annotation as a sense-making practice, where annotators assign meaning to data through the use of labels.
- Score: 1.933681537640272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The interpretation of data is fundamental to machine learning. This paper
investigates practices of image data annotation as performed in industrial
contexts. We define data annotation as a sense-making practice, where
annotators assign meaning to data through the use of labels. Previous
human-centered investigations have largely focused on annotators' subjectivity
as a major cause for biased labels. We propose a wider view on this issue:
guided by constructivist grounded theory, we conducted several weeks of
fieldwork at two annotation companies. We analyzed which structures, power
relations, and naturalized impositions shape the interpretation of data. Our
results show that the work of annotators is profoundly informed by the
interests, values, and priorities of other actors above their station.
Arbitrary classifications are vertically imposed on annotators, and through
them, on data. This imposition is largely naturalized. Assigning meaning to
data is often presented as a technical matter. This paper shows it is, in fact,
an exercise of power with multiple implications for individuals and society.
Related papers
- Discipline and Label: A WEIRD Genealogy and Social Theory of Data
Annotation [11.48611587310938]
Data annotation remains the sine qua non of machine learning and AI.
Recent empirical work has highlighted the importance of rater diversity for fairness, model performance, and the role of annotator subjectivity on labels.
This paper outlines a critical genealogy of data annotation, starting with its psychological and perceptual aspects.
arXiv Detail & Related papers (2024-02-09T22:21:55Z)
- Same or Different? Diff-Vectors for Authorship Analysis [78.83284164605473]
In "classic" authorship analysis a feature vector represents a document, the value of a feature represents (an increasing function of) the relative frequency of the feature in the document, and the class label represents the author of the document.
Our experiments tackle same-author verification, authorship verification, and closed-set authorship attribution; while DVs are naturally geared for solving the 1st, we also provide two novel methods for solving the 2nd and 3rd.
arXiv Detail & Related papers (2023-01-24T08:48:12Z)
- Improving Fairness in Large-Scale Object Recognition by CrowdSourced
Demographic Information [7.968124582214686]
Representing objects fairly in machine learning datasets will lead to models that are less biased towards a particular culture.
We propose a simple and general approach, based on crowdsourcing the demographic composition of the contributors.
We present analysis which leads to a much fairer coverage of the world compared to existing datasets.
arXiv Detail & Related papers (2022-06-02T22:55:10Z)
- Whose AI Dream? In search of the aspiration in data annotation [12.454034525520497]
This paper investigates the work practices concerning data annotation as performed in the industry, in India.
Previous investigations have largely focused on annotator subjectivity, bias and efficiency.
Our results show that the work of annotators is dictated by the interests, priorities and values of others above their station.
arXiv Detail & Related papers (2022-03-21T06:28:54Z)
- Studying Up Machine Learning Data: Why Talk About Bias When We Mean
Power? [0.0]
We argue that reducing societal problems to "bias" misses the context-based nature of data.
We highlight the corporate forces and market imperatives involved in the labor of data workers that subsequently shape ML datasets.
arXiv Detail & Related papers (2021-09-16T17:38:26Z)
- Towards Measuring Bias in Image Classification [61.802949761385]
Convolutional Neural Networks (CNN) have become state-of-the-art for the main computer vision tasks.
However, due to their complex structure, their decisions are hard to understand, which limits their use in some contexts of the industrial world.
We present a systematic approach to uncover data bias by means of attribution maps.
arXiv Detail & Related papers (2021-07-01T10:50:39Z)
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint
Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfaces potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)
- A Philosophy of Data [91.3755431537592]
We work from the fundamental properties necessary for statistical computation to a definition of statistical data.
We argue that the need for useful data to be commensurable rules out an understanding of properties as fundamentally unique or equal.
With our increasing reliance on data and data technologies, these two characteristics of data affect our collective conception of reality.
arXiv Detail & Related papers (2020-04-15T14:47:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.