Algorithmic Arbitrariness in Content Moderation
- URL: http://arxiv.org/abs/2402.16979v1
- Date: Mon, 26 Feb 2024 19:27:00 GMT
- Title: Algorithmic Arbitrariness in Content Moderation
- Authors: Juan Felipe Gomez and Caio Vieira Machado and Lucas Monteiro Paes and
Flavio P. Calmon
- Abstract summary: We show how content moderation tools can arbitrarily classify samples as toxic.
We discuss these findings in terms of human rights set out by the International Covenant on Civil and Political Rights (ICCPR).
Our study underscores the need to identify and increase the transparency of arbitrariness in content moderation applications.
- Score: 1.4849645397321183
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) is widely used to moderate online content. Despite its
scalability relative to human moderation, the use of ML introduces unique
challenges to content moderation. One such challenge is predictive
multiplicity: multiple competing models for content classification may perform
equally well on average, yet assign conflicting predictions to the same
content. This multiplicity can result from seemingly innocuous choices during
model development, such as random seed selection for parameter initialization.
We experimentally demonstrate how content moderation tools can arbitrarily
classify samples as toxic, leading to arbitrary restrictions on speech. We
discuss these findings in terms of human rights set out by the International
Covenant on Civil and Political Rights (ICCPR), namely freedom of expression,
non-discrimination, and procedural justice. We analyze (i) the extent of
predictive multiplicity among state-of-the-art LLMs used for detecting toxic
content; (ii) the disparate impact of this arbitrariness across social groups;
and (iii) how model multiplicity compares to unambiguous human classifications.
Our findings indicate that scaled-up algorithmic moderation risks
legitimizing an algorithmic leviathan, in which an algorithm disproportionately
manages human rights. To mitigate such risks, our study underscores the need to
identify arbitrariness in content moderation applications and to make it more
transparent. Since algorithmic content moderation is driven by pressing
social concerns, such as disinformation and hate speech, our discussion of these
harms is directly relevant to policy debates. Our findings also bear on the
content moderation and intermediary liability laws being discussed and
passed in many countries, such as the Digital Services Act in the European
Union, the Online Safety Act in the United Kingdom, and the Fake News Bill in
Brazil.
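The predictive multiplicity described in the abstract can be reproduced with a small experiment: train the same model on the same data several times, varying only the random seed used for parameter initialization, and compare the resulting predictions. The sketch below is illustrative only, assuming scikit-learn and synthetic stand-in data rather than the paper's models or toxicity corpora:

```python
# Minimal sketch of predictive multiplicity arising from seed choice alone.
# Assumes scikit-learn; the synthetic data stands in for a real toxicity
# corpus, and the feature/model choices are illustrative, not the paper's.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in for text features of posts labeled toxic (1) / non-toxic (0).
X, y = make_classification(n_samples=2000, n_features=50, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

preds, accs = [], []
for seed in range(10):
    # Same data, same architecture; only the initialization seed differs.
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=seed)
    clf.fit(X_train, y_train)
    preds.append(clf.predict(X_test))
    accs.append(clf.score(X_test, y_test))

preds = np.array(preds)                               # shape: (n_models, n_test)
ambiguous = preds.min(axis=0) != preds.max(axis=0)    # items where models disagree

print(f"accuracy range across seeds: {min(accs):.3f}-{max(accs):.3f}")
print(f"share of items with conflicting predictions: {ambiguous.mean():.3f}")
```

Models with near-identical average accuracy can still flag different items as toxic; the share of conflicting predictions is one simple way to quantify that arbitrariness.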
Related papers
- Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights [50.89022445197919]
We propose a speech-specific risk taxonomy covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity).
Based on the taxonomy, we create a small-scale dataset for evaluating the capability of current LMMs to detect these categories of risk.
arXiv Detail & Related papers (2024-06-25T10:08:45Z) - Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment [84.32768080422349]
Alignment with human preference prevents large language models from generating misleading or toxic content.
We propose a new formulation of prompt diversity, implying a linear correlation with the final performance of LLMs after fine-tuning.
arXiv Detail & Related papers (2024-03-17T07:08:55Z) - Content Moderation on Social Media in the EU: Insights From the DSA
Transparency Database [0.0]
The Digital Services Act (DSA) requires large social media platforms in the EU to provide clear and specific information whenever they restrict access to certain content.
Statements of Reasons (SoRs) are collected in the DSA Transparency Database to ensure transparency and scrutiny of content moderation decisions.
We empirically analyze 156 million SoRs within an observation period of two months to provide an early look at content moderation decisions of social media platforms in the EU.
arXiv Detail & Related papers (2023-12-07T16:56:19Z) - Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z) - An Investigation of Representation and Allocation Harms in Contrastive
Learning [55.42336321517228]
We demonstrate that contrastive learning (CL) tends to collapse representations of minority groups with certain majority groups.
We refer to this phenomenon as representation harm and demonstrate it on image and text datasets using the corresponding popular CL methods.
We provide a theoretical explanation for representation harm using a neural block model that leads to a representational collapse in a contrastive learning setting.
arXiv Detail & Related papers (2023-10-02T19:25:37Z) - Compatibility of Fairness Metrics with EU Non-Discrimination Laws:
Demographic Parity & Conditional Demographic Disparity [3.5607241839298878]
Empirical evidence suggests that algorithmic decisions driven by Machine Learning (ML) techniques threaten to discriminate against legally protected groups or create new sources of unfairness.
This work aims to assess to what extent we can ensure legal fairness through fairness metrics and under fairness constraints.
Our experiments and analysis suggest that AI-assisted decision-making can be fair from a legal perspective depending on the case at hand and the legal justification.
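For reference, demographic parity, the group-fairness criterion named in the entry above, requires that the rate of positive predictions be (approximately) equal across protected groups; conditional demographic disparity examines that gap within strata of a legitimate conditioning attribute. The snippet below is a generic, hypothetical illustration of the demographic parity gap (the function name and data are made up, not taken from that paper):

```python
# Hypothetical sketch: demographic parity gap for a binary classifier.
# Standard group-fairness metric, not that paper's exact formulation.
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)

# Example: flagged-as-toxic decisions for posts from two social groups.
print(demographic_parity_gap([1, 0, 1, 1, 0, 0, 1, 0], [0, 0, 0, 0, 1, 1, 1, 1]))
```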
arXiv Detail & Related papers (2023-06-14T09:38:05Z) - Bias, diversity, and challenges to fairness in classification and
automated text analysis. From libraries to AI and back [3.9198548406564604]
We investigate the risks surrounding bias and unfairness in AI usage in classification and automated text analysis.
We take a closer look at the notion of '(un)fairness' in relation to the notion of 'diversity'.
arXiv Detail & Related papers (2023-03-07T20:54:49Z) - User-Centered Security in Natural Language Processing [0.7106986689736825]
This dissertation proposes a framework of user-centered security in Natural Language Processing (NLP).
It focuses on two security domains within NLP that are of great public interest.
arXiv Detail & Related papers (2023-01-10T22:34:19Z) - Countering Malicious Content Moderation Evasion in Online Social
Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
arXiv Detail & Related papers (2022-12-27T16:08:49Z) - A Keyword Based Approach to Understanding the Overpenalization of
Marginalized Groups by English Marginal Abuse Models on Twitter [2.9604738405097333]
Harmful content detection models tend to have higher false positive rates for content from marginalized groups.
We propose a principled approach to detecting and measuring the severity of potential harms associated with a text-based model.
We apply our methodology to audit Twitter's English marginal abuse model, which is used for removing amplification eligibility of marginally abusive content.
arXiv Detail & Related papers (2022-10-07T20:28:00Z) - Modeling Content Creator Incentives on Algorithm-Curated Platforms [76.53541575455978]
We study how algorithmic choices affect the existence and character of (Nash) equilibria in exposure games.
We propose tools for numerically finding equilibria in exposure games, and illustrate results of an audit on the MovieLens and LastFM datasets.
arXiv Detail & Related papers (2022-06-27T08:16:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.