Modeling subjectivity (by Mimicking Annotator Annotation) in toxic
comment identification across diverse communities
- URL: http://arxiv.org/abs/2311.00203v1
- Date: Wed, 1 Nov 2023 00:17:11 GMT
- Title: Modeling subjectivity (by Mimicking Annotator Annotation) in toxic
comment identification across diverse communities
- Authors: Senjuti Dutta (1), Sid Mittal (2), Sherol Chen (2), Deepak
Ramachandran (2), Ravi Rajakumar (2), Ian Kivlichan (2), Sunny Mak (2), Alena
Butryna (2), Praveen Paritosh (2) ((1) University of Tennessee, Knoxville,
(2) Google LLC)
- Abstract summary: This study aims to identify intuitive variances from annotator disagreement using quantitative analysis.
We also evaluate the model's ability to mimic diverse viewpoints on toxicity by varying the size of the training data.
We conclude that subjectivity is evident across all annotator groups, demonstrating the shortcomings of majority-rule voting.
- Score: 3.0284081180864675
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The prevalence and impact of toxic discussions online have made content moderation crucial. Automated systems can play a vital role in identifying toxicity and reducing the reliance on human moderation. Nevertheless, identifying toxic comments for diverse communities continues to present challenges that are addressed in this paper. The two-part goal of this study is to (1) identify intuitive variances from annotator disagreement using quantitative analysis and (2) model the subjectivity of these viewpoints. To achieve our goal, we published a new dataset\footnote{\url{https://github.com/XXX}} with expert annotators' annotations and used two other public datasets to identify the subjectivity of toxicity. Then, leveraging a Large Language Model (LLM), we evaluate the model's ability to mimic diverse viewpoints on toxicity by varying the size of the training data and by testing both on the same set of annotators used during model training and on a separate set of annotators. We conclude that subjectivity is evident across all annotator groups, demonstrating the shortcomings of majority-rule voting. Moving forward, subjective annotations should serve as ground truth labels for training models for domains like toxicity in diverse communities.
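
To illustrate the contrast the abstract draws between majority-vote labels and per-annotator (subjective) labels, the minimal sketch below trains one classifier on aggregated labels and one classifier per annotator on a toy example. This is not the authors' code: the paper fine-tunes an LLM, whereas this sketch substitutes a TF-IDF plus logistic regression pipeline for brevity, and the comments, annotator ids, and column names are hypothetical.

    # Minimal sketch (not the authors' code): contrast majority-vote labels
    # with per-annotator labels on a toy toxicity task. A TF-IDF + logistic
    # regression classifier stands in for the LLM used in the paper; the data
    # and the column names (comment, annotator_id, label) are hypothetical.
    from collections import Counter

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical long-format annotations: one row per (comment, annotator).
    rows = [
        ("you people never learn", "a1", 1),
        ("you people never learn", "a2", 0),
        ("you people never learn", "a3", 1),
        ("thanks for the detailed answer", "a1", 0),
        ("thanks for the detailed answer", "a2", 0),
        ("thanks for the detailed answer", "a3", 0),
        ("get lost, nobody asked you", "a1", 1),
        ("get lost, nobody asked you", "a2", 1),
        ("get lost, nobody asked you", "a3", 0),
    ]
    df = pd.DataFrame(rows, columns=["comment", "annotator_id", "label"])

    def train(texts, labels):
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        clf.fit(texts, labels)
        return clf

    # Majority-vote baseline: one label per comment, disagreement is discarded.
    majority = (
        df.groupby("comment")["label"]
        .agg(lambda v: Counter(v).most_common(1)[0][0])
        .reset_index()
    )
    majority_model = train(majority["comment"], majority["label"])

    # Per-annotator models: one classifier per rater, preserving subjectivity.
    per_annotator = {
        aid: train(g["comment"], g["label"])
        for aid, g in df.groupby("annotator_id")
    }

    probe = ["you people never learn"]
    print("majority:", majority_model.predict(probe)[0])
    for aid, model in per_annotator.items():
        print(aid, "->", model.predict(probe)[0])

Training per annotator (or conditioning a single model on the annotator) preserves the subjective signal that majority voting discards; this is the behaviour the paper's evaluation probes by varying training-data size and annotator overlap between training and test sets.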
Related papers
- On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [49.60774626839712]
Multimodal generative models have sparked critical discussions on their fairness, reliability, and potential for misuse.
We propose an evaluation framework designed to assess model reliability through their responses to perturbations in the embedding space.
Our method lays the groundwork for detecting unreliable, bias-injected models and retrieval of bias provenance.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
- Diverging Preferences: When do Annotators Disagree and do Models Know? [92.24651142187989]
We develop a taxonomy of disagreement sources spanning 10 categories across four high-level classes.
We find that the majority of disagreements are in opposition with standard reward modeling approaches.
We develop methods for identifying diverging preferences to mitigate their influence on evaluation and training.
arXiv Detail & Related papers (2024-10-18T17:32:22Z)
- Voices in a Crowd: Searching for Clusters of Unique Perspectives [8.516397617576978]
Proposed solutions aim to capture minority perspectives by either modelling annotator disagreements or grouping annotators based on shared metadata.
We propose a framework that trains models without encoding annotator metadata, extracts latent embeddings informed by annotator behaviour, and creates clusters of similar opinions.
arXiv Detail & Related papers (2024-07-19T12:37:15Z)
- Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [61.04606493712002]
Susceptibility to misinformation describes the degree of belief in unverifiable claims, which is not directly observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z)
- Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems.
Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts.
We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z)
- Towards Reliable Assessments of Demographic Disparities in Multi-Label Image Classifiers [11.973749734226852]
We consider multi-label image classification and, specifically, object categorization tasks.
Design choices and trade-offs for measurement involve more nuance than discussed in prior computer vision literature.
We identify several design choices that look merely like implementation details but significantly impact the conclusions of assessments.
arXiv Detail & Related papers (2023-02-16T20:34:54Z)
- Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
- Investigating Bias In Automatic Toxic Comment Detection: An Empirical Study [1.5609988622100528]
With the surge in online platforms, user engagement on these platforms via comments and reactions has risen sharply.
A large portion of such textual comments are abusive, rude, and offensive to the audience.
With machine learning systems in place to check comments coming onto a platform, biases present in the training data get passed on to the classifier, leading to discrimination against certain classes, religions, and genders.
arXiv Detail & Related papers (2021-08-14T08:24:13Z)
- Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in the deep learning-based medical image analysis system.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale public-available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z)
- ToxCCIn: Toxic Content Classification with Interpretability [16.153683223016973]
Explanations are important for tasks like offensive language or toxicity detection on social media.
We propose a technique to improve the interpretability of transformer models, based on a simple and powerful assumption.
We find this approach effective and able to produce explanations that exceed the quality of those provided by a logistic regression analysis.
arXiv Detail & Related papers (2021-03-01T22:17:10Z)
- User Ex Machina: Simulation as a Design Probe in Human-in-the-Loop Text Analytics [29.552736183006672]
We conduct a simulation-based analysis of human-centered interactions with topic models.
We find that user interactions have impacts that differ in magnitude but often negatively affect the quality of the resulting modelling.
arXiv Detail & Related papers (2021-01-06T19:44:11Z)