Towards Ethics by Design in Online Abusive Content Detection
- URL: http://arxiv.org/abs/2010.14952v1
- Date: Wed, 28 Oct 2020 13:10:24 GMT
- Title: Towards Ethics by Design in Online Abusive Content Detection
- Authors: Svetlana Kiritchenko and Isar Nejadgholi
- Abstract summary: The research effort has spread out across several closely related sub-areas, such as detection of hate speech, toxicity, cyberbullying, etc.
We bring ethical issues to the forefront and propose a unified framework as a two-step process.
The novel framework is guided by the Ethics by Design principle and is a step towards building more accurate and trusted models.
- Score: 7.163723138100273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To support safety and inclusion in online communications, significant efforts
in NLP research have been put towards addressing the problem of abusive content
detection, commonly defined as a supervised classification task. The research
effort has spread out across several closely related sub-areas, such as
detection of hate speech, toxicity, cyberbullying, etc. There is a pressing
need to consolidate the field under a common framework for task formulation,
dataset design and performance evaluation. Further, despite current
technologies achieving high classification accuracies, several ethical issues
have been revealed. We bring ethical issues to the forefront and propose a
unified framework as a two-step process. First, online content is categorized
around personal and identity-related subject matters. Second, the severity of
abuse is identified through comparative annotation within each category. The
novel
framework is guided by the Ethics by Design principle and is a step towards
building more accurate and trusted models.
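To make the proposed two-step process concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the authors' implementation: the category inventory, the keyword stub standing in for a trained topic classifier, and the win-rate aggregation of comparative judgments (in the spirit of Best-Worst Scaling) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory of personal and identity-related categories for
# step one; the real taxonomy would come from the proposed framework.
CATEGORIES = ("gender", "ethnicity", "religion", "non-personal")

def categorize(text: str) -> str:
    """Step 1 (assumed): route a post to a subject-matter category.
    A crude keyword stub stands in for a trained multi-class classifier."""
    lowered = text.lower()
    if any(w in lowered for w in ("women", "men", "girls", "boys")):
        return "gender"
    return "non-personal"

def severity_from_comparisons(pairs):
    """Step 2 (assumed): aggregate comparative annotations into severity
    scores. Each pair is (more_abusive_id, less_abusive_id); the score is
    the fraction of comparisons an item wins, the simplest stand-in for
    Best-Worst-Scaling-style aggregation."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for more_abusive, less_abusive in pairs:
        wins[more_abusive] += 1
        appearances[more_abusive] += 1
        appearances[less_abusive] += 1
    return {pid: wins[pid] / appearances[pid] for pid in appearances}

# Usage: bucket posts by category, then rank severity within one bucket.
posts = {"p1": "an insult aimed at women", "p2": "another post about women"}
buckets = defaultdict(list)
for pid, text in posts.items():
    buckets[categorize(text)].append(pid)
print(buckets["gender"])                          # ['p1', 'p2']
print(severity_from_comparisons([("p1", "p2")]))  # {'p1': 1.0, 'p2': 0.0}
```

The intended benefit of comparative annotation is that annotators only judge which of two posts is more abusive, from which a per-category severity ranking can be derived.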
Related papers
- The Unappreciated Role of Intent in Algorithmic Moderation of Social Media Content [2.2618341648062477]
This paper examines the role of intent in content moderation systems.
We review state-of-the-art detection models and benchmark training datasets for online abuse to assess their awareness of and ability to capture intent.
arXiv Detail & Related papers (2024-05-17T18:05:13Z)
- The Ethics of ChatGPT in Medicine and Healthcare: A Systematic Review on Large Language Models (LLMs) [0.0]
With the introduction of ChatGPT, Large Language Models (LLMs) have received enormous attention in healthcare.
Despite their potential benefits, researchers have underscored various ethical implications.
This work aims to map the ethical landscape surrounding the current stage of deployment of LLMs in medicine and healthcare.
arXiv Detail & Related papers (2024-03-21T15:20:07Z)
- Noise Contrastive Estimation-based Matching Framework for Low-Resource Security Attack Pattern Recognition [49.536368818512116]
Tactics, Techniques and Procedures (TTPs) represent sophisticated attack patterns in the cybersecurity domain.
We formulate the problem in a different learning paradigm, where the assignment of a text to a TTP label is decided by the direct semantic similarity between the two.
We propose a neural matching architecture with an effective sampling-based learn-to-compare mechanism.
arXiv Detail & Related papers (2024-01-18T19:02:00Z)
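As a toy illustration of the matching formulation in the entry above: embed the input text and each TTP's textual description with one shared encoder, then rank TTP labels by cosine similarity. The encoder below is a random placeholder and the two-entry TTP inventory is invented; the paper's sampling-based learn-to-compare training is omitted entirely.

```python
import numpy as np

def encode(text: str) -> np.ndarray:
    """Placeholder sentence encoder returning a unit vector; in practice a
    trained neural model (e.g., a fine-tuned transformer) would be used."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

def rank_ttps(text: str, ttp_descriptions: dict) -> list:
    """Score each TTP label by the cosine similarity (dot product of unit
    vectors) between the text and the label's description, best first."""
    q = encode(text)
    scores = {label: float(encode(desc) @ q)
              for label, desc in ttp_descriptions.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Invented two-label inventory; real TTP labels and descriptions would come
# from a catalogue such as MITRE ATT&CK.
ttps = {"T1566": "Phishing: sending malicious messages to gain access.",
        "T1110": "Brute force: systematically guessing credentials."}
print(rank_ttps("The actor emailed staff a malicious attachment.", ttps))
```

Because labels are matched by description rather than learned as fixed output classes, new or rarely seen TTPs can be scored without retraining, which is what makes the formulation attractive in low-resource settings.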
- DEMASQ: Unmasking the ChatGPT Wordsmith [63.8746084667206]
We propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content.
Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods.
arXiv Detail & Related papers (2023-11-08T21:13:05Z)
- Integrity and Junkiness Failure Handling for Embedding-based Retrieval: A Case Study in Social Network Search [26.705196461992845]
Embedding-based retrieval is used in a variety of search applications, such as e-commerce and social network search.
In this paper, we conduct an analysis of embedding-based retrieval launched in early 2021 on our social network search engine.
We define two main categories of failures introduced by it: integrity and junkiness.
arXiv Detail & Related papers (2023-04-18T20:53:47Z)
- Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases, insignificant changes in the input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
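To show what such an "insignificant change" can look like, here is a toy sketch: swapping a few Latin letters for visually identical Cyrillic homoglyphs leaves a claim readable to a human but breaks a brittle keyword-based victim model. Both the perturbation and the victim classifier are illustrative stand-ins, far simpler than the attack methods and models BODEGA benchmarks.

```python
def perturb(text: str) -> str:
    """Replace some Latin letters with visually identical Cyrillic ones,
    an edit a human barely notices but that changes the byte sequence."""
    homoglyphs = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}
    return "".join(homoglyphs.get(c, c) for c in text)

def toy_victim(text: str) -> str:
    """Stand-in credibility classifier: flags texts containing a keyword."""
    return "low-credibility" if "miracle cure" in text else "credible"

claim = "This miracle cure works overnight!"
print(toy_victim(claim))           # low-credibility
print(toy_victim(perturb(claim)))  # credible -- the keyword no longer matches
```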
- Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks [95.29345070102045]
In this paper, we focus our investigation on social bias detection as a dialog safety problem.
We first propose a novel Dial-Bias Frame for analyzing the social bias in conversations pragmatically.
We introduce the CDial-Bias dataset, the first well-annotated Chinese social bias dialog dataset.
arXiv Detail & Related papers (2022-02-16T11:59:29Z)
- Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z)
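The query/document role reversal behind EQI can be sketched as follows: rather than retrieving documents for a query, we scan a log of past queries and keep those under which a given document is exposed, approximated here as appearing in the top k results. The word-overlap ranker, the tiny corpus, and the query log are hypothetical stand-ins.

```python
def rank(query: str, docs: list) -> list:
    """Stand-in ranker: order documents by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)

def exposing_queries(doc: str, query_log: list, docs: list, k: int = 3) -> list:
    """Approximate EQI: return the logged queries for which `doc` appears
    in the top-k ranking, i.e. the queries that would expose the document."""
    return [q for q in query_log if doc in rank(q, docs)[:k]]

docs = ["cheap flights to paris", "paris travel guide", "rust compiler news"]
log = ["paris flights", "rust news", "guide to paris"]
print(exposing_queries("paris travel guide", log, docs, k=2))
# -> ['paris flights', 'guide to paris']
```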
- Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports [66.39150945184683]
We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches.
Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
arXiv Detail & Related papers (2020-10-27T19:48:23Z)
- On Analyzing Annotation Consistency in Online Abusive Behavior Datasets [5.900114841365645]
Researchers have proposed, collected, and annotated online abusive content datasets.
These datasets play a critical role in facilitating the research on online hate speech and abusive behaviors.
It is often contentious what the true label of a given text should be, as the semantic differences between labels may be blurred.
arXiv Detail & Related papers (2020-06-24T06:34:25Z)
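Annotation consistency of the kind this entry analyzes is commonly reported as a chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two annotators directly from its textbook definition; the toy labels are invented, and the paper's own methodology may differ.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two annotators, corrected
    by the agreement expected if each labeled at random according to their
    own marginal label frequencies."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy judgments on six posts whose label boundaries are blurred.
a = ["hate", "offensive", "none", "hate", "offensive", "none"]
b = ["hate", "hate", "none", "hate", "offensive", "offensive"]
print(round(cohens_kappa(a, b), 3))  # -> 0.5 (moderate agreement)
```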
- WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection [0.0]
We propose an original framework, based on the Wikipedia Comment corpus, with comment-level annotations of different types.
This large corpus of more than 380k annotated messages opens perspectives for online abuse detection and especially for context-based approaches.
In addition to this corpus, we propose a complete benchmarking platform to stimulate and fairly compare scientific work on the problem of abusive content detection.
arXiv Detail & Related papers (2020-03-13T10:26:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.