A comprehensive cross-language framework for harmful content detection
with the aid of sentiment analysis
- URL: http://arxiv.org/abs/2403.01270v1
- Date: Sat, 2 Mar 2024 17:13:47 GMT
- Title: A comprehensive cross-language framework for harmful content detection
with the aid of sentiment analysis
- Authors: Mohammad Dehghani
- Abstract summary: This study introduces, for the first time, a detailed framework adaptable to any language.
A key component of the framework is the development of a general and detailed annotation guideline.
The integration of sentiment analysis represents a novel approach to enhancing harmful language detection.
- Score: 0.356008609689971
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In today's digital world, social media plays a significant role in
facilitating communication and content sharing. However, the exponential rise
in user-generated content has led to challenges in maintaining a respectful
online environment. In some cases, users have taken advantage of anonymity in
order to use harmful language, which can negatively affect the user experience
and pose serious social problems. Recognizing the limitations of manual
moderation, automatic detection systems have been developed to tackle this
problem. Nevertheless, several obstacles persist, including the absence of a
universal definition for harmful language, inadequate datasets across
languages, the need for detailed annotation guidelines, and most importantly, a
comprehensive framework. This study aims to address these challenges by
introducing, for the first time, a detailed framework adaptable to any
language. This framework encompasses various aspects of harmful language
detection. A key component of the framework is the development of a general and
detailed annotation guideline. Additionally, the integration of sentiment
analysis represents a novel approach to enhancing harmful language detection.
A definition of harmful language, based on a review of related concepts, is
also presented. To demonstrate the effectiveness of the proposed
framework, its implementation in a challenging low-resource language is
conducted. We collected a Persian dataset and applied the annotation guideline
for harmful language detection and sentiment analysis. Next, we present baseline
experiments utilizing machine and deep learning methods to set benchmarks.
Results demonstrate the framework's strong performance, achieving an accuracy of 99.4%
in offensive language detection and 66.2% in sentiment analysis.
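To make the baseline and the sentiment-aided setup more concrete, the sketch below shows one possible realization with scikit-learn: character n-gram TF-IDF features combined with a per-text sentiment score feeding a logistic regression classifier. The file name, column names, and the choice of features and model are illustrative assumptions; the abstract does not specify the paper's exact pipeline.

```python
# Minimal sketch (not the paper's actual pipeline): an offensive-language
# classifier whose TF-IDF features are augmented with a sentiment score,
# as one possible way to realize the "sentiment-aided" idea in the abstract.
# The CSV file and columns ("text", "sentiment", "offensive") are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("persian_harmful_language.csv")  # hypothetical dataset file

# "sentiment" is assumed to be numeric (e.g., -1 negative, 0 neutral, 1 positive),
# either human-annotated or predicted by a separate sentiment model.
features = ColumnTransformer([
    # Character n-grams are a common choice for morphologically rich languages such as Persian.
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5), min_df=2), "text"),
    ("sentiment", "passthrough", ["sentiment"]),
])

pipeline = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[["text", "sentiment"]], df["offensive"], test_size=0.2, random_state=42
)
pipeline.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, pipeline.predict(X_test)))
```

Dropping the sentiment column from the ColumnTransformer recovers a plain text-only baseline, which makes it easy to measure how much the sentiment signal contributes under these assumptions.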
Related papers
- ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations [6.360597788845826]
This study examines the limitations of state-of-the-art large language models (LLMs) in identifying offensive content within systematically perturbed data.
Our work highlights the urgent need for more advanced techniques in offensive language detection to combat the evolving tactics used to evade detection mechanisms.
arXiv Detail & Related papers (2024-06-18T02:44:56Z)
- On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation [71.72465617754553]
We generate "low-level" sentences that convey object-centric, three-dimensional spatial relationships, incorporate them as additional language priors and evaluate their downstream impact on depth estimation.
Our key finding is that current language-guided depth estimators perform optimally only with scene-level descriptions.
Despite leveraging additional data, these methods are not robust to directed adversarial attacks and decline in performance with an increase in distribution shift.
arXiv Detail & Related papers (2024-04-12T15:35:20Z)
- Chinese Offensive Language Detection: Current Status and Future Directions [2.1357786131968637]
This paper provides a comprehensive overview of offensive language detection in Chinese, examining current benchmarks and approaches.
The primary objective of this survey is to explore the existing techniques and identify potential avenues for further research.
arXiv Detail & Related papers (2024-03-27T07:34:44Z)
- Capturing Pertinent Symbolic Features for Enhanced Content-Based Misinformation Detection [0.0]
The detection of misleading content presents a significant hurdle due to its extreme linguistic and domain variability.
This paper analyzes the linguistic attributes that characterize this phenomenon and examines how representative of these features some of the most popular misinformation datasets are.
We demonstrate that the appropriate use of pertinent symbolic knowledge in combination with neural language models is helpful in detecting misleading content.
arXiv Detail & Related papers (2024-01-29T16:42:34Z)
- When a Language Question Is at Stake. A Revisited Approach to Label Sensitive Content [0.0]
This article revisits an approach to pseudo-labeling sensitive data, using Ukrainian tweets covering the Russian-Ukrainian war as an example.
We provide a fundamental statistical analysis of the obtained data, an evaluation of the models used for pseudo-labeling, and further guidelines on how scientists can leverage the corpus.
arXiv Detail & Related papers (2023-11-17T13:35:10Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evading content moderation.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target.
Most existing stance detection models are limited because they do not consider relevant contextual information.
We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)
- Metrics reloaded: Recommendations for image analysis validation [59.60445111432934]
Metrics Reloaded is a comprehensive framework guiding researchers in the problem-aware selection of metrics.
The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint.
Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics.
arXiv Detail & Related papers (2022-06-03T15:56:51Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Natural language technology and query expansion: issues, state-of-the-art and perspectives [0.0]
Linguistic characteristics that cause ambiguity and misinterpretation of queries, as well as additional factors, affect users' ability to accurately represent their information needs.
We lay down the anatomy of a generic linguistic based query expansion framework and propose its module-based decomposition.
For each module, we review the state-of-the-art solutions in the literature and categorize them according to the techniques used.
arXiv Detail & Related papers (2020-04-23T11:39:07Z)