A Multi-faceted Semi-Synthetic Dataset for Automated Cyberbullying
Detection
- URL: http://arxiv.org/abs/2402.10231v1
- Date: Fri, 9 Feb 2024 16:53:19 GMT
- Title: A Multi-faceted Semi-Synthetic Dataset for Automated Cyberbullying
Detection
- Authors: Naveed Ejaz, Fakhra Kashif, Salimur Choudhury
- Abstract summary: This paper provides a description of an extensive semi-synthetic cyberbullying dataset.
It incorporates all of the essential aspects of cyberbullying, including aggression, repetition, peer relationships, and intent to harm.
This accompanying data article provides an in-depth look at the dataset, increasing transparency and enabling replication.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In recent years, the rising use of social media has propelled automated
cyberbullying detection into a prominent research domain. However, challenges
persist due to the absence of a standardized definition and universally
accepted datasets. Many researchers now view cyberbullying as a facet of
cyberaggression, encompassing factors like repetition, peer relationships, and
harmful intent in addition to online aggression. Acquiring comprehensive data
reflective of all cyberbullying components from social media networks proves to
be a complex task. This paper provides a description of an extensive
semi-synthetic cyberbullying dataset that incorporates all of the essential
aspects of cyberbullying, including aggression, repetition, peer relationships,
and intent to harm. The method of creating the dataset is succinctly outlined,
and a detailed overview of the publicly accessible dataset is additionally
presented. This accompanying data article provides an in-depth look at the
dataset, increasing transparency and enabling replication. It also aids in a
deeper understanding of the data, supporting broader research use.
Related papers
- Explain Thyself Bully: Sentiment Aided Cyberbullying Detection with
Explanation [52.3781496277104]
Cyberbullying has become a big issue with the popularity of different social media networks and online communication apps.
Recent laws, such as the "right to explanation" in the General Data Protection Regulation, have spurred research into developing interpretable models.
We develop the first interpretable multi-task model, called mExCB, for automatic cyberbullying detection from code-mixed languages.
arXiv Detail & Related papers (2024-01-17T07:36:22Z)
- Stepping out of Flatland: Discovering Behavior Patterns as Topological Structures in Cyber Hypergraphs [0.7835894511242797]
We present a novel framework based on the theory of hypergraphs and topology to understand data from cyber networks.
We will demonstrate concrete examples in a large-scale cyber network dataset.
arXiv Detail & Related papers (2023-11-08T00:00:33Z)
- Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks, such as malware, spam, and intrusions, has caused severe consequences for society.
Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities.
With the proliferation of graph mining techniques, many researchers have investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv Detail & Related papers (2023-04-02T08:43:03Z)
- Cyberbullying in Text Content Detection: An Analytical Review [0.0]
Online social networks increase the user's exposure to life-threatening situations such as suicide, eating disorders, cybercrime, compulsive behavior, anxiety, and depression.
To tackle the issue of cyberbullying, most existing literature focuses on developing approaches to identifying factors and understanding the textual factors associated with cyberbullying.
This paper conducts a comprehensive literature review to provide an understanding of cyberbullying detection.
arXiv Detail & Related papers (2023-03-18T21:23:06Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware
Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 difficulty levels and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- Session-based Cyberbullying Detection in Social Media: A Survey [16.39344929765961]
We define the Session-based Cyberbullying Detection framework that encapsulates the different steps and challenges of the problem.
Our review leads us to propose evidence-based criteria for a set of best practices to create session-based cyberbullying datasets.
arXiv Detail & Related papers (2022-07-14T18:56:54Z)
- Cyberbullying Indicator as a Precursor to a Cyber Construct Development [0.0]
This study proposes a cyberbullying framework based on the identification of some observable behavioral indicators.
Using a self-administered measurement instrument from 30 respondents, the study observed the probability of a cyberbully construct.
arXiv Detail & Related papers (2022-03-31T07:55:51Z)
- The Problem of Zombie Datasets: A Framework For Deprecating Datasets [55.878249096379804]
We examine the public afterlives of several prominent datasets, including ImageNet, 80 Million Tiny Images, MS-Celeb-1M, Duke MTMC, Brainwash, and HRT Transgender.
We propose a dataset deprecation framework that includes considerations of risk, mitigation of impact, appeal mechanisms, timeline, post-deprecation protocol, and publication checks.
arXiv Detail & Related papers (2021-10-18T20:13:51Z)
- REGRAD: A Large-Scale Relational Grasp Dataset for Safe and
Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to sustain the modeling of relationships among objects and grasps.
Our dataset is collected in both forms of 2D images and 3D point clouds.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z)
- Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining
Representations for Cyberbullying Classification [4.945634077636197]
We study the nuanced problem of cyberbullying using five explicit factors to represent its social and linguistic aspects.
These results demonstrate the importance of representing and modeling cyberbullying as a social phenomenon.
arXiv Detail & Related papers (2020-04-04T00:35:16Z)