Automatic User Profiling in Darknet Markets: a Scalability Study
- URL: http://arxiv.org/abs/2203.13179v1
- Date: Thu, 24 Mar 2022 16:54:59 GMT
- Title: Automatic User Profiling in Darknet Markets: a Scalability Study
- Authors: Claudia Peersman, Matthew Edwards, Emma Williams, Awais Rashid
- Abstract summary: This study aims to understand the reliability and limitations of current computational stylometry approaches.
Because no ground truth and no validated criminal data from historic investigations are available for validation, we have collected new data from clearweb forums.
- Score: 15.83443291553249
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we investigate the scalability of state-of-the-art user profiling technologies across different online domains. More specifically, this work aims to understand the reliability and limitations of current computational stylometry approaches when they are applied to underground fora, where user populations potentially differ from those of other online platforms (predominantly male, younger, and with greater computer use) and where cyber offenders attempt to hide their identity. Because no ground truth exists and no validated criminal data from historic investigations is available for validation purposes, we have collected new data from clearweb forums that do include user demographics and that may be more closely related to underground fora in terms of user population (e.g., tech communities) than commonly used social media benchmark datasets, which show a more balanced user population.
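The paper's own pipeline is not reproduced here; as a rough illustration of what a computational stylometry baseline looks like, the sketch below pairs character n-gram features with a linear classifier. It uses scikit-learn, and the posts, author labels, and parameter choices are illustrative assumptions, not the authors' setup.
```python
# Illustrative sketch only: a common computational-stylometry baseline
# (character n-gram features + a linear classifier), not the paper's pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical data: forum posts and the (known) identity of each poster.
posts = ["first post text ...", "second post text ...", "third post ..."]
authors = ["user_a", "user_b", "user_a"]

# Character 2-4-grams capture spelling, punctuation and spacing habits,
# which tend to survive topic shifts better than word features do.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
model.fit(posts, authors)
print(model.predict(["an unseen post whose author we want to profile"]))
```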
Related papers
- How Unique is Whose Web Browser? The role of demographics in browser fingerprinting among US users [50.699390248359265]
Browser fingerprinting can be used to identify and track users across the Web, even without cookies.
This technique and resulting privacy risks have been studied for over a decade.
We provide a first-of-its-kind dataset to enable further research.
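As background for the entry above, a minimal sketch of how a browser fingerprint is formed: many weakly identifying attributes are combined and hashed into one stable identifier. Every attribute value below is hypothetical, and real fingerprinting scripts gather far more signals.
```python
# Conceptual sketch: a browser fingerprint is a stable hash over many
# weakly-identifying attributes; no single attribute is unique, but the
# combination often is. Attribute values here are hypothetical.
import hashlib
import json

attributes = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ...",
    "screen": "1920x1080x24",
    "timezone": "America/New_York",
    "language": "en-US",
    "fonts": ["Arial", "DejaVu Sans", "Noto Sans"],
    "canvas_hash": "d41d8cd98f00b204",  # rendering quirks differ per device
}

fingerprint = hashlib.sha256(
    json.dumps(attributes, sort_keys=True).encode()
).hexdigest()
print(fingerprint[:16])  # tracks the user across sites, no cookies needed
```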
arXiv Detail & Related papers (2024-10-09T14:51:58Z)
- Easy-access online social media metrics can effectively identify misinformation sharing users [41.94295877935867]
We find that higher tweet frequency is positively associated with sharing low-factuality content, whereas account age is negatively associated with it.
Our findings show that relying on these easy-access social network metrics could serve as a low-barrier approach for initial identification of users who are more likely to spread misinformation.
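A minimal sketch of how such easy-access metrics could drive an initial screen, with synthetic data wired to the reported directions of association; this is not the paper's model, and the feature set and coefficients are illustrative.
```python
# Sketch: screening users with only easy-access profile metrics, in the
# spirit of the paper's findings (higher tweet frequency -> more likely to
# share low-factuality content, older accounts -> less likely). Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
tweets_per_day = rng.exponential(5, n)
account_age_days = rng.uniform(30, 3000, n)
# Synthetic labels encoding the reported direction of the associations.
logit = 0.2 * tweets_per_day - 0.001 * account_age_days - 0.5
shares_misinfo = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([tweets_per_day, account_age_days])
clf = LogisticRegression().fit(X, shares_misinfo)
print(clf.coef_)  # expect positive weight on frequency, negative on age
```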
arXiv Detail & Related papers (2024-08-27T16:41:13Z)
- Differentially Private Data Release on Graphs: Inefficiencies and Unfairness [48.96399034594329]
This paper characterizes the impact of Differential Privacy on bias and unfairness in the context of releasing information about networks.
We consider a network release problem where the network structure is known to all, but the weights on edges must be released privately.
Our work provides theoretical foundations and empirical evidence on the bias and unfairness arising from privacy in these networked decision problems.
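The paper's exact mechanism is not spelled out in this summary; a standard baseline for releasing edge weights privately is the Laplace mechanism, sketched below. The epsilon and sensitivity values are illustrative assumptions.
```python
# Sketch of the standard Laplace mechanism applied to edge weights: the
# topology is public, only the weights are privatized. Epsilon and
# sensitivity values are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(1)
edges = {("a", "b"): 4.0, ("b", "c"): 7.5, ("a", "c"): 1.2}  # true weights

epsilon = 0.5          # privacy budget
sensitivity = 1.0      # max change in a weight from one user's data

noisy = {
    e: w + rng.laplace(scale=sensitivity / epsilon)
    for e, w in edges.items()
}
print(noisy)  # released weights; noise scale 1/epsilon drives utility loss
```
Decisions computed from such noisy weights (e.g., routing or allocation) are where the inefficiencies and unfairness studied in the paper can arise.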
arXiv Detail & Related papers (2024-08-08T08:37:37Z)
- Protecting User Privacy in Online Settings via Supervised Learning [69.38374877559423]
We design an intelligent approach to online privacy protection that leverages supervised learning.
By detecting and blocking data collection that might infringe on a user's privacy, we can restore a degree of digital privacy to the user.
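A minimal sketch of the detect-and-block idea under stated assumptions: outgoing requests are labelled as tracking or benign, a classifier is trained on simple URL features, and flagged requests are blocked. The URLs, labels, and feature choice are hypothetical, not the paper's design.
```python
# Minimal sketch of detect-and-block: label outgoing requests as tracking
# or benign, train a classifier on simple URL features, and block requests
# the model flags. Features and data are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

urls = [
    "https://tracker.example/pixel?uid=123",
    "https://cdn.example/site.css",
    "https://ads.example/collect?fp=abc",
    "https://example.org/article.html",
]
labels = [1, 0, 1, 0]  # 1 = data collection, 0 = benign

clf = make_pipeline(CountVectorizer(analyzer="char", ngram_range=(3, 5)),
                    LogisticRegression())
clf.fit(urls, labels)

def should_block(url):
    return clf.predict([url])[0] == 1  # block if classified as tracking

print(should_block("https://tracker.example/pixel?uid=999"))
```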
arXiv Detail & Related papers (2023-04-06T05:20:16Z)
- Design and analysis of tweet-based election models for the 2021 Mexican legislative election [55.41644538483948]
We use a dataset of 15 million election-related tweets in the six months preceding election day.
We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods.
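A toy version of a tweet-based model, assuming mention counts stand in for support: per-region vote shares are estimated from geo-attributed mentions. The counts and party names are invented; real models also weight users and correct for demographic skew.
```python
# Toy tweet-based election model: estimate a party's vote share from its
# share of geo-attributed tweet mentions, per region. Counts are made up.
from collections import Counter

# (region, party) mention counts from geo-tagged tweets
mentions = Counter({
    ("cdmx", "party_a"): 5200, ("cdmx", "party_b"): 4100,
    ("jalisco", "party_a"): 1900, ("jalisco", "party_b"): 2600,
})

regions = {r for r, _ in mentions}
for region in sorted(regions):
    total = sum(c for (r, _), c in mentions.items() if r == region)
    shares = {p: mentions[(region, p)] / total
              for (r, p) in mentions if r == region}
    print(region, shares)
```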
arXiv Detail & Related papers (2023-01-02T12:40:05Z)
- Detecting fake accounts through Generative Adversarial Network in online social media [0.0]
This paper proposes a novel method using user similarity measures and the Generative Adversarial Network (GAN) algorithm to identify fake user accounts in the Twitter dataset.
Despite the problem's complexity, the method achieves an AUC of 80% in classifying and detecting fake accounts.
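One common reading of GAN-based account screening, sketched under assumptions: a GAN is trained on genuine users' feature vectors and the discriminator's output is used as an authenticity score. The PyTorch architecture, data, and scoring rule below are illustrative, not the paper's method.
```python
# Sketch of GAN-based fake-account detection: train a GAN on feature
# vectors of genuine users only, then read the discriminator's output as
# an authenticity score. Synthetic data; not the paper's architecture.
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 8                                   # user-similarity feature vector
real = torch.randn(512, dim) + 2.0        # genuine users' features (synthetic)

G = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, dim))
D = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for _ in range(500):
    z = torch.randn(512, 4)
    fake = G(z)
    # Discriminator: push genuine features toward 1, generated toward 0.
    d_loss = bce(D(real), torch.ones(512, 1)) + \
             bce(D(fake.detach()), torch.zeros(512, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator tries to fool the discriminator.
    g_loss = bce(D(fake), torch.ones(512, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

suspect = torch.randn(1, dim) - 2.0       # far from the genuine distribution
print(D(suspect).item())                  # low score -> likely fake
```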
arXiv Detail & Related papers (2022-10-25T10:20:27Z)
- Fast Few shot Self-attentive Semi-supervised Political Inclination Prediction [12.472629584751509]
It is increasingly common for policymakers and journalists to create online polls on social media to understand the political leanings of people in specific locations.
We introduce a self-attentive semi-supervised framework for political inclination detection to further that objective.
We found that the model is highly efficient even in resource-constrained settings.
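A minimal sketch of the self-attentive pooling idea, assuming tweet embeddings already exist: attention weights select the most informative tweets before a small classification head. The dimensions, the two-way label space, and the architecture are assumptions, not the paper's model.
```python
# Illustrative self-attentive pooling over one user's tweet embeddings:
# attention weights pick out the most politically informative tweets before
# classification. Dimensions and data are made up.
import torch
import torch.nn as nn

class SelfAttentivePool(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)     # one attention score per tweet

    def forward(self, tweets):             # (n_tweets, dim)
        weights = torch.softmax(self.score(tweets), dim=0)
        return (weights * tweets).sum(dim=0)  # weighted user representation

dim = 16
pool = SelfAttentivePool(dim)
head = nn.Linear(dim, 2)                   # e.g., left / right inclination

tweet_embeddings = torch.randn(30, dim)    # e.g., from a sentence encoder
logits = head(pool(tweet_embeddings))
print(logits.softmax(-1))
```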
arXiv Detail & Related papers (2022-09-21T12:07:16Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, such as weakening or deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
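A minimal sketch of the delete-an-edge-and-resimulate step, assuming a plain linear structural model: the child variable is refit without the removed parent and regenerated with the original residual noise. The variables and this simulation shortcut are assumptions; D-BIAS's actual procedure is more involved.
```python
# Sketch of "delete a causal edge, simulate a debiased dataset" with a
# linear structural model: refit the child without the removed parent and
# keep the residual noise. Toy data; not the D-BIAS method itself.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000
gender = rng.integers(0, 2, n)                      # protected attribute
experience = rng.normal(5, 2, n)
salary = 2.0 * experience - 3.0 * gender + rng.normal(0, 1, n)  # biased edge

# User action: delete the gender -> salary edge.
full = LinearRegression().fit(np.column_stack([experience, gender]), salary)
residual = salary - full.predict(np.column_stack([experience, gender]))

fair = LinearRegression().fit(experience[:, None], salary)  # parents minus gender
debiased_salary = fair.predict(experience[:, None]) + residual

print(np.polyfit(gender, salary, 1)[0])           # strong gender effect
print(np.polyfit(gender, debiased_salary, 1)[0])  # effect ~ 0 after deletion
```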
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Tackling Racial Bias in Automated Online Hate Detection: Towards Fair and Accurate Classification of Hateful Online Users Using Geometric Deep Learning [2.385774752937891]
This paper examines how fairer and more accurate hateful user detection systems can be developed by incorporating social network information.
Geometric deep learning dynamically learns information-rich network representations and can generalise to unseen nodes.
It produces the most accurate and fairest classifier, with an AUC score of 90.8% on the entire dataset.
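A minimal sketch of the inductive message-passing idea behind such geometric deep learning methods: a node mixes its own features with an aggregate of its neighbours', so the same learned weights transfer to unseen nodes. The random weights and toy graph below are illustrative; a real model trains end to end.
```python
# Minimal sketch of the inductive message-passing idea behind methods like
# GraphSAGE: each node's representation combines its own features with the
# mean of its neighbours'. Random weights here; a real model learns them.
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))            # 5 users, 8 features each
neighbours = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2, 4], 4: [3]}

W_self, W_neigh = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))

def sage_layer(x, adj):
    out = np.empty((x.shape[0], 4))
    for v, nbrs in adj.items():
        agg = x[nbrs].mean(axis=0)            # aggregate the neighbourhood
        out[v] = np.maximum(x[v] @ W_self + agg @ W_neigh, 0)  # ReLU
    return out

embeddings = sage_layer(features, neighbours)  # feed these to a classifier
print(embeddings.shape)
```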
arXiv Detail & Related papers (2021-03-22T13:08:11Z)
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
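A small sketch of why a single global threshold is suboptimal when impostor score distributions differ by subgroup: calibrating per subgroup to the same false match rate yields different cutoffs. The score distributions below are synthetic assumptions.
```python
# Sketch of the single-threshold problem: impostor score distributions
# differ by subgroup, so fixing each subgroup's threshold at the same
# false-match rate (FMR) yields different cutoffs. Scores are synthetic.
import numpy as np

rng = np.random.default_rng(0)
impostor_scores = {
    "group_a": rng.normal(0.30, 0.10, 10000),
    "group_b": rng.normal(0.40, 0.10, 10000),  # systematically higher
}

target_fmr = 0.001  # accept 0.1% of impostor pairs
for group, scores in impostor_scores.items():
    thr = np.quantile(scores, 1 - target_fmr)
    print(group, round(thr, 3))
# A single global threshold would over-reject one group and over-accept
# the other relative to the target FMR.
```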
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.