Related papers: On the Challenges of Creating Datasets for Analyzing Commercial Sex Advertisements to Assess Human Trafficking Risk and Organized Activity

On the Challenges of Creating Datasets for Analyzing Commercial Sex Advertisements to Assess Human Trafficking Risk and Organized Activity

URL: http://arxiv.org/abs/2405.13348v1
Date: Wed, 22 May 2024 05:10:13 GMT
Title: On the Challenges of Creating Datasets for Analyzing Commercial Sex Advertisements to Assess Human Trafficking Risk and Organized Activity
Authors: Pablo Rivas, Tomas Cerny, Alejandro Rodriguez Perez, Javier Turek, Laurie Giddens, Gisela Bichler, Stacie Petter,
Abstract summary: This study addresses the challenges of building datasets to understand the risks associated with organized activities and human trafficking through commercial sex advertisements. Traditional approaches, which are not automated and are difficult to reproduce, fall short in addressing these issues. We have developed a reproducible and automated methodology to analyze five million advertisements.
Score: 37.47883671323525
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Our study addresses the challenges of building datasets to understand the risks associated with organized activities and human trafficking through commercial sex advertisements. These challenges include data scarcity, rapid obsolescence, and privacy concerns. Traditional approaches, which are not automated and are difficult to reproduce, fall short in addressing these issues. We have developed a reproducible and automated methodology to analyze five million advertisements. In the process, we identified further challenges in dataset creation within this sensitive domain. This paper presents a streamlined methodology to assist researchers in constructing effective datasets for combating organized crime, allowing them to focus on advancing detection technologies.

Related papers

A Comprehensive Survey on Network Traffic Synthesis: From Statistical Models to Deep Learning [4.578307236651368]
Synthetic network traffic generation has emerged as a promising alternative for various data-driven applications in the networking domain.<n>It enables the creation of synthetic data that preserves real-world characteristics while addressing key challenges such as data scarcity, privacy concerns, and purity constraints associated with real data.<n>This survey serves as a foundational resource for researchers and practitioners, offering a structured analysis of existing methods, challenges, and opportunities in synthetic network traffic generation.
arXiv Detail & Related papers (2025-06-23T18:08:18Z)
Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing.<n>Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest.<n>This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
Does Machine Unlearning Truly Remove Knowledge? [80.83986295685128]
We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods.<n>We evaluate the effectiveness and robustness of different unlearning strategies.
arXiv Detail & Related papers (2025-05-29T09:19:07Z)
A Multidisciplinary Approach to Telegram Data Analysis [0.0]
This paper presents a multidisciplinary approach to analyzing data from Telegram for early warning information regarding cyber threats. We employ a combination of neural network architectures and traditional machine learning algorithms. We aim to enhance early warning systems for cyber threats, enabling more proactive responses to potential security breaches.
arXiv Detail & Related papers (2024-12-29T09:10:52Z)
Safeguarding Marketing Research: The Generation, Identification, and Mitigation of AI-Fabricated Disinformation [0.26107298043931204]
Generative AI has ushered in the ability to generate content that closely mimics human contributions. These models can be used to manipulate public opinion and distort perceptions, resulting in a decline in trust towards digital platforms. This study contributes to marketing literature and practice in three ways.
arXiv Detail & Related papers (2024-03-17T13:08:28Z)
Root causes, ongoing difficulties, proactive prevention techniques, and emerging trends of enterprise data breaches [0.0]
Businesses now consider data to be a crucial asset, and any breach of this data can have dire repercussions. Enterprises now place a high premium on detecting and preventing data loss due to the growing amount of data and the increasing frequency of data breaches. This review attempts to highlight interesting prospects and offer insightful information to those who are interested in learning about the risks that businesses face from data leaks.
arXiv Detail & Related papers (2023-11-27T20:34:10Z)
On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies. Machine and deep learning algorithms depend heavily on the data used during their development. We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
TII-SSRC-23 Dataset: Typological Exploration of Diverse Traffic Patterns for Intrusion Detection [0.5261718469769447]
Existing datasets often fall short, lacking the necessary diversity and alignment with the contemporary network environment. This paper introduces TII-SSRC-23, a novel and comprehensive dataset designed to overcome these challenges.
arXiv Detail & Related papers (2023-09-14T05:23:36Z)
Embrace Limited and Imperfect Training Datasets: Opportunities and Challenges in Plant Disease Recognition Using Deep Learning [5.526950086166696]
We argue that embracing poor datasets is viable and aim to explicitly define the challenges associated with using these datasets. Although our primary focus is on plant disease recognition, we emphasize that the principles of embracing and analyzing poor datasets are applicable to a wider range of domains, including agriculture.
arXiv Detail & Related papers (2023-05-19T08:58:09Z)
An Ethical Highlighter for People-Centric Dataset Creation [62.886916477131486]
We propose an analytical framework to guide ethical evaluation of existing datasets and to serve future dataset creators in avoiding missteps. Our work is informed by a review and analysis of prior works and highlights where such ethical challenges arise.
arXiv Detail & Related papers (2020-11-27T07:18:44Z)
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space guided by a modest number of human provided images of important states. BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
Survey of Network Intrusion Detection Methods from the Perspective of the Knowledge Discovery in Databases Process [63.75363908696257]
We review the methods that have been applied to network data with the purpose of developing an intrusion detector. We discuss the techniques used for the capture, preparation and transformation of the data, as well as, the data mining and evaluation methods. As a result of this literature review, we investigate some open issues which will need to be considered for further research in the area of network security.
arXiv Detail & Related papers (2020-01-27T11:21:05Z)
Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities [52.59080024266596]
We present a survey of the state-of-the-art deep learning methods for sensor-based human activity recognition. We first introduce the multi-modality of the sensory data and provide information for public datasets. We then propose a new taxonomy to structure the deep methods by challenges.
arXiv Detail & Related papers (2020-01-21T09:55:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.