Combatting Human Trafficking in the Cyberspace: A Natural Language
Processing-Based Methodology to Analyze the Language in Online Advertisements
- URL: http://arxiv.org/abs/2311.13118v1
- Date: Wed, 22 Nov 2023 02:45:01 GMT
- Title: Combatting Human Trafficking in the Cyberspace: A Natural Language
Processing-Based Methodology to Analyze the Language in Online Advertisements
- Authors: Alejandro Rodriguez Perez and Pablo Rivas
- Abstract summary: This project tackles the pressing issue of human trafficking in online C2C marketplaces through advanced Natural Language Processing (NLP) techniques.
We introduce a novel methodology for generating pseudo-labeled datasets with minimal supervision, serving as a rich resource for training state-of-the-art NLP models.
A key contribution is the implementation of an interpretability framework using Integrated Gradients, providing explainable insights crucial for law enforcement.
- Score: 55.2480439325792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This project tackles the pressing issue of human trafficking in online C2C
marketplaces through advanced Natural Language Processing (NLP) techniques. We
introduce a novel methodology for generating pseudo-labeled datasets with
minimal supervision, serving as a rich resource for training state-of-the-art
NLP models. Focusing on tasks like Human Trafficking Risk Prediction (HTRP) and
Organized Activity Detection (OAD), we employ cutting-edge Transformer models
for analysis. A key contribution is the implementation of an interpretability
framework using Integrated Gradients, providing explainable insights crucial
for law enforcement. This work not only fills a critical gap in the literature
but also offers a scalable, machine learning-driven approach to combat human
exploitation online. It serves as a foundation for future research and
practical applications, emphasizing the role of machine learning in addressing
complex social issues.
Related papers
- Deep Learning and Machine Learning -- Natural Language Processing: From Theory to Application [17.367710635990083]
We focus on natural language processing (NLP) and the role of large language models (LLMs)
This paper discusses advanced data preprocessing techniques and the use of frameworks like Hugging Face for implementing transformer-based models.
It highlights challenges such as handling multilingual data, reducing bias, and ensuring model robustness.
arXiv Detail & Related papers (2024-10-30T09:35:35Z) - ContextGPT: Infusing LLMs Knowledge into Neuro-Symbolic Activity
Recognition Models [0.3277163122167433]
We propose ContextGPT: a novel prompt engineering approach to retrieve from common-sense knowledge about human activities.
An evaluation carried out on two public datasets shows how a NeSy model obtained by infusing common-sense knowledge from ContextGPT is effective in data scarcity scenarios.
arXiv Detail & Related papers (2024-03-11T10:32:23Z) - Towards A Unified Agent with Foundation Models [18.558328028366816]
We investigate how to embed and leverage such abilities in Reinforcement Learning (RL) agents.
We design a framework that uses language as the core reasoning tool, exploring how this enables an agent to tackle a series of fundamental RL challenges.
We demonstrate substantial performance improvements over baselines in exploration efficiency and ability to reuse data from offline datasets.
arXiv Detail & Related papers (2023-07-18T22:37:30Z) - Deep Active Learning for Computer Vision: Past and Future [50.19394935978135]
Despite its indispensable role for developing AI models, research on active learning is not as intensive as other research directions.
By addressing data automation challenges and coping with automated machine learning systems, active learning will facilitate democratization of AI technologies.
arXiv Detail & Related papers (2022-11-27T13:07:14Z) - Interactive Machine Learning: A State of the Art Review [0.0]
We provide a comprehensive analysis of the state-of-the-art of interactive machine learning (iML)
Research works on adversarial black-box attacks and corresponding iML based defense system, exploratory machine learning, resource constrained learning, and iML performance evaluation are analyzed.
arXiv Detail & Related papers (2022-07-13T13:43:16Z) - Constrained Reinforcement Learning for Robotics via Scenario-Based
Programming [64.07167316957533]
It is crucial to optimize the performance of DRL-based agents while providing guarantees about their behavior.
This paper presents a novel technique for incorporating domain-expert knowledge into a constrained DRL training loop.
Our experiments demonstrate that using our approach to leverage expert knowledge dramatically improves the safety and the performance of the agent.
arXiv Detail & Related papers (2022-06-20T07:19:38Z) - Meta Learning for Natural Language Processing: A Survey [88.58260839196019]
Deep learning has been the mainstream technique in natural language processing (NLP) area.
Deep learning requires many labeled data and is less generalizable across domains.
Meta-learning is an arising field in machine learning studying approaches to learn better algorithms.
arXiv Detail & Related papers (2022-05-03T13:58:38Z) - Human-Robot Collaboration and Machine Learning: A Systematic Review of
Recent Research [69.48907856390834]
Human-robot collaboration (HRC) is the approach that explores the interaction between a human and a robot.
This paper proposes a thorough literature review of the use of machine learning techniques in the context of HRC.
arXiv Detail & Related papers (2021-10-14T15:14:33Z) - Reprogramming Language Models for Molecular Representation Learning [65.00999660425731]
We propose Representation Reprogramming via Dictionary Learning (R2DL) for adversarially reprogramming pretrained language models for molecular learning tasks.
The adversarial program learns a linear transformation between a dense source model input space (language data) and a sparse target model input space (e.g., chemical and biological molecule data) using a k-SVD solver.
R2DL achieves the baseline established by state of the art toxicity prediction models trained on domain-specific data and outperforms the baseline in a limited training-data setting.
arXiv Detail & Related papers (2020-12-07T05:50:27Z) - Semi-Supervised Learning Approach to Discover Enterprise User Insights
from Feedback and Support [9.66491980663996]
We propose and developed an innovative Semi-Supervised Learning approach by utilizing Deep Learning and Topic Modeling.
This approach combines a BERT-based multiclassification algorithm through supervised learning combined with a novel Probabilistic and Semantic Hybrid Topic Inference (PSHTI) Model.
Our system enables mapping the top words to the self-help issues by utilizing domain knowledge about the product through web-crawling.
arXiv Detail & Related papers (2020-07-18T01:18:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.