Creating Geospatial Trajectories from Human Trafficking Text Corpora
- URL: http://arxiv.org/abs/2405.06130v1
- Date: Thu, 09 May 2024 22:24:09 GMT
- Title: Creating Geospatial Trajectories from Human Trafficking Text Corpora
- Authors: Saydeh N. Karabatis, Vandana P. Janeja
- Abstract summary: We propose a Narrative to Trajectory (N2T) information extraction system.
N2T analyzes reported narratives, extracts relevant information through the use of Natural Language Processing (NLP) techniques, and applies geospatial augmentation.
We evaluate N2T on human trafficking text corpora and demonstrate that our approach of utilizing data preprocessing and augmenting database techniques with NLP libraries outperforms existing geolocation detection methods.
- Score: 0.0
- Abstract: Human trafficking is a crime that affects the lives of millions of people across the globe. Traffickers exploit the victims through forced labor, involuntary sex, or organ harvesting. Migrant smuggling could also be seen as a form of human trafficking when the migrant fails to pay the smuggler and is forced into coerced activities. Several news agencies and anti-trafficking organizations have reported trafficking survivor stories that include the names of locations visited along the trafficking route. Identifying such routes can provide knowledge that is essential to preventing such heinous crimes. In this paper we propose a Narrative to Trajectory (N2T) information extraction system that analyzes reported narratives, extracts relevant information through the use of Natural Language Processing (NLP) techniques, and applies geospatial augmentation in order to automatically plot trajectories of human trafficking routes. We evaluate N2T on human trafficking text corpora and demonstrate that our approach of utilizing data preprocessing and augmenting database techniques with NLP libraries outperforms existing geolocation detection methods.
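The pipeline the abstract describes (extract location names from a narrative, attach coordinates, order them into a route) can be illustrated with a minimal sketch. This is not the authors' N2T implementation: the gazetteer lookup stands in for their NLP-based extraction and geospatial augmentation, and the names `GAZETTEER`, `extract_trajectory`, `haversine_km`, and `route_length_km`, the sample sentence, and the approximate coordinates are all assumptions made for illustration.

```python
import math
import re

# Hypothetical toy gazetteer: place name -> approximate (latitude, longitude) in degrees.
# A real system would use an NLP library and a geocoding database instead.
GAZETTEER = {
    "Lagos": (6.5244, 3.3792),
    "Agadez": (16.9742, 7.9865),
    "Tripoli": (32.8872, 13.1913),
}


def extract_trajectory(narrative, gazetteer=GAZETTEER):
    """Return (name, lat, lon) tuples ordered by first mention in the text."""
    hits = []
    for name, (lat, lon) in gazetteer.items():
        match = re.search(r"\b" + re.escape(name) + r"\b", narrative)
        if match:
            hits.append((match.start(), name, lat, lon))
    # Sorting by character offset approximates the order of travel,
    # assuming the narrative is told chronologically.
    return [(name, lat, lon) for _, name, lat, lon in sorted(hits)]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def route_length_km(trajectory):
    """Sum the great-circle distances between consecutive trajectory points."""
    total = 0.0
    for (_, lat1, lon1), (_, lat2, lon2) in zip(trajectory, trajectory[1:]):
        total += haversine_km(lat1, lon1, lat2, lon2)
    return total


if __name__ == "__main__":
    text = "She left Lagos by bus, was held for weeks in Agadez, and was finally taken to Tripoli."
    route = extract_trajectory(text)
    print([name for name, _, _ in route])
    print(round(route_length_km(route)), "km")
```

Ordering by first mention is a deliberate simplification: narratives that mention a destination before an intermediate stop would need temporal cues to recover the true sequence.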
Related papers
- Tracing the Unseen: Uncovering Human Trafficking Patterns in Job Listings [9.450459784653196]
We analyze over a quarter million job postings collected from eight relevant regions across the United States, spanning nearly two decades (2006-2024).
Our investigation into the types of advertised opportunities, the modes of preferred contact, and the frequency of postings uncovers the patterns characterizing suspicious ads.
This research underscores the imperative for a deeper dive into how online job boards and communication platforms could be unwitting facilitators of human trafficking.
arXiv Detail & Related papers (2024-06-18T10:18:15Z) - Narrative to Trajectory (N2T+): Extracting Routes of Life or Death from Human Trafficking Text Corpora [0.0]
We propose a system called Narrative to Trajectory (N2T+), which extracts trajectories of trafficking routes.
N2T+ uses Data Science and Natural Language Processing techniques to analyze trafficking narratives, automatically extract relevant location names, and plot the trafficking route on a map.
arXiv Detail & Related papers (2024-05-09T22:21:40Z) - Combatting Human Trafficking in the Cyberspace: A Natural Language Processing-Based Methodology to Analyze the Language in Online Advertisements [55.2480439325792]
This project tackles the pressing issue of human trafficking in online C2C marketplaces through advanced Natural Language Processing (NLP) techniques.
We introduce a novel methodology for generating pseudo-labeled datasets with minimal supervision, serving as a rich resource for training state-of-the-art NLP models.
A key contribution is the implementation of an interpretability framework using Integrated Gradients, providing explainable insights crucial for law enforcement.
arXiv Detail & Related papers (2023-11-22T02:45:01Z) - Unveiling the Potential of Knowledge-Prompted ChatGPT for Enhancing Drug Trafficking Detection on Social Media [30.791563171321062]
We propose an analytical framework to compose knowledge-informed prompts, which serve as the interface through which humans can interact with and use LLMs to perform the detection task.
Our experimental findings demonstrate that the proposed framework outperforms other baseline language models in terms of drug trafficking detection accuracy.
The implications of our research extend to social networks, emphasizing the importance of incorporating prior knowledge and scenario-based prompts into analytical tools to improve online security and public safety.
arXiv Detail & Related papers (2023-07-07T16:15:59Z) - A New Task and Dataset on Detecting Attacks on Human Rights Defenders [68.45906430323156]
We propose a new dataset for detecting Attacks on Human Rights Defenders (HRDsAttack) consisting of crowdsourced annotations on 500 online news articles.
The annotations include fine-grained information about the type and location of the attacks, as well as information about the victim(s).
We demonstrate the usefulness of the dataset by using it to train and evaluate baseline models on several sub-tasks to predict the annotated characteristics.
arXiv Detail & Related papers (2023-06-30T14:20:06Z) - ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Audio Analytics-based Human Trafficking Detection Framework for Autonomous Vehicles [2.868643768911536]
This study aims to develop an innovative audio analytics-based human trafficking detection framework for autonomous vehicles.
We create a new and comprehensive audio dataset related to human trafficking with five classes: crying, screaming, car door banging, car noise, and conversation.
Our analyses reveal that the deep 1-D CNN can distinguish sound coming from a human trafficking victim from a non-human trafficking sound with an accuracy of 95%.
arXiv Detail & Related papers (2022-09-09T01:06:50Z) - Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z) - Knowledge Sharing via Domain Adaptation in Customs Fraud Detection [14.933341652591224]
This paper proposes DAS, a memory bank platform to facilitate knowledge sharing across multi-national customs administrations.
Data encompassing over 8 million import declarations have been used to test the feasibility of this new system.
arXiv Detail & Related papers (2022-01-18T06:17:03Z) - Human-in-the-Loop Imitation Learning using Remote Teleoperation [72.2847988686463]
We build a data collection system tailored to 6-DoF manipulation settings.
We develop an algorithm to train the policy iteratively on new data collected by the system.
We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators.
arXiv Detail & Related papers (2020-12-12T05:30:35Z) - A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events [63.08899104652265]
This work describes a corpus of German-language documents which has been annotated with fine-grained geo-entities.
It has also been annotated with a set of 15 traffic- and industry-related n-ary relations and events.
The corpus consists of newswire texts, Twitter messages, and traffic reports from radio stations, police and railway companies.
arXiv Detail & Related papers (2020-04-07T11:39:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.