Child trafficking is a serious problem around the world. Every year there are
more than 4 million victims of child trafficking around the world, many of them
for the purposes of child sexual exploitation. In collaboration with UK Police
and a non-profit focused on child abuse prevention, Global Emancipation
Network, we developed a proof-of-concept machine learning pipeline to aid the
identification of children from intercepted images. In this work, we focus on
images that contain children wearing school uniforms to identify the school of
origin. In the absence of a machine learning pipeline, this hugely
time-consuming and labor-intensive task is conducted manually by law enforcement
personnel. Thus, by automating aspects of the school identification process, we
hope to significantly impact the speed of this portion of child identification.
Our proposed pipeline consists of two machine learning models: i) one to identify
whether an image of a child contains a school uniform, and ii) one to
identify attributes of different school uniform items (such as
color/texture of shirts, sweaters, blazers etc.). We describe the data
collection, labeling, model development and validation process, along with
strategies for efficient searching of schools using the model predictions.
A machine learning pipeline for aiding school identification from child trafficking images
Sumit Mukherjee, Tina Sederholm
Anthony C. Roman
sherrie@globalemancipation.ngo Global Emancipation Network
USA Juan Lavista Ferres email@example.com Microsoft Corporation
Microsoft Corporation Redmond, USA
KEYWORDS Child trafficking, AI for Good, Computer Vision.
ACM Reference Format: Sumit Mukherjee, Tina Sederholm, Anthony C. Roman, Ria Sankar, Sherrie Caltagirone, and Juan Lavista Ferres.
2018. A machine learning pipeline for aiding school identification from child trafficking images.
In Proceedings of ACM Conference (Conference’17).
ACM, New York, NY, USA, 4 pages.
1 INTRODUCTION
According to Human Rights First, the total number of victims of global human trafficking is approximately 24.9 million, with 5.5
million (25 percent) being children.
Additionally, the percentage of children impacted has nearly doubled in the last 15 years (based on the 2020 UNODC Global Report on Trafficking in Persons).
Operations of this criminal industry are not easily identifiable, and organizations like the Global Emancipation Network (GEN) identify patterns and build tools to aid law enforcement with taking down human trafficking operations globally.
Our law enforcement partners have pointed out that a large number of images scraped from child trafficking websites and from devices seized from child traffickers contain children in school uniforms.
Since uniforms may provide information about the location where the child was abducted from, law enforcement officers currently try to manually identify the school based on descriptive characteristics of the school uniforms (such as color/texture of uniform elements).
In order to expedite the process of school identification from images, we built a proof-of-concept machine learning pipeline to: i) identify whether an image contains a child wearing a school uniform (uniform prediction), and ii) identify which school the uniform represents using attributes specific to that school (such as color of shirts, sweaters, blazers etc.).
Moving forward, in conjunction with efforts focused on gathering images from a large cross section of schools, this tool could greatly help law enforcement in the task of school identification in images of children.
items with rich categorical and segmentation labels [9, 12].
 is now widely used as a standard benchmark dataset for various computer vision tasks related to clothing classification, segmentation of clothing items, as well as generation of synthetic fashion images.
While several models have been highly successful at the semantic segmentation of clothing items on these benchmark fashion datasets, there are several difficulties with directly applying such models to our setting.
Finally, due to the staged nature of the benchmark images and the drastic difference in image size/quality between these and the school uniform images available to us, we found that semantic segmentation models pre-trained on these benchmark datasets work rather poorly on our images.
In this initial pipeline, we only focus on the first two tasks undertaken by the law enforcement officers.
To replicate the process undertaken by the officers, the goal of our modeling endeavor was threefold: i) to identify whether an image contains a child wearing a school uniform, ii) to identify which school the uniform belongs to, and iii) to do so in a manner that can scale to new schools.
3.1 Uniform classification model
The uniform classification task is posed as a binary prediction task, where the model simply outputs whether it detects a uniform in the image (class 1) or does not detect a uniform in the image (class 0).
The list of clothing items is expected to expand slightly, once we have more training data.
In the current iteration of the model, we predict the base color of each of these clothing items from the following classes: i) Red/Brown, ii) Yellow/Orange, iii) Green, iv) Blue/Purple, v) White, vi) Black/Grey, and vii) No color (meaning that the clothing item is not present in the image).
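The label space described above can be written down as a small encoding. This is a minimal sketch and not the authors' implementation: the item names in `CLOTHING_ITEMS` are only the examples mentioned in the text (the full item list is not given here), and `encode_labels` is a hypothetical helper introduced for illustration.

```python
from enum import IntEnum


class BaseColor(IntEnum):
    """The seven color classes listed in the text."""
    RED_BROWN = 0
    YELLOW_ORANGE = 1
    GREEN = 2
    BLUE_PURPLE = 3
    WHITE = 4
    BLACK_GREY = 5
    NO_COLOR = 6  # the clothing item is not present in the image


# Assumption: illustrative items only; the paper's full item list may differ.
CLOTHING_ITEMS = ("shirt", "sweater", "blazer")


def encode_labels(annotation):
    """Turn {item: BaseColor} into a fixed-order list of class indices.

    Items missing from the annotation are encoded as NO_COLOR.
    """
    return [int(annotation.get(item, BaseColor.NO_COLOR)) for item in CLOTHING_ITEMS]


# A white shirt and a blue blazer, with no sweater visible.
example = encode_labels({"shirt": BaseColor.WHITE, "blazer": BaseColor.BLUE_PURPLE})
```

Treating "No color" as an ordinary class lets a single multi-class output per clothing item express both whether the item is present and what color it is.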
In this initial prototype, due to the limited availability of labeled data, we ignore such considerations, but as more data becomes available we can consider approaches such as stratified sampling to create a well-balanced training dataset.
This led to approximately ten thousand images containing single individuals.
Multiple volunteers from GEN (Global Emancipation Network) then used Azure Labeling Services to label each image with the aforementioned clothing and color combinations (texture was also collected but not used in the current modeling).
For the next validation, we wanted to test the robustness of the uniform classification model on schools it has not seen before.
To this end, we created 10 training sets, leaving one school out of each, i.e. using 9 schools. We then created 2 separate groups of test sets: i) 10 test sets using non-uniform images and uniform images from the same 9 schools as each training set, ii) 10 test sets using non-uniform images and uniform images from the school left out of each training set.
The ratio of uniform to non-uniform images was kept constant in both cases.
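The leave-one-school-out protocol above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `leave_one_school_out`, the `neg_per_pos` and `test_fraction` parameters, and the toy image identifiers are all hypothetical, and the real pipeline may draw its non-uniform images differently.

```python
import random
from collections import defaultdict


def leave_one_school_out(uniform, non_uniform, neg_per_pos=1, test_fraction=0.2, seed=0):
    """Build one train/test triple per school.

    uniform: list of (image_id, school) pairs; non_uniform: list of image ids.
    Labels are 1 (uniform) and 0 (non-uniform).
    """
    rng = random.Random(seed)
    by_school = defaultdict(list)
    for image_id, school in uniform:
        by_school[school].append(image_id)

    # Reserve a pool of non-uniform images for testing so none leak into training.
    negatives = list(non_uniform)
    rng.shuffle(negatives)
    n_test_neg = int(len(negatives) * test_fraction)
    test_neg, train_neg = negatives[:n_test_neg], negatives[n_test_neg:]

    def with_negatives(positives, pool):
        # Same non-uniform-to-uniform ratio everywhere, as long as the pool is large enough.
        n_neg = min(len(pool), len(positives) * neg_per_pos)
        return [(i, 1) for i in positives] + [(i, 0) for i in rng.sample(pool, n_neg)]

    splits = []
    for held_out in sorted(by_school):
        train_pos, seen_pos = [], []
        for school, ids in by_school.items():
            if school == held_out:
                continue
            ids = ids[:]
            rng.shuffle(ids)
            cut = max(1, int(len(ids) * test_fraction))
            seen_pos += ids[:cut]   # held-out images from schools seen in training
            train_pos += ids[cut:]
        splits.append({
            "held_out": held_out,
            "train": with_negatives(train_pos, train_neg),
            "test_seen": with_negatives(seen_pos, test_neg),
            "test_unseen": with_negatives(by_school[held_out], test_neg),
        })
    return splits


# Toy data: 10 schools with 10 uniform images each, plus 200 non-uniform images.
uniform = [(f"school{s}_img{i}", f"school{s}") for s in range(10) for i in range(10)]
non_uniform = [f"other_img{i}" for i in range(200)]
splits = leave_one_school_out(uniform, non_uniform)
```

Comparing a model's metrics on `test_seen` and `test_unseen` for each split gives a direct estimate of how much performance drops on a school the model has never encountered.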
We report the
Figure 3: Comparison of uniform classification model performance using different metrics on two different test scenarios: (top) test sets are a randomly held out split of the data, (bottom) models are trained on all but one school and test set contains the one school that wasn’t used during model training.
While this paper has focused on developing the machine learning portion of the proposed school identification framework, another related but important component is the school search using the predictions of the machine learning models.
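One possible way to search schools from the attribute predictions is to rank every school by how well its recorded uniform description agrees with the predicted color probabilities. The sketch below is an assumption made for illustration, not the search strategy described by the authors: `rank_schools`, the structure of `school_db`, the school names, and the probability values are all hypothetical. It also assumes every school is described over the same set of clothing items, since the score is a sum over items.

```python
import math


def rank_schools(predicted, school_db, floor=1e-6):
    """Rank schools by the log-likelihood of their uniform under the predictions.

    predicted: {item: {color: probability}} from the attribute model.
    school_db: {school: {item: color}} describing each school's uniform.
    floor: smallest probability used, so a single mismatch never yields log(0).
    """
    ranked = []
    for school, uniform in school_db.items():
        score = 0.0
        for item, color in uniform.items():
            p = predicted.get(item, {}).get(color, 0.0)
            score += math.log(max(p, floor))
        ranked.append((school, score))
    # Highest score first: the best-matching school heads the candidate list.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical predictions for one image and a three-school database.
predicted = {
    "shirt": {"white": 0.9, "blue/purple": 0.1},
    "blazer": {"blue/purple": 0.7, "black/grey": 0.2, "no color": 0.1},
}
school_db = {
    "School A": {"shirt": "white", "blazer": "blue/purple"},
    "School B": {"shirt": "white", "blazer": "black/grey"},
    "School C": {"shirt": "blue/purple", "blazer": "no color"},
}
ranking = rank_schools(predicted, school_db)
```

Returning a ranked list rather than a single answer keeps an officer in the loop: the model narrows the candidates, and the final identification remains a human decision.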
authorities by identifying a specific school uniform.
Our proof-of-concept pipeline comprises a uniform classification model, which identifies whether an image contains a child in uniform, and a uniform attribute prediction model, which predicts the color (or absence) of different uniform-relevant clothing items.
However, the adoption of such a pipeline might face challenges in countries where school uniforms are uncommon.
We anticipate that the current pipeline provides limited ability for law enforcement to re-use the model for other purposes.
However, due to valid concerns about re-usability of machine learning models by law enforcement for reasons other than ones they were developed for, any practical deployment of this pipeline will first go through legally binding ’limited use’ agreements with agencies using the pipeline.