Semi-Supervised Classification of Social Media Posts: Identifying
Sex-Industry Posts to Enable Better Support for Those Experiencing
Sex-Trafficking
- URL: http://arxiv.org/abs/2104.03233v1
- Date: Wed, 7 Apr 2021 16:31:14 GMT
- Title: Semi-Supervised Classification of Social Media Posts: Identifying
Sex-Industry Posts to Enable Better Support for Those Experiencing
Sex-Trafficking
- Authors: Ellie Simonson
- Abstract summary: Social media is both helpful and harmful to the work against sex trafficking.
There is the opportunity to use social media data to better provide support for people experiencing trafficking.
This thesis explores the use of semi-supervised learning to identify social media posts that are a part of the sex industry.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social media is both helpful and harmful to the work against sex trafficking.
On one hand, social workers carefully use social media to support people
experiencing sex trafficking. On the other hand, traffickers use social media
to groom and recruit people into trafficking situations. There is the
opportunity to use social media data to better provide support for people
experiencing trafficking.
While AI and Machine Learning (ML) have been used in work against sex
trafficking, they predominantly focus on detecting Child Sexual Abuse Material.
Work using social media data has not been done with the intention to provide
community level support to people of all ages experiencing trafficking. Within
this context, this thesis explores the use of semi-supervised classification to
identify social media posts that are a part of the sex industry.
Several methods were explored for ML. However, the primary method used was
semi-supervised learning, which has the benefit of providing automated
classification with a limited set of labelled data. Social media posts were
embedded into low-dimensional vectors using FastText and Doc2Vec models. The
data were then clustered using k-means clustering, and cross-validation was
used to determine label propagation accuracy.
The results of the semi-supervised algorithm were encouraging. The FastText
CBOW model achieved 98.6% accuracy on over 12,000 posts in clusters where label
propagation was applied. The results of this thesis suggest that further
semi-supervised learning, in conjunction with manual labeling, may allow for
the entire dataset containing over 50,000 posts to be accurately labeled.
A fully labeled dataset could be used to develop a tool that provides an
overview of where and when social media is used within the sex industry. This
could be used to help determine better ways to provide support to people
experiencing trafficking.
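The pipeline described in the abstract (embed posts, cluster with k-means, propagate labels within clusters, check accuracy by cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the toy 2-D vectors stand in for FastText or Doc2Vec embeddings, and the majority-vote propagation rule, the `min_purity` threshold, the fold count, the label names, and all function names are assumptions not stated in the abstract. It uses only the Python standard library so that it stays self-contained.

```python
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def propagate_labels(assign, known, min_purity=0.9):
    """Give every post in a cluster the majority label of its labelled
    members, but only when that majority reaches min_purity (assumed rule).
    Posts in other clusters stay unlabelled (None)."""
    votes = {}
    for idx, label in known.items():
        votes.setdefault(assign[idx], Counter())[label] += 1
    cluster_label = {}
    for c, counts in votes.items():
        label, n = counts.most_common(1)[0]
        if n / sum(counts.values()) >= min_purity:
            cluster_label[c] = label
    return [cluster_label.get(c) for c in assign]


def cv_accuracy(assign, known, folds=5, min_purity=0.9, seed=0):
    """Hold out each fold of labelled posts, propagate from the rest,
    and score the held-out posts that received a propagated label."""
    idxs = sorted(known)
    random.Random(seed).shuffle(idxs)
    correct = total = 0
    for f in range(folds):
        held_out = set(idxs[f::folds])
        train = {i: known[i] for i in idxs if i not in held_out}
        pred = propagate_labels(assign, train, min_purity)
        for i in held_out:
            if pred[i] is not None:
                total += 1
                correct += pred[i] == known[i]
    return correct / total if total else float("nan")


# Toy stand-in for post embeddings: two well-separated 2-D blobs.
rng = random.Random(1)
points = ([[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(30)]
          + [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(30)])
truth = ["other"] * 30 + ["sex_industry"] * 30   # hypothetical label names
known = {i: truth[i] for i in range(0, 60, 3)}    # only every third post is labelled

assign = kmeans(points, k=2)
predicted = propagate_labels(assign, known)
accuracy = cv_accuracy(assign, known)
print(f"labelled by propagation: {sum(p is not None for p in predicted)} posts")
print(f"cross-validated accuracy: {accuracy:.3f}")
```

The purity threshold is one way to model the abstract's note that propagation was applied only in some clusters: mixed clusters are left unlabelled for manual review rather than forced into a class.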
Related papers
- A Semi-supervised Fake News Detection using Sentiment Encoding and LSTM with Self-Attention [0.0]
We propose a semi-supervised self-learning method in which a sentiment analysis is acquired by some state-of-the-art pretrained models.
Our learning model is trained in a semi-supervised fashion and incorporates LSTM with self-attention layers.
We benchmark our model on a dataset of 20,000 news items along with their feedback, where it outperforms competitive fake news detection methods in precision, recall, and measures.
arXiv Detail & Related papers (2024-07-27T20:00:10Z)
- Countering Misinformation via Emotional Response Generation [15.383062216223971]
The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and democracy.
Previous research has shown how social correction can be an effective way to curb misinformation.
We present VerMouth, the first large-scale dataset comprising roughly 12 thousand claim-response pairs.
arXiv Detail & Related papers (2023-11-17T15:37:18Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Understanding Lexical Biases when Identifying Gang-related Social Media Communications [18.301221486244263]
We use a binary logistic classifier to identify gang-related tweets in Chicago.
We find that the language of a tweet is highly relevant and that users of "big data" methods or machine learning models need to better understand how language impacts the model's performance.
arXiv Detail & Related papers (2023-04-22T21:51:49Z)
- Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows the superior performance of non-matched image-text pair detection when the training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z)
- Ethics and Efficacy of Unsolicited Anti-Trafficking SMS Outreach [22.968179319673112]
We investigate the use, context, benefits, and harms of an anti-trafficking technology platform in North America.
Our findings illustrate misalignment between developers, users of the platform, and sex industry workers they are attempting to assist.
arXiv Detail & Related papers (2022-02-19T05:12:34Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents [60.27066549589362]
Social language used by human agents is associated with greater users' responsiveness and task completion.
The model uses a sequence-to-sequence deep learning architecture, extended with a social language understanding element.
Evaluation in terms of content preservation and social language level using both human judgment and automatic linguistic measures shows that the model can generate responses that enable agents to address users' issues in a more socially appropriate way.
arXiv Detail & Related papers (2020-12-29T08:22:48Z)
- Measuring Social Biases of Crowd Workers using Counterfactual Queries [84.10721065676913]
Social biases based on gender, race, etc. have been shown to pollute the machine learning (ML) pipeline, predominantly via biased training datasets.
Crowdsourcing, a popular cost-effective measure to gather labeled training datasets, is not immune to the inherent social biases of crowd workers.
We propose a new method based on counterfactual fairness to quantify the degree of inherent social bias in each crowd worker.
arXiv Detail & Related papers (2020-04-04T21:41:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.