Utilizing Deep Learning to Identify Drug Use on Twitter Data
- URL: http://arxiv.org/abs/2003.11522v1
- Date: Sun, 8 Mar 2020 07:52:40 GMT
- Title: Utilizing Deep Learning to Identify Drug Use on Twitter Data
- Authors: Joseph Tassone, Peizhi Yan, Mackenzie Simpson, Chetan Mendhe, Vijay
Mago, Salimur Choudhury
- Abstract summary: The classification power of multiple methods was compared, including support vector machines (SVM), XGBoost, and convolutional neural network (CNN) based classifiers.
The accuracy scores were 76.35% and 82.31%, with AUCs of 0.90 and 0.91, respectively.
The synthetically generated set provided increased scores, improving the classification capability and proving the worth of this methodology.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The collection and examination of social media has become a useful mechanism
for studying the mental activity and behavioral tendencies of users. Through the
analysis of collected Twitter data, models were developed for classifying
drug-related tweets. Using topic-related keywords, such as slang and methods
of drug consumption, a set of tweets was generated. Potential candidates were
then preprocessed, resulting in a dataset of 3,696,150 rows. The classification
power of multiple methods was compared, including support vector machines (SVM),
XGBoost, and convolutional neural network (CNN) based classifiers. Rather than
simple feature or attribute analysis, a deep learning approach was implemented
to screen and analyze the tweets' semantic meaning. The two CNN-based
classifiers presented the best result when compared against other
methodologies. The first was trained with 2,661 manually labeled samples, while
the other included synthetically generated tweets culminating in 12,142
samples. The accuracy scores were 76.35% and 82.31%, with AUCs of 0.90 and
0.91, respectively. Additionally, association rule mining showed that commonly mentioned
drugs had a level of correspondence with frequently used illicit substances,
proving the practical usefulness of the system. Lastly, the synthetically
generated set provided increased scores, improving the classification
capability and proving the worth of this methodology.
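
The abstract reports accuracy and AUC for each classifier. As a minimal sketch of how these two metrics are computed (this is not the authors' evaluation code), the pure-Python example below uses hypothetical toy labels and scores. The function names `accuracy` and `roc_auc`, the 0.5 decision threshold, and the data are all assumptions made for illustration.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = drug-related tweet, 0 = unrelated tweet.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]

toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"accuracy={toy_accuracy:.4f} auc={toy_auc:.4f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason a paper may report both.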
Related papers
- Novel Deep Neural Network Classifier Characterization Metrics with Applications to Dataless Evaluation [1.6574413179773757]
In this work, we evaluate a Deep Neural Network (DNN) classifier's training quality without any example dataset.
Our empirical study of the proposed method for ResNet18, trained with CIFAR10 and CIFAR100 datasets, confirms that data-less evaluation of DNN classifiers is indeed possible.
arXiv Detail & Related papers (2024-07-17T20:40:46Z)
- Detecting the Presence of COVID-19 Vaccination Hesitancy from South African Twitter Data Using Machine Learning [0.9830751917335564]
Vaccination is a major tool in the fight against the pandemic, but vaccine hesitancy jeopardizes any public health effort.
In this study, sentiment analysis on South African tweets related to vaccine hesitancy was performed, with the aim of training AI-mediated classification models.
arXiv Detail & Related papers (2023-07-12T13:28:37Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values [73.82043713141142]
Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values.
We introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.
arXiv Detail & Related papers (2022-10-14T09:10:49Z)
- Multi-channel CNN to classify nepali covid-19 related tweets using hybrid features [1.713291434132985]
We represent each tweet by combining both syntactic and semantic information, called hybrid features.
We design a novel multi-channel convolutional neural network (MCNN), which ensembles the multiple CNNs.
We evaluate the efficacy of both the proposed feature extraction method and the MCNN model classifying tweets on NepCOV19Tweets dataset.
arXiv Detail & Related papers (2022-03-19T09:55:05Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study carries out an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Automatic sleep stage classification with deep residual networks in a mixed-cohort setting [63.52264764099532]
We developed a novel deep neural network model to assess the generalizability of several large-scale cohorts.
Overall classification accuracy improved with increasing fractions of training data.
arXiv Detail & Related papers (2020-08-21T10:48:35Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
- Investigating Classification Techniques with Feature Selection For Intention Mining From Twitter Feed [0.0]
Micro-blogging service Twitter has more than 200 million registered users who exchange more than 65 million posts per day.
Most of the tweets are written informally and often in slang language.
This paper investigates the problem of selecting features that affect the extraction of a user's intention from Twitter feeds.
arXiv Detail & Related papers (2020-01-22T11:55:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.