Sentiment Analysis in Drug Reviews using Supervised Machine Learning
Algorithms
- URL: http://arxiv.org/abs/2003.11643v1
- Date: Sat, 21 Mar 2020 20:13:11 GMT
- Title: Sentiment Analysis in Drug Reviews using Supervised Machine Learning
Algorithms
- Authors: Sairamvinay Vijayaraghavan, Debraj Basu
- Abstract summary: We chose to analyze reviews of various drugs that were written as text.
We trained models on the most popular conditions, such as "Birth Control", "Depression" and "Pain".
Our intention was mainly to implement supervised machine learning classification algorithms that predict the class of the rating.
- Score: 1.14219428942199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentiment analysis is an important task in Natural Language Processing
that detects the sentiment expressed in a text. In our project, we
chose to analyze reviews of various drugs, which were written as text
and also given a rating on a scale from 1-10. We
obtained this data set from the UCI machine learning repository, which provides 2
data sets: train and test (split 75-25%). We split the numeric rating
for the drug into three classes: positive (7-10), negative (1-4) or
neutral (4-7). There are multiple reviews for the drugs that belong to a similar
condition, and we decided to investigate how the different words used in reviews
for different conditions impact the ratings of the drugs. Our intention
was mainly to implement supervised machine learning classification algorithms
that predict the class of the rating using the textual review. We primarily
implemented different embeddings such as Term Frequency Inverse Document
Frequency (TFIDF) and Count Vectors (CV). We trained models on the most
popular conditions such as "Birth Control", "Depression" and "Pain" within the
data set and obtained good results when predicting on the test data sets.
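The rating-to-class split and the two text representations named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the naive tokenizer, the example reviews, and the unsmoothed idf formula log(N / df) are all assumptions. The abstract's ranges overlap at 4 and 7; the sketch assumes 7 and above is positive and 4 and below is negative.

```python
import math
from collections import Counter


def rating_to_class(rating):
    """Map a 1-10 rating to a sentiment class.

    Assumption: the boundary values 4 and 7 are ambiguous in the abstract;
    here >= 7 is positive, <= 4 is negative, and the rest is neutral.
    """
    if rating >= 7:
        return "positive"
    if rating <= 4:
        return "negative"
    return "neutral"


def tokenize(text):
    # Naive tokenizer: lowercase, split on whitespace, strip edge punctuation.
    stripped = (tok.strip(".,!?\"'()") for tok in text.lower().split())
    return [tok for tok in stripped if tok]


def count_vectors(docs):
    """Return (vocabulary, count vectors), one vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for tok, n in Counter(toks).items():
            vec[index[tok]] = n
        vectors.append(vec)
    return vocab, vectors


def tfidf_vectors(count_vecs):
    """Weight raw counts by idf = log(N / df) (textbook, unsmoothed variant)."""
    n_docs = len(count_vecs)
    n_terms = len(count_vecs[0]) if count_vecs else 0
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in count_vecs if vec[j] > 0) for j in range(n_terms)]
    return [
        [vec[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
        for vec in count_vecs
    ]


# Made-up example reviews and ratings (not taken from the data set).
reviews = ["This pill worked great, no pain", "Terrible side effects, more pain"]
ratings = [9, 2]

labels = [rating_to_class(r) for r in ratings]
vocab, cv = count_vectors(reviews)
tfidf = tfidf_vectors(cv)
```

In this sketch a word that occurs in every review, such as "pain" here, gets a TFIDF weight of zero, while its count-vector entry stays nonzero. Either set of vectors, together with `labels`, could then be passed to any supervised classifier.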
Related papers
- "Hey..! This medicine made me sick": Sentiment Analysis of User-Generated Drug Reviews using Machine Learning Techniques [2.2874754079405535]
This project proposes a drug review classification system that classifies user reviews on a particular drug into different classes, such as positive, negative, and neutral.
The collected data is manually labeled and verified manually to ensure that the labels are correct.
arXiv Detail & Related papers (2024-04-09T08:42:34Z) - A Pretrainer's Guide to Training Data: Measuring the Effects of Data
Age, Domain Coverage, Quality, & Toxicity [84.6421260559093]
This study is the largest set of experiments to validate, quantify, and expose undocumented intuitions about text pretraining.
Our findings indicate there does not exist a one-size-fits-all solution to filtering training data.
arXiv Detail & Related papers (2023-05-22T15:57:53Z) - Improving Health Mentioning Classification of Tweets using Contrastive
Adversarial Training [6.586675643422952]
We learn word representation by its surrounding words and utilize emojis in the text to help improve the classification results.
We generate adversarial examples by perturbing the embeddings of the model and then train the model on a pair of clean and adversarial examples.
Experiments show an improvement of 1.0% over BERT-Large baseline and 0.6% over RoBERTa-Large baseline, whereas 5.8% over the state-of-the-art in terms of F1 score.
arXiv Detail & Related papers (2022-03-03T18:20:51Z) - A novel data-driven algorithm to predict anomalous prescription based on
patient's feature set [0.0]
Current quality assurance depends heavily on a peer-review process, in which physicians peer-review each patient's treatment plan.
We designed a novel prescription anomaly detection algorithm that utilizes historical data to predict anomalous cases.
Our model has a lower type 2 error rate compared to manual peer-review physicians.
arXiv Detail & Related papers (2021-11-30T03:40:24Z) - Detecting Handwritten Mathematical Terms with Sensor Based Data [71.84852429039881]
We propose a solution to the UbiComp 2021 Challenge by Stabilo in which handwritten mathematical terms are supposed to be automatically classified.
The input data set contains data of different writers, with label strings constructed from a total of 15 different possible characters.
arXiv Detail & Related papers (2021-09-12T19:33:34Z) - Double Perturbation: On the Robustness of Robustness and Counterfactual
Bias Evaluation [109.06060143938052]
We propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset.
We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English.
arXiv Detail & Related papers (2021-04-12T06:57:36Z) - Explainable Multi-class Classification of Medical Data [0.9137554315375922]
We present explainable multi-class classification of a large medical data set.
Six algorithms are used in this study: Support Vector Machine (SVM), Naïve Bayes, Gradient Boosting, Decision Trees, Random Forest, and Logistic Regression.
Our results show that using 23 medication features in learning experiments improves Recall of five out of the six applied learning algorithms.
arXiv Detail & Related papers (2020-12-26T18:56:07Z) - Tweet Sentiment Quantification: An Experimental Re-Evaluation [88.60021378715636]
Sentiment quantification is the task of training, by means of supervised learning, estimators of the relative frequency (also called "prevalence") of sentiment-related classes.
We re-evaluate those quantification methods following a now consolidated and much more robust experimental protocol.
Results are dramatically different from those obtained by Gao and Sebastiani, and they provide a different, much more solid understanding of the relative strengths and weaknesses of different sentiment quantification methods.
arXiv Detail & Related papers (2020-11-04T21:41:34Z) - Hierarchical Bi-Directional Self-Attention Networks for Paper Review
Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: sentence encoder (level one), intra-review encoder (level two) and inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z) - Weakly-Supervised Aspect-Based Sentiment Analysis via Joint
Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z) - Utilizing Deep Learning to Identify Drug Use on Twitter Data [0.0]
The classification power of multiple methods was compared including support vector machines (SVM), XGBoost, and convolutional neural network (CNN) based classifiers.
The accuracy scores were 76.35% and 82.31%, with an AUC of 0.90 and 0.91.
The synthetically generated set provided increased scores, improving the classification capability and proving the worth of this methodology.
arXiv Detail & Related papers (2020-03-08T07:52:40Z)