Study of sampling methods in sentiment analysis of imbalanced data
- URL: http://arxiv.org/abs/2106.06673v1
- Date: Sat, 12 Jun 2021 03:16:18 GMT
- Title: Study of sampling methods in sentiment analysis of imbalanced data
- Authors: Zeeshan Ali Sayyed
- Abstract summary: This work investigates the application of sampling methods for sentiment analysis on two different datasets.
One dataset contains online user reviews from the cooking platform Epicurious and the other contains comments given to the Planned Parenthood organization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work investigates the application of sampling methods for sentiment
analysis on two different highly imbalanced datasets. One dataset contains
online user reviews from the cooking platform Epicurious and the other contains
comments given to the Planned Parenthood organization. In both these datasets,
the classes of interest are rare. Word n-grams are used as features from these
datasets. A feature selection technique based on information gain is first
applied to reduce the number of features to a manageable size. A number of
different sampling methods are then applied to mitigate the class imbalance
problem, and their effects are analyzed.
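To make the described pipeline concrete, here is a minimal sketch using scikit-learn and imbalanced-learn. The toy texts and labels are invented, and mutual information (mutual_info_classif) is used only as a stand-in for the paper's information-gain criterion.

```python
# Illustrative sketch of the described pipeline, not the paper's actual code.
# The toy texts/labels are invented; mutual information stands in for information gain.
from collections import Counter

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import RandomOverSampler

texts = [
    "loved this recipe, came out perfectly",
    "easy to follow and delicious",
    "great weeknight dinner",
    "too salty and the timing was wrong",  # the rare negative class
]
labels = [1, 1, 1, 0]

# Word n-gram features (unigrams and bigrams).
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

# Keep only the most informative n-grams (information-gain-style filter).
selector = SelectKBest(mutual_info_classif, k=10)
X_selected = selector.fit_transform(X, labels)

# Rebalance the rare class before training a classifier.
sampler = RandomOverSampler(random_state=0)
X_balanced, y_balanced = sampler.fit_resample(X_selected, labels)
print(Counter(y_balanced))  # classes are now equally represented

clf = LogisticRegression().fit(X_balanced, y_balanced)
```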
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
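As a rough, hypothetical illustration of the general idea of mixing minority and majority samples to synthesize new minority examples, a minimal NumPy sketch follows; the interpolation scheme and data are invented and are not the cited paper's actual algorithm.

```python
# Hypothetical sketch of mixing-based oversampling: each synthetic sample blends
# one minority and one majority example, weighted toward the minority class.
# This shows the general idea only, not the cited paper's exact method.
import numpy as np

rng = np.random.default_rng(0)
X_minority = rng.normal(0.0, 1.0, size=(10, 5))   # 10 rare-class samples
X_majority = rng.normal(3.0, 1.0, size=(100, 5))  # 100 common-class samples

def mix_synthetic(X_min, X_maj, n_new, low=0.7):
    """Interpolate minority/majority pairs, keeping mixes close to the minority."""
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_maj), size=n_new)
    lam = rng.uniform(low, 1.0, size=(n_new, 1))  # mixing weight on the minority sample
    return lam * X_min[i] + (1.0 - lam) * X_maj[j]

X_synthetic = mix_synthetic(X_minority, X_majority, n_new=90)  # roughly balances the classes
```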
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Reinforced Approximate Exploratory Data Analysis [7.974685452145769]
We are the first to consider the impact of sampling in interactive data exploration settings, as sampling introduces approximation errors.
We propose a Deep Reinforcement Learning (DRL) based framework which can optimize the sample selection in order to keep the analysis and insight generation flow intact.
arXiv Detail & Related papers (2022-12-12T20:20:22Z)
- Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data [81.43750358586072]
We propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes.
We experimentally demonstrate the benefits of Data-IQ on four real-world medical datasets.
arXiv Detail & Related papers (2022-10-24T08:57:55Z)
- Learning Classifiers for Imbalanced and Overlapping Data [0.0]
This study concerns inducing classifiers from imbalanced data, in which a minority class is under-represented relative to the majority classes.
The paper further addresses class imbalance with a new method called Sparsity.
arXiv Detail & Related papers (2022-10-22T13:31:38Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
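For concreteness, a minimal example of how the two families of resampling techniques rebalance class counts, using the imbalanced-learn library on synthetic data (the dataset and the 90/10 ratio are illustrative only):

```python
# Minimal illustration of oversampling vs. undersampling on a synthetic
# imbalanced dataset (roughly 90% majority, 10% minority).
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=3,
    weights=[0.9, 0.1], random_state=42,
)
print("original:", Counter(y))            # roughly 900 majority vs. 100 minority

# Oversampling: synthesize new minority examples until the classes match.
X_over, y_over = SMOTE(random_state=42).fit_resample(X, y)
print("oversampled:", Counter(y_over))    # both classes at the majority count

# Undersampling: drop majority examples until the classes match.
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X, y)
print("undersampled:", Counter(y_under))  # both classes at the minority count
```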
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- A Study imbalance handling by various data sampling methods in binary classification [0.0]
This research report presents our learning curve and our exposure to the Machine Learning life cycle.
We explore various techniques, from pre-processing to final optimization and model evaluation.
arXiv Detail & Related papers (2021-05-23T15:27:47Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- Interpreting Deep Models through the Lens of Data [5.174367472975529]
This paper presents an in-depth analysis of methods which attempt to identify the influence of individual training data points on the resulting classifier.
We show that some interpretability methods can detect mislabels better than a random approach; however, sample selection based on the training loss showed superior performance.
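A hedged sketch of the loss-based selection idea: fit a model, compute each training example's loss under its observed label, and review the highest-loss examples as mislabel candidates. The model and data below are placeholders, not the cited paper's setup.

```python
# Hypothetical sketch of loss-based sample selection for mislabel detection.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
y_noisy = y.copy()
y_noisy[:10] = 1 - y_noisy[:10]  # inject a few label flips for illustration

model = LogisticRegression(max_iter=1000).fit(X, y_noisy)
proba = model.predict_proba(X)

# Per-sample cross-entropy loss under the (possibly noisy) observed labels.
per_sample_loss = -np.log(proba[np.arange(len(y_noisy)), y_noisy] + 1e-12)

suspects = np.argsort(per_sample_loss)[::-1][:10]  # highest-loss candidates to review
print("indices to review:", suspects)
```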
arXiv Detail & Related papers (2020-05-05T07:59:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.