Dimensionality Reduction for Sentiment Classification: Evolving for the
Most Prominent and Separable Features
- URL: http://arxiv.org/abs/2006.04680v1
- Date: Mon, 1 Jun 2020 09:46:52 GMT
- Title: Dimensionality Reduction for Sentiment Classification: Evolving for the
Most Prominent and Separable Features
- Authors: Aftab Anjum, Mazharul Islam, Lin Wang
- Abstract summary: In sentiment classification, the enormous amount of textual data, its immense dimensionality, and inherent noise make it extremely difficult for machine learning classifiers to extract high-level and complex abstractions.
In existing dimensionality reduction techniques, the number of components must be set manually, which results in the loss of the most prominent features.
We propose a new framework consisting of two dimensionality reduction techniques, i.e., Sentiment Term Presence Count (SentiTPC) and Sentiment Term Presence Ratio (SentiTPR).
- Score: 4.156782836736784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In sentiment classification, the enormous amount of textual data, its
immense dimensionality, and inherent noise make it extremely difficult for
machine learning classifiers to extract high-level and complex abstractions. To
make the data less sparse and more statistically significant, dimensionality
reduction techniques are needed. However, in existing dimensionality reduction
techniques the number of components must be set manually, which can result in
the loss of the most prominent features and thus reduce classifier performance.
Our prior work, i.e., Term Presence Count (TPC) and Term Presence Ratio (TPR),
has proven effective because these techniques reject the less separable
features. However, the most prominent and separable features might still be
removed from the initial feature set despite having higher distributions among
positive- and negative-tagged documents. To overcome this problem, we propose a
new framework consisting of two dimensionality reduction techniques, i.e.,
Sentiment Term Presence Count (SentiTPC) and Sentiment Term Presence Ratio
(SentiTPR). These techniques reject features by considering the term presence
difference for SentiTPC and the ratio of distribution distinction for SentiTPR.
Additionally, both methods analyze the total distribution information.
Extensive experimental results show that the proposed framework reduces the
feature dimension by a large margin and thus significantly improves
classification performance.
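The abstract describes SentiTPC and SentiTPR only at a high level: each rejects a feature based on how its presence is distributed across positive and negative documents, via an absolute count difference (SentiTPC) or a distribution ratio (SentiTPR). A minimal sketch of that filtering idea is given below; the threshold parameters and exact scoring formulas are assumptions for illustration, not the paper's actual definitions.

```python
from collections import Counter

def senti_tpc(pos_docs, neg_docs, threshold=2):
    """Sketch of SentiTPC-style filtering (hypothetical threshold):
    keep a term only if the difference between its document-presence
    counts in positive and negative documents is large enough."""
    pos_presence = Counter(t for doc in pos_docs for t in set(doc))
    neg_presence = Counter(t for doc in neg_docs for t in set(doc))
    vocab = set(pos_presence) | set(neg_presence)
    return {t for t in vocab
            if abs(pos_presence[t] - neg_presence[t]) >= threshold}

def senti_tpr(pos_docs, neg_docs, ratio=2.0):
    """Sketch of SentiTPR-style filtering (hypothetical ratio test):
    keep a term only if its presence distribution is sufficiently
    skewed toward one of the two classes."""
    pos_presence = Counter(t for doc in pos_docs for t in set(doc))
    neg_presence = Counter(t for doc in neg_docs for t in set(doc))
    vocab = set(pos_presence) | set(neg_presence)
    kept = set()
    for t in vocab:
        p, n = pos_presence[t], neg_presence[t]
        hi, lo = max(p, n), max(min(p, n), 1)  # guard against div-by-zero
        if hi / lo >= ratio:
            kept.add(t)
    return kept

# Tiny usage example with tokenized documents
pos = [["good", "movie"], ["good", "plot"]]
neg = [["bad", "movie"], ["bad", "plot"]]
kept_tpc = senti_tpc(pos, neg, threshold=2)  # class-skewed terms survive
kept_tpr = senti_tpr(pos, neg, ratio=2.0)
```

In this toy example both filters retain the class-discriminative terms "good" and "bad" while rejecting "movie" and "plot", which appear equally in both classes; the paper's actual criteria additionally consider total distribution information.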
Related papers
- Mitigating the Effect of Incidental Correlations on Part-based Learning [50.682498099720114]
Part-based representations could be more interpretable and generalize better with limited data.
We present two innovative regularization methods for part-based representations.
We exhibit state-of-the-art (SoTA) performance on few-shot learning tasks on benchmark datasets.
arXiv Detail & Related papers (2023-09-30T13:44:48Z)
- Enhancing Representation Learning on High-Dimensional, Small-Size
Tabular Data: A Divide and Conquer Method with Ensembled VAEs [7.923088041693465]
We present an ensemble of lightweight VAEs to learn posteriors over subsets of the feature-space, which get aggregated into a joint posterior in a novel divide-and-conquer approach.
We show that our approach is robust to partial features at inference, exhibiting little performance degradation even with most features missing.
arXiv Detail & Related papers (2023-06-27T17:55:31Z)
- Interpretable Linear Dimensionality Reduction based on Bias-Variance
Analysis [45.3190496371625]
We propose a principled dimensionality reduction approach that maintains the interpretability of the resulting features.
In this way, all features are considered, the dimensionality is reduced and the interpretability is preserved.
arXiv Detail & Related papers (2023-03-26T14:30:38Z)
- Fine-grained Retrieval Prompt Tuning [149.9071858259279]
Fine-grained Retrieval Prompt Tuning steers a frozen pre-trained model to perform the fine-grained retrieval task from the perspectives of sample prompt and feature adaptation.
Our FRPT with fewer learnable parameters achieves the state-of-the-art performance on three widely-used fine-grained datasets.
arXiv Detail & Related papers (2022-07-29T04:10:04Z)
- Exploring Dimensionality Reduction Techniques in Multilingual
Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensionality reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
arXiv Detail & Related papers (2022-04-18T17:20:55Z)
- Compressibility of Distributed Document Representations [0.0]
CoRe is a representation learner-agnostic framework suitable for representation compression.
We show CoRe's behavior when considering contextual and non-contextual document representations, different compression levels, and 9 different compression algorithms.
Results based on more than 100,000 compression experiments indicate that CoRe offers a very good trade-off between the compression efficiency and performance.
arXiv Detail & Related papers (2021-10-14T17:56:35Z)
- Dynamic Feature Regularized Loss for Weakly Supervised Semantic
Segmentation [37.43674181562307]
We propose a new regularized loss which utilizes both shallow and deep features that are dynamically updated.
Our approach achieves new state-of-the-art performance, outperforming other approaches by a significant margin with a mIoU increase of more than 6%.
arXiv Detail & Related papers (2021-08-03T05:11:00Z)
- A Simple Baseline for Semi-supervised Semantic Segmentation with Strong
Data Augmentation [74.8791451327354]
We propose a simple yet effective semi-supervised learning framework for semantic segmentation.
A set of simple design and training techniques can collectively improve the performance of semi-supervised semantic segmentation significantly.
Our method achieves state-of-the-art results in the semi-supervised settings on the Cityscapes and Pascal VOC datasets.
arXiv Detail & Related papers (2021-04-15T06:01:39Z)
- Unsupervised low-rank representations for speech emotion recognition [78.38221758430244]
We examine the use of linear and non-linear dimensionality reduction algorithms for extracting low-rank feature representations for speech emotion recognition.
We report speech emotion recognition (SER) results for learned representations on two databases using different classification methods.
arXiv Detail & Related papers (2021-04-14T18:30:58Z)
- Adversarial Feature Augmentation and Normalization for Visual
Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Supervised Visualization for Data Exploration [9.742277703732187]
We describe a novel supervised visualization technique based on random forest proximities and diffusion-based dimensionality reduction.
Our approach is robust to noise and parameter tuning, thus making it simple to use while producing reliable visualizations for data exploration.
arXiv Detail & Related papers (2020-06-15T19:10:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.