Quality of Word Embeddings on Sentiment Analysis Tasks
- URL: http://arxiv.org/abs/2003.03264v1
- Date: Fri, 6 Mar 2020 15:03:08 GMT
- Title: Quality of Word Embeddings on Sentiment Analysis Tasks
- Authors: Erion Çano and Maurizio Morisio
- Abstract summary: We compare the performance of a dozen pretrained word embedding models on lyrics sentiment analysis and movie review polarity tasks.
According to our results, the Twitter Tweets model is best on lyrics sentiment analysis, whereas Google News and Common Crawl are the top performers on movie polarity analysis.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word embeddings, or distributed representations of words, are used in
various applications such as machine translation, sentiment analysis, and topic
identification. The quality of word embeddings and the performance of their
applications depend on several factors, such as training method, corpus size,
and topic relevance. In this study we compare the performance of a dozen
pretrained word embedding models on lyrics sentiment analysis and movie review
polarity tasks. According to our results, the Twitter Tweets model is best on
lyrics sentiment analysis, whereas Google News and Common Crawl are the top
performers on movie polarity analysis. GloVe-trained models slightly outperform
those trained with Skip-gram. Factors like topic relevance and corpus size also
significantly impact the quality of the models. When medium or large-sized text
sets are available, obtaining word embeddings from the same training dataset is
usually the best choice.
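The comparison above rests on a common pipeline: represent each review by the average of its words' pretrained vectors, then feed that vector to a classifier. Below is a minimal sketch of the averaging step; the `doc_vector` function, the toy table, and its 2-dimensional vectors are invented for illustration (real pretrained models such as GloVe or Skip-gram use 100-300 dimensions), not taken from the paper.

```python
# Toy embedding table standing in for a pretrained model; the words and
# vector values here are made up purely for illustration.
EMBEDDINGS = {
    "great": [0.9, 0.1],
    "awful": [-0.8, 0.2],
    "movie": [0.0, 0.5],
}

def doc_vector(tokens, table, dim=2):
    """Average the vectors of in-vocabulary tokens; zeros if none match."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

print(doc_vector(["great", "movie"], EMBEDDINGS))  # [0.45, 0.3]
```

Out-of-vocabulary tokens are simply skipped, which is one reason the paper finds topic relevance of the training corpus to matter: a mismatched corpus leaves more words without vectors.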
Related papers
- A Comparison of Lexicon-Based and ML-Based Sentiment Analysis: Are There
Outlier Words? [14.816706893177997]
In this paper we compute sentiment for more than 150,000 English-language texts drawn from 4 domains.
We model differences in sentiment scores between approaches for documents in each domain using a regression.
Our findings are that the importance of a word depends on the domain and there are no standout lexical entries which systematically cause differences in sentiment scores.
arXiv Detail & Related papers (2023-11-10T18:21:50Z) - Sentiment-Aware Word and Sentence Level Pre-training for Sentiment
Analysis [64.70116276295609]
SentiWSP is a Sentiment-aware pre-trained language model with combined Word-level and Sentence-level Pre-training tasks.
SentiWSP achieves new state-of-the-art performance on various sentence-level and aspect-level sentiment classification benchmarks.
arXiv Detail & Related papers (2022-10-18T12:25:29Z) - Sentiment analysis in tweets: an assessment study from classical to
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study provides an assessment of existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - Weakly-Supervised Aspect-Based Sentiment Analysis via Joint
Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z) - Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed thereafter, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z) - Contextual Embeddings: When Are They Worth It? [14.582968294755794]
We study the settings for which deep contextual embeddings give large improvements in performance relative to classic pretrained embeddings.
We find that both of these simpler baselines can match contextual embeddings on industry-scale data.
We identify properties of data for which contextual embeddings give particularly large gains: language containing complex structure, ambiguous word usage, and words unseen in training.
arXiv Detail & Related papers (2020-05-18T22:20:17Z) - Comparative Analysis of Word Embeddings for Capturing Word Similarities [0.0]
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks.
Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings.
Selecting the appropriate word embeddings is a perplexing task, since the projected embedding space is not intuitive to humans.
arXiv Detail & Related papers (2020-05-08T01:16:03Z) - Compass-aligned Distributional Embeddings for Studying Semantic
Differences across Corpora [14.993021283916008]
We present a framework to support cross-corpora language studies with word embeddings.
CADE is the core component of our framework and solves the key problem of aligning the embeddings generated from different corpora.
The results of our experiments suggest that CADE achieves state-of-the-art or superior performance on tasks where several competing approaches are available.
arXiv Detail & Related papers (2020-04-13T15:46:47Z) - A Deep Neural Framework for Contextual Affect Detection [51.378225388679425]
A short and simple text carrying no emotion on its own can convey strong emotions when read together with its context.
We propose a Contextual Affect Detection framework which learns the inter-dependence of words in a sentence.
arXiv Detail & Related papers (2020-01-28T05:03:15Z)
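Several of the papers listed above compare embedding models through word similarity, for which the standard measure is cosine similarity between word vectors. A minimal sketch follows; the example words and their 2-dimensional vectors are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors: dot product over norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented 2-d vectors; a well-trained model should place semantically
# close words (good/great) nearer than opposites (good/awful).
good = [0.9, 0.1]
great = [0.8, 0.2]
awful = [-0.8, 0.3]
print(cosine(good, great) > cosine(good, awful))  # True
```

Comparative studies of embedding quality typically correlate such similarity scores against human similarity judgments across many word pairs.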
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.