Sentiment Classification of Code-Switched Text using Pre-trained
Multilingual Embeddings and Segmentation
- URL: http://arxiv.org/abs/2210.16461v1
- Date: Sat, 29 Oct 2022 01:52:25 GMT
- Title: Sentiment Classification of Code-Switched Text using Pre-trained
Multilingual Embeddings and Segmentation
- Authors: Saurav K. Aryal, Howard Prioleau, and Gloria Washington
- Abstract summary: We propose a multi-step natural language processing algorithm for code-switched sentiment analysis.
The proposed algorithm can be expanded for sentiment analysis of multiple languages with limited human expertise.
- Score: 1.290382979353427
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With increasing globalization and immigration, various studies have estimated
that about half of the world population is bilingual. Consequently, individuals
concurrently use two or more languages or dialects in casual conversational
settings. However, most research in natural language processing is focused on
monolingual text. To further the work in code-switched sentiment analysis, we
propose a multi-step natural language processing algorithm utilizing points of
code-switching in mixed text and conduct sentiment analysis around those
identified points. The proposed sentiment analysis algorithm uses semantic
similarity derived from large pre-trained multilingual models with a
handcrafted set of positive and negative words to determine the polarity of
code-switched text. The proposed approach outperforms a comparable baseline
model by 11.2% for accuracy and 11.64% for F1-score on a Spanish-English
dataset. Theoretically, the proposed algorithm can be expanded for sentiment
analysis of multiple languages with limited human expertise.
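For intuition, below is a minimal sketch of the kind of pipeline the abstract describes: mark candidate code-switch points via per-token language identification, split the text into segments at those points, and score each segment's polarity by embedding similarity to handcrafted positive and negative seed words. This is not the authors' implementation; the encoder name, the langid-based switch detection, and the tiny seed lexicons are illustrative assumptions.

```python
# Hedged sketch of a code-switch-aware polarity pipeline (NOT the paper's code).
# Assumptions: langid for per-token language ID, a multilingual sentence encoder
# from sentence-transformers, and toy positive/negative seed lists.
import numpy as np
import langid                                   # lightweight language identifier
from sentence_transformers import SentenceTransformer

# Any large pre-trained multilingual encoder could stand in here.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Placeholder handcrafted seed lexicons (the paper's lists would be larger).
POSITIVE = ["good", "happy", "love", "bueno", "feliz"]
NEGATIVE = ["bad", "sad", "hate", "malo", "triste"]

def segment_by_switch_points(tokens):
    """Split tokens into segments wherever the predicted language of
    consecutive tokens changes (a candidate code-switch point)."""
    segments, current, prev_lang = [], [], None
    for tok in tokens:
        lang, _ = langid.classify(tok)
        if prev_lang is not None and lang != prev_lang:
            segments.append(" ".join(current))
            current = []
        current.append(tok)
        prev_lang = lang
    if current:
        segments.append(" ".join(current))
    return segments

def polarity(text):
    """Score each segment by its similarity to the positive vs. negative
    seed words, then average the segment scores."""
    segments = segment_by_switch_points(text.split())
    seg_emb = encoder.encode(segments)                 # shape (num_segments, d)
    pos_emb = encoder.encode(POSITIVE).mean(axis=0)    # shape (d,)
    neg_emb = encoder.encode(NEGATIVE).mean(axis=0)    # shape (d,)

    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    scores = [cos(e, pos_emb) - cos(e, neg_emb) for e in seg_emb]
    return "positive" if np.mean(scores) >= 0 else "negative"

print(polarity("la comida estaba amazing pero the service was terrible"))
```

The sign-based decision rule and the averaging over segments are simplifications chosen for illustration; the paper evaluates its full algorithm against a baseline on a Spanish-English dataset.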
Related papers
- GradSim: Gradient-Based Language Grouping for Effective Multilingual
Training [13.730907708289331]
We propose GradSim, a language grouping method based on gradient similarity.
Our experiments on three diverse multilingual benchmark datasets show that it leads to the largest performance gains.
Besides linguistic features, the topics of the datasets play an important role for language grouping.
arXiv Detail & Related papers (2023-10-23T18:13:37Z)
- Beyond Contrastive Learning: A Variational Generative Model for
Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Meta-Learning a Cross-lingual Manifold for Semantic Parsing [75.26271012018861]
Localizing a semantic parser to support new languages requires effective cross-lingual generalization.
We introduce a first-order meta-learning algorithm to train a semantic parser with maximal sample efficiency during cross-lingual transfer.
Results across six languages on ATIS demonstrate that our combination of steps yields accurate semantic parsers sampling $\le$10% of source training data in each new language.
arXiv Detail & Related papers (2022-09-26T10:42:17Z)
- Sentiment Analysis on Brazilian Portuguese User Reviews [0.0]
This work analyzes the predictive performance of a range of document embedding strategies, assuming the polarity as the system outcome.
This analysis includes five sentiment analysis datasets in Brazilian Portuguese, unified in a single dataset, and a reference partitioning in training, testing, and validation sets, both made publicly available through a digital repository.
arXiv Detail & Related papers (2021-12-10T11:18:26Z)
- Monolingual and Cross-Lingual Acceptability Judgments with the Italian
CoLA corpus [2.418273287232718]
We describe the ItaCoLA corpus, containing almost 10,000 sentences with acceptability judgments.
We also present the first cross-lingual experiments, aimed at assessing whether multilingual transformer-based approaches can benefit from using sentences in two languages during fine-tuning.
arXiv Detail & Related papers (2021-09-24T16:18:53Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared
Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages
with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
- Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual
Lexical Semantic Similarity [67.36239720463657]
Multi-SimLex is a large-scale lexical resource and evaluation benchmark covering datasets for 12 diverse languages.
Each language dataset is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs.
Owing to the alignment of concepts across languages, we provide a suite of 66 cross-lingual semantic similarity datasets.
arXiv Detail & Related papers (2020-03-10T17:17:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.