Sentiment Analysis on Brazilian Portuguese User Reviews
- URL: http://arxiv.org/abs/2112.05459v1
- Date: Fri, 10 Dec 2021 11:18:26 GMT
- Title: Sentiment Analysis on Brazilian Portuguese User Reviews
- Authors: Frederico Souza, João Filho
- Abstract summary: This work analyzes the predictive performance of a range of document embedding strategies, taking the review polarity as the system output.
The analysis covers five sentiment analysis datasets in Brazilian Portuguese, unified into a single dataset, together with a reference partitioning into training, testing, and validation sets, both made publicly available through a digital repository.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Sentiment analysis is one of the most classical and widely studied natural
language processing tasks. The task has advanced notably with the introduction of more
complex and scalable machine learning models. Despite this progress, Brazilian
Portuguese still has only limited linguistic resources, such as datasets dedicated to
sentiment classification, especially ones with predefined partitions into training,
testing, and validation sets that would allow a fairer comparison of different
algorithmic alternatives. Motivated by these issues, this work analyzes the predictive
performance of a range of document embedding strategies, taking the review polarity as
the system output. The analysis covers five sentiment analysis datasets in Brazilian
Portuguese, unified into a single dataset, together with a reference partitioning into
training, testing, and validation sets, both made publicly available through a digital
repository. A cross-evaluation of dataset-specific models over different contexts is
conducted to assess their generalization capabilities and the feasibility of adopting a
single model to address all scenarios.
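As a rough illustration of the setup described in the abstract, the sketch below trains a polarity classifier on a unified review corpus with a predefined partitioning and then cross-evaluates dataset-specific models on every dataset's test split. This is a minimal sketch, not the authors' pipeline: the file name unified_reviews.csv, the columns text, polarity, source, and split, and the TF-IDF plus logistic regression baseline (standing in for one of the document embedding strategies compared in the paper) are all assumptions for illustration.

```python
# Minimal sketch (not the authors' exact pipeline): polarity classification on a
# unified Brazilian Portuguese review corpus with a predefined partitioning,
# followed by a cross-evaluation of dataset-specific models.
# Assumptions: a hypothetical file "unified_reviews.csv" with columns
# "text", "polarity", "source" (original dataset) and "split" (train/val/test).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

df = pd.read_csv("unified_reviews.csv")

# Reference partitioning: every review is tagged with its split.
train = df[df["split"] == "train"]
test = df[df["split"] == "test"]  # the validation split is left unused here


def fit_polarity_model(texts, labels):
    """TF-IDF + logistic regression: one simple document-embedding baseline."""
    model = make_pipeline(
        TfidfVectorizer(min_df=2, ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    return model.fit(texts, labels)


# A single model trained on the unified training split.
unified_model = fit_polarity_model(train["text"], train["polarity"])

# Cross-evaluation: train one model per source dataset and score it on the
# test split of every dataset to probe generalization across contexts.
sources = sorted(df["source"].unique())
for src in sources:
    src_train = train[train["source"] == src]
    model = fit_polarity_model(src_train["text"], src_train["polarity"])
    for tgt in sources:
        tgt_test = test[test["source"] == tgt]
        pred = model.predict(tgt_test["text"])
        macro_f1 = f1_score(tgt_test["polarity"], pred, average="macro")
        print(f"{src} -> {tgt}: macro-F1 = {macro_f1:.3f}")
```

The per-source loop mirrors the cross-evaluation idea: a model trained on one dataset is scored on the other datasets' test partitions, alongside a single model trained on the unified training split.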
Related papers
- A deep Natural Language Inference predictor without language-specific training data [44.26507854087991]
We present an NLP technique to tackle the problem of natural language inference (NLI) between pairs of sentences in a target language of choice without a language-specific training dataset.
We exploit a generic translation dataset, manually translated, along with two instances of the same pre-trained model.
The model has been evaluated on the machine-translated Stanford NLI test set, the machine-translated Multi-Genre NLI test set, and the manually translated RTE3-ITA test set.
arXiv Detail & Related papers (2023-09-06T10:20:59Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level test sets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- Sentiment Classification of Code-Switched Text using Pre-trained Multilingual Embeddings and Segmentation [1.290382979353427]
We propose a multi-step natural language processing algorithm for code-switched sentiment analysis.
The proposed algorithm can be expanded for sentiment analysis of multiple languages with limited human expertise.
arXiv Detail & Related papers (2022-10-29T01:52:25Z)
- FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation [64.9546787488337]
We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation.
The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese.
arXiv Detail & Related papers (2022-10-01T05:02:04Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation (IGLUE) benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Automatic Construction of Evaluation Suites for Natural Language Generation Datasets [17.13484629172643]
We develop a framework to generate controlled perturbations and identify subsets in text-to-scalar, text-to-text, or data-to-text settings.
We propose an evaluation suite made of 80 challenge sets, demonstrate the kinds of analyses that it enables, and shed light on the limits of current generation models.
arXiv Detail & Related papers (2021-06-16T18:20:58Z)
- TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing [73.16475763422446]
We propose TextFlint, a multilingual robustness evaluation platform for NLP tasks.
It incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis.
TextFlint generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model's robustness.
arXiv Detail & Related papers (2021-03-21T17:20:38Z)
- Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences [18.19093600136057]
We propose a framework for extracting divergence patterns for any language pair from a parallel corpus.
We show that our framework provides a detailed picture of cross-language divergences, generalizes previous approaches, and lends itself to full automation.
arXiv Detail & Related papers (2020-05-07T13:05:03Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.