Cross-domain Sentiment Classification in Spanish
- URL: http://arxiv.org/abs/2303.08985v1
- Date: Wed, 15 Mar 2023 23:11:30 GMT
- Title: Cross-domain Sentiment Classification in Spanish
- Authors: Lautaro Estienne, Matias Vera, Leonardo Rey Vega
- Abstract summary: We study the ability of a classification system trained with a large database of product reviews to generalize to different Spanish domains.
Results suggest that generalization across domains is feasible, though very challenging, when trained with these product reviews.
- Score: 18.563342761346608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentiment Classification is a fundamental task in the field of Natural
Language Processing, and has very important academic and commercial
applications. It aims to automatically predict the degree of sentiment present
in a text that contains opinions and subjectivity at some level, like product
and movie reviews, or tweets. This can be difficult to accomplish, in part
because different domains of text contain different words and expressions. In
addition, the difficulty increases when the text is written in a non-English
language, due to the lack of databases and resources. As a
consequence, several cross-domain and cross-language techniques are often
applied to this task in order to improve the results. In this work we perform a
study on the ability of a classification system trained with a large database
of product reviews to generalize to different Spanish domains. Reviews were
collected from the MercadoLibre website from seven Latin American countries,
allowing the creation of a large and balanced dataset. Results suggest that
generalization across domains is feasible, though very challenging, when
trained with these product reviews, and can be improved by pre-training and fine-tuning
the classification model.
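
As a rough illustration of the pre-training and fine-tuning strategy described above, the sketch below fine-tunes a pretrained Spanish encoder on product reviews and evaluates it on a different domain. The model name, file names, and 5-class star-rating labels are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch of the pre-train/fine-tune strategy from the abstract:
# start from a pretrained Spanish encoder and fine-tune it on product
# reviews, then evaluate on a different domain. Model name, file names,
# and the 5-class label scheme are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "dccuchile/bert-base-spanish-wwm-cased"  # BETO, a Spanish BERT

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=5)

# Hypothetical CSV files with a "text" column and an integer "labels" column.
data = load_dataset("csv", data_files={"train": "product_reviews.csv",
                                       "test": "other_domain.csv"})

def tokenize(batch):
    # Fixed-length padding keeps the default collator happy.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=2,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
    eval_dataset=data["test"],  # out-of-domain split probes generalization
)
trainer.train()
print(trainer.evaluate())
```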
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
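
As a concrete (and entirely illustrative) sketch of this prompt-based setup, the snippet below classifies a Spanish review with a small instruction-tuned model; the model choice and prompt wording are assumptions, not taken from the paper.

```python
# A minimal sketch of instruction-based classification: the task is
# stated in natural language and the model completes the label. The
# model choice and prompt wording are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

prompt = ("Classify the sentiment of the review as positive or negative.\n"
          "Review: El producto llegó roto y nadie respondió mis reclamos.\n"
          "Sentiment:")

out = generator(prompt, max_new_tokens=3, do_sample=False)
print(out[0]["generated_text"])  # the completion should read "negative"
```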
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- Adapting Large Language Models to Domains via Reading Comprehension [86.24451681746676]
We explore how continued pre-training on domain-specific corpora influences large language models.
We show that training on the raw corpora endows the model with domain knowledge, but drastically hurts its prompting ability for question answering.
We propose a simple method for transforming raw corpora into reading comprehension texts.
arXiv Detail & Related papers (2023-09-18T07:17:52Z)
- A Curriculum Learning Approach for Multi-domain Text Classification Using Keyword weight Ranking [17.71297141482757]
We propose to use a curriculum learning strategy based on keyword weight ranking to improve the performance of multi-domain text classification models.
The experimental results on the Amazon review and FDU-MTL datasets show that our curriculum learning strategy effectively improves the performance of multi-domain text classification models.
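
The curriculum idea can be sketched as follows; the keyword-scoring function and stage schedule below are stand-ins for illustration, not the paper's exact ranking method.

```python
# A minimal sketch of a keyword-weight curriculum: score each training
# example by the total weight of its known keywords (a stand-in scoring
# function) and train on the easiest examples first, growing the set
# each stage.

def keyword_score(text, keyword_weights):
    """Sum of weights of known keywords; higher = easier to classify."""
    return sum(w for kw, w in keyword_weights.items() if kw in text.lower())

def curriculum_stages(examples, keyword_weights, n_stages=3):
    """Yield progressively larger training subsets, easy-to-hard."""
    ranked = sorted(examples,
                    key=lambda ex: keyword_score(ex["text"], keyword_weights),
                    reverse=True)  # highest-scoring (easiest) first
    for stage in range(1, n_stages + 1):
        cutoff = len(ranked) * stage // n_stages
        yield ranked[:cutoff]

weights = {"excelente": 2.0, "terrible": 2.0, "bueno": 1.0}  # toy weights
data = [{"text": "Producto excelente"}, {"text": "Llegó tarde"},
        {"text": "Servicio terrible"}]
for i, subset in enumerate(curriculum_stages(data, weights), 1):
    print(f"stage {i}: {[ex['text'] for ex in subset]}")
```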
arXiv Detail & Related papers (2022-10-27T03:15:26Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- A Case Study and Qualitative Analysis of Simple Cross-Lingual Opinion Mining [0.0]
We propose a method for building a single topic model with sentiment analysis capable of covering multiple languages simultaneously.
We apply the model to newspaper articles and user comments of a specific domain, i.e., organic food products.
We obtain a high proportion of stable and domain-relevant topics, a meaningful relation between topics and their respective contents, and an interpretable representation for social media documents.
arXiv Detail & Related papers (2021-11-03T14:49:50Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Bangla Text Classification using Transformers [2.3475904942266697]
Text classification has been one of the earliest problems in NLP.
In this work, we fine-tune multilingual Transformer models for Bangla text classification tasks.
We obtain state-of-the-art results on six benchmark datasets, improving upon previous results by 5-29% accuracy across different tasks.
arXiv Detail & Related papers (2020-11-09T14:12:07Z)
- Rank over Class: The Untapped Potential of Ranking in Natural Language Processing [8.637110868126546]
We argue that many tasks which are currently addressed using classification are in fact being shoehorned into a classification mould.
We propose a novel end-to-end ranking approach consisting of a Transformer network responsible for producing representations for a pair of text sequences.
In an experiment on a heavily-skewed sentiment analysis dataset, converting ranking results to classification labels yields an approximately 22% improvement over state-of-the-art text classification.
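
A minimal sketch of this pairwise-ranking formulation follows, assuming a shared encoder and a margin ranking loss; the tiny bag-of-tokens encoder stands in for the paper's Transformer.

```python
# A minimal sketch of treating sentiment as ranking rather than
# classification: a shared encoder scores each text of a pair, and a
# margin ranking loss pushes the more-positive text's score higher.
# The tiny encoder here is a stand-in for the paper's Transformer.
import torch
import torch.nn as nn

class Scorer(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)  # bag-of-tokens encoder
        self.head = nn.Linear(dim, 1)                # scalar sentiment score

    def forward(self, token_ids):
        return self.head(self.emb(token_ids)).squeeze(-1)

model = Scorer()
loss_fn = nn.MarginRankingLoss(margin=1.0)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy pair: text_a should rank above (be more positive than) text_b.
text_a = torch.randint(0, 1000, (4, 12))  # batch of token-id sequences
text_b = torch.randint(0, 1000, (4, 12))
target = torch.ones(4)  # +1 means score(a) should exceed score(b)

opt.zero_grad()
loss = loss_fn(model(text_a), model(text_b), target)
loss.backward()
opt.step()
print(float(loss))
```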
arXiv Detail & Related papers (2020-09-10T22:18:57Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this being integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
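
The label-preserving perturbation idea in the last entry can be sketched with an FGSM-style step on the embedding layer; the toy classifier, batch, and step size below are assumptions for illustration, not the paper's setup.

```python
# A minimal sketch of adversarial training for text: perturb the
# continuous embeddings in the direction that increases the loss, then
# also train on the perturbed input. The toy classifier, batch, and
# step size epsilon are illustrative assumptions.
import torch
import torch.nn as nn

emb = nn.Embedding(1000, 64)
clf = nn.Sequential(nn.Flatten(), nn.Linear(64 * 12, 2))
opt = torch.optim.Adam(list(emb.parameters()) + list(clf.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
epsilon = 0.1  # perturbation radius (assumed)

tokens = torch.randint(0, 1000, (8, 12))  # toy batch: 8 texts, 12 tokens each
labels = torch.randint(0, 2, (8,))

x = emb(tokens)
clean_loss = loss_fn(clf(x), labels)

# Gradient of the loss w.r.t. the embedded inputs gives the worst-case
# (loss-increasing) direction; the label is assumed unchanged.
grad_x, = torch.autograd.grad(clean_loss, x, retain_graph=True)
x_adv = (x + epsilon * grad_x.sign()).detach()
adv_loss = loss_fn(clf(x_adv), labels)

opt.zero_grad()
(clean_loss + adv_loss).backward()  # train on clean and perturbed inputs
opt.step()
print(float(clean_loss), float(adv_loss))
```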