Leveraging Foreign Language Labeled Data for Aspect-Based Opinion Mining
- URL: http://arxiv.org/abs/2003.06858v1
- Date: Sun, 15 Mar 2020 15:53:53 GMT
- Title: Leveraging Foreign Language Labeled Data for Aspect-Based Opinion Mining
- Authors: Nguyen Thi Thanh Thuy, Ngo Xuan Bach, Tu Minh Phuong
- Abstract summary: We present a supervised aspect-based opinion mining method that utilizes labeled data from a foreign language.
Because aspects and opinions in different languages may be expressed by different words, we propose using word embeddings.
We also introduce an annotated corpus of aspect categories and sentiment polarities extracted from restaurant reviews in Vietnamese.
- Score: 1.503974529275767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aspect-based opinion mining is the task of identifying sentiment at the
aspect level in opinionated text, which consists of two subtasks: aspect
category extraction and sentiment polarity classification. While aspect
category extraction aims to detect and categorize opinion targets such as
product features, sentiment polarity classification assigns a sentiment label,
i.e. positive, negative, or neutral, to each identified aspect. Supervised
learning methods have been shown to deliver better accuracy for this task but
they require labeled data, which is costly to obtain, especially for
resource-poor languages like Vietnamese. To address this problem, we present a
supervised aspect-based opinion mining method that utilizes labeled data from a
foreign language (English in this case), which is translated to Vietnamese by
an automated translation tool (Google Translate). Because aspects and opinions
in different languages may be expressed by different words, we propose using
word embeddings, in addition to other features, to reduce the vocabulary
difference between the original and translated texts, thus improving the
effectiveness of aspect category extraction and sentiment polarity
classification processes. We also introduce an annotated corpus of aspect
categories and sentiment polarities extracted from restaurant reviews in
Vietnamese, and conduct a series of experiments on the corpus. Experimental
results demonstrate the effectiveness of the proposed approach.
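The abstract's central idea is that word embeddings narrow the vocabulary gap between machine-translated training text and native Vietnamese reviews. A small sketch can illustrate it. Everything here is an assumption for illustration and is not taken from the paper: the toy 3-dimensional embedding values, the tokens (multi-syllable Vietnamese words joined with underscores), the two-example training set, and the nearest-centroid classifier. That classifier stands in for whatever supervised learner the authors actually use. Each feature vector is a binary bag-of-words over the training vocabulary concatenated with the averaged word embedding, loosely mirroring "word embeddings, in addition to other features".

```python
import math

# Hypothetical toy embeddings (3-d). A real system would use vectors trained
# on a large Vietnamese corpus; these values are invented for illustration.
# Multi-syllable Vietnamese words are joined with "_" into single tokens.
EMBEDDINGS = {
    "đồ_ăn": [0.9, 0.1, 0.0],        # food
    "món_ăn": [0.85, 0.15, 0.0],     # dish
    "ngon": [0.7, 0.0, 0.6],         # delicious
    "tuyệt": [0.5, 0.1, 0.8],        # excellent
    "phục_vụ": [0.1, 0.9, 0.0],      # service
    "nhân_viên": [0.15, 0.85, 0.0],  # staff
    "chậm": [0.1, 0.6, -0.7],        # slow
}
DIM = 3


def avg_embedding(tokens):
    """Average the embeddings of known tokens; zero vector if none are known."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def bag_of_words(tokens, vocab):
    """Binary bag-of-words over a fixed training vocabulary."""
    present = set(tokens)
    return [1.0 if w in present else 0.0 for w in vocab]


def features(tokens, vocab):
    """Concatenate sparse lexical features with a dense embedding feature."""
    return bag_of_words(tokens, vocab) + avg_embedding(tokens)


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(examples, vocab):
    """Nearest-centroid training: mean feature vector per label."""
    sums, counts = {}, {}
    for tokens, label in examples:
        f = features(tokens, vocab)
        acc = sums.setdefault(label, [0.0] * len(f))
        for i, x in enumerate(f):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def predict(tokens, centroids, vocab):
    """Return the label whose centroid is most cosine-similar to the input."""
    f = features(tokens, vocab)
    return max(centroids, key=lambda lab: cosine(f, centroids[lab]))


# Hypothetical "machine-translated" training data: (tokens, aspect category).
train = [
    (["đồ_ăn", "ngon"], "FOOD"),
    (["phục_vụ", "chậm"], "SERVICE"),
]
vocab = sorted({t for tokens, _ in train for t in tokens})
centroids = train_centroids(train, vocab)

# A native review whose words never appear in the translated training set.
native = ["món_ăn", "tuyệt"]
print(bag_of_words(native, vocab))        # all zeros: no lexical overlap
print(predict(native, centroids, vocab))  # FOOD, recovered via embeddings
```

Here the native review shares no words with the translated training data, so its bag-of-words vector is all zeros and lexical features alone cannot separate the classes. The averaged embedding still places it near the FOOD centroid. For brevity the sketch treats aspect category extraction as single-label; the same routines could be trained on polarity labels for the sentiment classification subtask.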
Related papers
- Experiences from Creating a Benchmark for Sentiment Classification for Varieties of English [8.823927892310238]
Existing benchmarks often fail to account for linguistic diversity, such as the different varieties of English.
In this paper, we share our experiences from building a sentiment classification benchmark for three variants of English: Australian (en-AU), Indian (en-IN), and British (en-UK) English.
Date: 2024-10-15T03:02:03Z
- The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification [57.06913662622832]
Gender-fair language fosters inclusion by addressing all genders or using neutral forms.
Gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns.
While we offer initial insights on the effect on German text classification, the findings likely apply to other languages.
Date: 2024-09-26T15:08:17Z
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature on techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
Date: 2024-04-09T11:39:53Z
- CARBD-Ko: A Contextually Annotated Review Benchmark Dataset for Aspect-Level Sentiment Classification in Korean [3.2146698079532867]
This paper explores the challenges posed by aspect-based sentiment classification (ABSC) within pretrained language models (PLMs).
We introduce CARBD-Ko, a benchmark dataset that incorporates aspects and dual-tagged polarities to distinguish between aspect-specific and aspect-agnostic sentiment classification.
Our experimental findings highlight the inherent difficulties in accurately predicting dual-polarities and underscore the significance of contextualized sentiment analysis models.
Date: 2024-02-23T01:49:38Z
- The performance of multiple language models in identifying offensive language on social media [6.221851249300585]
The aim of this research is to use a variety of algorithms to test the ability to identify offensive posts.
The motivation for this project is to reduce the harm such language causes human censors by automating the screening of offensive posts.
Date: 2023-12-10T18:58:26Z
- A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics.
Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
Date: 2023-05-29T11:54:50Z
- Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation [20.39599469927542]
Gender bias is largely recognized as a problematic phenomenon affecting language technologies.
Most current evaluation practices adopt a word-level focus on a narrow set of occupational nouns under synthetic conditions.
Such protocols overlook key features of grammatical gender languages, which are characterized by morphosyntactic chains of gender agreement.
Date: 2022-03-18T11:14:16Z
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
Date: 2022-02-22T20:55:31Z
- Fine-Grained Opinion Summarization with Minimal Supervision [48.43506393052212]
FineSum aims to profile a target by extracting opinions from multiple documents.
FineSum automatically identifies opinion phrases from the raw corpus, classifies them into different aspects and sentiments, and constructs multiple fine-grained opinion clusters under each aspect/sentiment.
Both automatic evaluation on the benchmark and quantitative human evaluation validate the effectiveness of our approach.
Date: 2021-10-17T15:16:34Z
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
Date: 2020-10-13T21:33:24Z
- A novel approach to sentiment analysis in Persian using discourse and external semantic information [0.0]
Many approaches have been proposed to extract the sentiment of individuals from documents written in natural languages.
The majority of these approaches have focused on English, while resource-lean languages such as Persian suffer from a lack of research and language resources.
To address this gap, the current work introduces new methods for sentiment analysis and applies them to Persian.
Date: 2020-07-18T18:40:40Z
This list is automatically generated from the titles and abstracts of the papers on this site.