Related papers: Leveraging Foreign Language Labeled Data for Aspect-Based Opinion Mining

URL: http://arxiv.org/abs/2003.06858v1
Date: Sun, 15 Mar 2020 15:53:53 GMT
Title: Leveraging Foreign Language Labeled Data for Aspect-Based Opinion Mining
Authors: Nguyen Thi Thanh Thuy, Ngo Xuan Bach, Tu Minh Phuong
Abstract summary: We present a supervised aspect-based opinion mining method that utilizes labeled data from a foreign language. Because aspects and opinions in different languages may be expressed by different words, we propose using word embeddings. We also introduce an annotated corpus of aspect and sentiment polarities extracted from restaurant reviews in Vietnamese.
Score: 1.503974529275767
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Aspect-based opinion mining is the task of identifying sentiment at the aspect level in opinionated text, which consists of two subtasks: aspect category extraction and sentiment polarity classification. While aspect category extraction aims to detect and categorize opinion targets such as product features, sentiment polarity classification assigns a sentiment label, i.e. positive, negative, or neutral, to each identified aspect. Supervised learning methods have been shown to deliver better accuracy for this task but they require labeled data, which is costly to obtain, especially for resource-poor languages like Vietnamese. To address this problem, we present a supervised aspect-based opinion mining method that utilizes labeled data from a foreign language (English in this case), which is translated to Vietnamese by an automated translation tool (Google Translate). Because aspects and opinions in different languages may be expressed by different words, we propose using word embeddings, in addition to other features, to reduce the vocabulary difference between the original and translated texts, thus improving the effectiveness of aspect category extraction and sentiment polarity classification processes. We also introduce an annotated corpus of aspect categories and sentiment polarities extracted from restaurant reviews in Vietnamese, and conduct a series of experiments on the corpus. Experimental results demonstrate the effectiveness of the proposed approach.
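The role of word embeddings described in the abstract can be illustrated with a minimal sketch. This is not the authors' method or feature set: the tokens, the 3-dimensional vectors, and the helper names (`average_embedding`, `cosine`, `jaccard`) are all hypothetical, chosen only to show how two texts with no shared vocabulary can still be close in an embedding space.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
# A real system would load vectors trained on a large corpus.
TOY_EMBEDDINGS = {
    "món": [0.9, 0.1, 0.0],
    "đồ": [0.85, 0.15, 0.0],
    "ngon": [0.1, 0.9, 0.1],
    "tuyệt": [0.2, 0.8, 0.1],
}


def average_embedding(tokens, embeddings):
    """Return the mean vector of the known tokens (zero vector if none are known)."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(tokens_a, tokens_b):
    """Word-overlap similarity between two token lists."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


# Stand-ins for a machine-translated review and a natively written one.
translated = ["món", "ngon"]
native = ["đồ", "tuyệt"]

word_overlap = jaccard(translated, native)
embedding_similarity = cosine(
    average_embedding(translated, TOY_EMBEDDINGS),
    average_embedding(native, TOY_EMBEDDINGS),
)
print(word_overlap, embedding_similarity)
```

With these toy vectors the two token lists share no words, yet their averaged embeddings point in nearly the same direction. That is the kind of vocabulary gap between translated and original text that embedding features are meant to narrow.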

Related papers

Experiences from Creating a Benchmark for Sentiment Classification for Varieties of English [8.823927892310238]
Existing benchmarks often fail to account for linguistic diversity, like language variants of English. In this paper, we share our experiences from building a sentiment classification benchmark for three variants of English: Australian (en-AU), Indian (en-IN), and British (en-UK) English.
arXiv Detail & Related papers (2024-10-15T03:02:03Z)
The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification [57.06913662622832]
Gender-fair language fosters inclusion by addressing all genders or using neutral forms. Gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns. While we offer initial insights on the effect on German text classification, the findings likely apply to other languages.
arXiv Detail & Related papers (2024-09-26T15:08:17Z)
Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models. We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
CARBD-Ko: A Contextually Annotated Review Benchmark Dataset for Aspect-Level Sentiment Classification in Korean [3.2146698079532867]
This paper explores the challenges posed by aspect-based sentiment classification (ABSC) within pretrained language models (PLMs). We introduce CARBD-Ko, a benchmark dataset that incorporates aspects and dual-tagged polarities to distinguish between aspect-specific and aspect-agnostic sentiment classification. Our experimental findings highlight the inherent difficulties in accurately predicting dual-polarities and underscore the significance of contextualized sentiment analysis models.
arXiv Detail & Related papers (2024-02-23T01:49:38Z)
The performance of multiple language models in identifying offensive language on social media [6.221851249300585]
The aim of this research is to use a variety of algorithms to test the ability to identify offensive posts. The motivation for this project is to reduce the harm of such language to human censors by automating the screening of offending posts.
arXiv Detail & Related papers (2023-12-10T18:58:26Z)
A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics. Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
arXiv Detail & Related papers (2023-05-29T11:54:50Z)
Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation [20.39599469927542]
Gender bias is largely recognized as a problematic phenomenon affecting language technologies. Most current evaluation practices adopt a word-level focus on a narrow set of occupational nouns under synthetic conditions. Such protocols overlook key features of grammatical gender languages, which are characterized by morphosyntactic chains of gender agreement.
arXiv Detail & Related papers (2022-03-18T11:14:16Z)
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw. At the heart of the approach is a single multilingual token-free Charformer model. We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
Fine-Grained Opinion Summarization with Minimal Supervision [48.43506393052212]
FineSum aims to profile a target by extracting opinions from multiple documents. FineSum automatically identifies opinion phrases from the raw corpus, classifies them into different aspects and sentiments, and constructs multiple fine-grained opinion clusters under each aspect/sentiment. Both automatic evaluation on the benchmark and quantitative human evaluation validate the effectiveness of our approach.
arXiv Detail & Related papers (2021-10-17T15:16:34Z)
Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis. We learn <sentiment, aspect> joint topic embeddings in the word embedding space. We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
A novel approach to sentiment analysis in Persian using discourse and external semantic information [0.0]
Many approaches have been proposed to extract the sentiment of individuals from documents written in natural languages. The majority of these approaches have focused on English, while resource-lean languages such as Persian suffer from a lack of research work and language resources. To address this gap, the current work introduces new methods for sentiment analysis and applies them to Persian.
arXiv Detail & Related papers (2020-07-18T18:40:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.