A Case Study and Qualitative Analysis of Simple Cross-Lingual Opinion
Mining
- URL: http://arxiv.org/abs/2111.02259v3
- Date: Mon, 24 Jul 2023 20:03:14 GMT
- Title: A Case Study and Qualitative Analysis of Simple Cross-Lingual Opinion
Mining
- Authors: Gerhard Johann Hagerer, Wing Sheung Leung, Qiaoxi Liu, Hannah Danner,
Georg Groh
- Abstract summary: We propose a method for building a single topic model with sentiment analysis capable of covering multiple languages simultaneously.
We apply the model to newspaper articles and user comments of a specific domain, i.e., organic food products.
We obtain a high proportion of stable and domain-relevant topics, a meaningful relation between topics and their respective contents, and an interpretable representation for social media documents.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: User-generated content from social media is produced in many languages,
making it technically challenging to compare the discussed themes from one
domain across different cultures and regions. It is relevant for domains in a
globalized world, such as market research, where people from two nations and
markets might have different requirements for a product. We propose a simple,
modern, and effective method for building a single topic model with sentiment
analysis capable of covering multiple languages simultaneously, based on a
pre-trained state-of-the-art deep neural network for natural language
understanding. To demonstrate its feasibility, we apply the model to newspaper
articles and user comments of a specific domain, i.e., organic food products
and related consumption behavior. The themes match across languages.
Additionally, we obtain a high proportion of stable and domain-relevant
topics, a meaningful relation between topics and their respective textual
contents, and an interpretable representation for social media documents.
Marketing can potentially benefit from our method, since it provides an
easy-to-use means of addressing specific customer interests from different
market regions around the globe. For reproducibility, we provide the code,
data, and results of our study.
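
The abstract does not spell out the pipeline, so the following is only a minimal sketch of the general idea it describes: documents from several languages are mapped into one shared embedding space, clustered into topics, and sentiment is then aggregated per topic. Everything here is an assumption for illustration: the toy 2-D vectors stand in for embeddings from a pre-trained multilingual encoder, the sentiment scores are made up, plain k-means is a swapped-in clustering technique, and the names `DOCS`, `kmeans`, and `summarize` are hypothetical. It is not the authors' implementation.

```python
from collections import defaultdict
from math import dist

# Hypothetical toy data: (language, text, embedding, sentiment in [-1, 1]).
# The 2-D vectors are stand-ins for multilingual sentence embeddings;
# a real encoder would produce high-dimensional vectors.
DOCS = [
    ("en", "Organic apples taste better", (0.9, 0.1), 0.8),
    ("en", "Organic food is too expensive", (0.1, 0.9), -0.7),
    ("de", "Bio-Äpfel schmecken besser", (0.8, 0.2), 0.6),
    ("de", "Bio-Lebensmittel sind zu teuer", (0.2, 0.8), -0.5),
]


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))


def kmeans(points, k, iterations=10):
    """Plain k-means with a deterministic init (the first k points)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Move each centroid to the mean of its members, if it has any.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = mean(members)
    return labels


def summarize(docs, k=2):
    """Cluster all documents jointly, then report languages and mean sentiment per topic."""
    labels = kmeans([embedding for _, _, embedding, _ in docs], k)
    topics = defaultdict(lambda: {"languages": set(), "sentiments": []})
    for (lang, _text, _embedding, sentiment), label in zip(docs, labels):
        topics[label]["languages"].add(lang)
        topics[label]["sentiments"].append(sentiment)
    return {
        label: (
            sorted(topic["languages"]),
            sum(topic["sentiments"]) / len(topic["sentiments"]),
        )
        for label, topic in topics.items()
    }


if __name__ == "__main__":
    for label, (languages, avg_sentiment) in sorted(summarize(DOCS).items()):
        print(label, languages, round(avg_sentiment, 2))
```

Because all documents are clustered together regardless of language, each resulting topic can contain English and German texts alike, which is what allows themes and their sentiment to be compared across markets in this toy setting.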
Related papers
- Multilingual Topic Classification in X: Dataset and Analysis [19.725017254962918]
We introduce X-Topic, a multilingual dataset featuring content in four distinct languages (English, Spanish, Japanese, and Greek).
Our dataset includes a wide range of topics, tailored for social media content, making it a valuable resource for scientists and professionals working on cross-linguistic analysis.
arXiv Detail & Related papers (2024-10-04T01:37:26Z)
- Combining Objective and Subjective Perspectives for Political News Understanding [5.741243797283764]
We introduce a text analysis framework which integrates both perspectives and provides a fine-grained processing of subjective aspects.
We illustrate its functioning with insights on news outlets, political orientations, topics, individual entities, and demographic segments.
arXiv Detail & Related papers (2024-08-20T20:13:19Z)
- Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey [66.166184609616]
ChatGPT has opened up immense potential for applying large language models (LLMs) to text-centric multimodal tasks.
It is still unclear how existing LLMs can adapt better to text-centric multimodal sentiment analysis tasks.
arXiv Detail & Related papers (2024-06-12T10:36:27Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- A Comprehensive Review on Sentiment Analysis: Tasks, Approaches and
Applications [0.2717221198324361]
Sentiment analysis (SA) is an emerging field in text mining.
It is the process of computationally identifying and categorizing opinions expressed in a piece of text over different social media platforms.
arXiv Detail & Related papers (2023-11-19T06:29:41Z)
- Adapting Large Language Models to Domains via Reading Comprehension [86.24451681746676]
We explore how continued pre-training on domain-specific corpora influences large language models.
We show that training on the raw corpora endows the model with domain knowledge, but drastically hurts its ability for question answering.
We propose a simple method for transforming raw corpora into reading comprehension texts.
arXiv Detail & Related papers (2023-09-18T07:17:52Z)
- Cross-domain Sentiment Classification in Spanish [18.563342761346608]
We study the ability of a classification system trained with a large database of product reviews to generalize to different Spanish domains.
Results suggest that generalization across domains is feasible though very challenging when trained with these product reviews.
arXiv Detail & Related papers (2023-03-15T23:11:30Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- A New Generation of Perspective API: Efficient Multilingual
Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- An NLP approach to quantify dynamic salience of predefined topics in a
text corpus [0.0]
We use natural language processing techniques to quantify how a set of pre-defined topics of interest change over time across a large corpus of text.
We find that given a predefined topic, we can identify and rank sets of terms, or n-grams, that map to those topics and have usage patterns that deviate from a normal baseline.
arXiv Detail & Related papers (2021-08-16T21:00:06Z)
- FDMT: A Benchmark Dataset for Fine-grained Domain Adaptation in Machine
Translation [53.87731008029645]
We present a real-world fine-grained domain adaptation task in machine translation (FDMT).
The FDMT dataset consists of four sub-domains of information technology: autonomous vehicles, AI education, real-time networks and smart phone.
We make quantitative experiments and deep analyses in this new setting, which benchmarks the fine-grained domain adaptation task.
arXiv Detail & Related papers (2020-12-31T17:15:09Z)