Evaluating Zero-Shot Multilingual Aspect-Based Sentiment Analysis with Large Language Models
- URL: http://arxiv.org/abs/2412.12564v2
- Date: Tue, 24 Dec 2024 16:41:40 GMT
- Title: Evaluating Zero-Shot Multilingual Aspect-Based Sentiment Analysis with Large Language Models
- Authors: Chengyan Wu, Bolei Ma, Zheyu Zhang, Ningyuan Deng, Yanqing He, Yun Xue
- Abstract summary: We evaluate large language models (LLMs) under zero-shot conditions to explore their potential to tackle the ABSA task.
We investigate various prompting strategies, including vanilla zero-shot, chain-of-thought (CoT), self-improvement, self-debate, and self-consistency.
Results indicate that while LLMs show promise in handling multilingual ABSA, they generally fall short of fine-tuned, task-specific models.
- Score: 0.9832963381777073
- Abstract: Aspect-based sentiment analysis (ABSA), a sequence labeling task, has attracted increasing attention in multilingual contexts. While previous research has focused largely on fine-tuning or training models specifically for ABSA, we evaluate large language models (LLMs) under zero-shot conditions to explore their potential to tackle this challenge with minimal task-specific adaptation. We conduct a comprehensive empirical evaluation of a series of LLMs on multilingual ABSA tasks, investigating various prompting strategies, including vanilla zero-shot, chain-of-thought (CoT), self-improvement, self-debate, and self-consistency, across nine different models. Results indicate that while LLMs show promise in handling multilingual ABSA, they generally fall short of fine-tuned, task-specific models. Notably, simpler zero-shot prompts often outperform more complex strategies, especially in high-resource languages like English. These findings underscore the need for further refinement of LLM-based approaches to effectively address the ABSA task across diverse languages.
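As a rough illustration of the strategies compared in the abstract, the sketch below pairs a vanilla zero-shot ABSA prompt with self-consistency voting over sampled answers. The prompt wording, the `query_llm` callable, and the sampling temperature are hypothetical stand-ins, not the authors' released setup.

    from collections import Counter

    # Minimal sketch of two of the evaluated prompting strategies.
    # `query_llm` is an assumed interface to any chat-style LLM, not
    # a real library call.

    def vanilla_zero_shot_prompt(sentence: str) -> str:
        # Ask directly for (aspect, sentiment) pairs, with no examples
        # and no intermediate reasoning.
        return (
            "Extract every aspect term in the sentence and label its "
            "sentiment as positive, negative, or neutral. Answer with "
            "a list of (aspect, sentiment) pairs.\n"
            f"Sentence: {sentence}"
        )

    def self_consistency(sentence: str, query_llm, n_samples: int = 5) -> str:
        # Sample several answers at non-zero temperature and keep the
        # most frequent one (majority vote); ties resolve to the
        # answer seen first.
        prompt = vanilla_zero_shot_prompt(sentence)
        answers = [query_llm(prompt, temperature=0.7) for _ in range(n_samples)]
        return Counter(answers).most_common(1)[0][0]

In this framing, a chain-of-thought variant would prepend a request to reason step by step before answering, and the self-debate and self-improvement variants would feed the model's first answer back for critique; the voting logic above stays the same.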
Related papers
- Demystifying Multilingual Chain-of-Thought in Process Reward Modeling [71.12193680015622]
We tackle the challenge of extending process reward models (PRMs) to multilingual settings.
We train multilingual PRMs on a dataset spanning seven languages, which is translated from English.
Our results highlight the sensitivity of multilingual PRMs to both the number of training languages and the volume of English data.
arXiv Detail & Related papers (2025-02-18T09:11:44Z)
- Evalita-LLM: Benchmarking Large Language Models on Italian [3.3334839725239798]
Evalita-LLM is a benchmark designed to evaluate Large Language Models (LLMs) on Italian tasks.
All tasks are native Italian, avoiding issues of translating from Italian and potential cultural biases.
The benchmark includes generative tasks, enabling more natural interaction with LLMs.
arXiv Detail & Related papers (2025-02-04T12:58:19Z)
- Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing [7.312170216336085]
We take a broader approach, exploring a wide range of variations across sociodemographic dimensions.
We extend the SocialIQA dataset to create diverse paraphrased sets conditioned on sociodemographic styles.
We find that demographic-specific paraphrasing significantly impacts the performance of language models.
arXiv Detail & Related papers (2025-01-14T17:50:06Z)
- Align, Generate, Learn: A Novel Closed-Loop Framework for Cross-Lingual In-Context Learning [0.0]
Cross-lingual in-context learning (XICL) has emerged as a transformative paradigm for leveraging large language models (LLMs) to tackle multilingual tasks.
We propose a novel self-supervised framework that harnesses the generative capabilities of LLMs to internally select and utilize task-relevant examples.
arXiv Detail & Related papers (2024-12-12T05:36:51Z)
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary of a source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
- Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs [36.30321941154582]
Hercule is a cross-lingual evaluation model that learns to assign scores to responses based on easily available reference answers in English.
This study is the first comprehensive examination of cross-lingual evaluation using LLMs, presenting a scalable and effective approach for multilingual assessment.
arXiv Detail & Related papers (2024-10-17T09:45:32Z)
- LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states with respect to historical information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT), and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on six high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z)
- Multilingual Large Language Models Are Not (Yet) Code-Switchers [41.47534626749588]
Large Language Models (LLMs) have recently shown great capabilities in a wide range of tasks.
The practice of alternating languages within a single utterance, known as code-switching, remains relatively uncharted.
We argue that current "multilingualism" in LLMs does not inherently imply proficiency with code-switching texts.
arXiv Detail & Related papers (2023-05-23T16:50:48Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)