Human-LLM Collaborative Construction of a Cantonese Emotion Lexicon
- URL: http://arxiv.org/abs/2410.11526v1
- Date: Tue, 15 Oct 2024 11:57:34 GMT
- Title: Human-LLM Collaborative Construction of a Cantonese Emotion Lexicon
- Authors: Yusong Zhang, Dong Dong, Chi-tim Hung, Leonard Heyerdahl, Tamara Giles-Vernick, Eng-kiong Yeoh
- Abstract summary: This study proposes to develop an emotion lexicon for Cantonese, a low-resource language.
By integrating emotion labels provided by Large Language Models (LLMs) and human annotators, the study leverages existing linguistic resources.
The consistency of the proposed emotion lexicon in emotion extraction is assessed by adapting and applying three distinct emotion text datasets.
- Score: 1.3074442742310615
- Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in language understanding and generation, and the use of the knowledge embedded in LLMs for automated annotation has been explored extensively. This study proposes to develop an emotion lexicon for Cantonese, a low-resource language, through collaborative efforts between an LLM and human annotators. By integrating emotion labels provided by the LLM and human annotators, the study leverages existing linguistic resources, including lexicons in other languages and local forums, to construct a Cantonese emotion lexicon enriched with colloquial expressions. The consistency of the proposed emotion lexicon in emotion extraction is assessed by adapting and applying three distinct emotion text datasets. This study not only validates the efficacy of the constructed lexicon but also demonstrates that collaborative annotation between humans and artificial intelligence can significantly enhance the quality of emotion labels, highlighting the potential of such partnerships in facilitating natural language processing tasks for low-resource languages.
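The abstract describes two concrete steps: reconciling LLM-provided and human-provided emotion labels into a lexicon, and then using that lexicon for emotion extraction. A minimal Python sketch of both steps follows; the example words, label sets, and agreement rule are hypothetical illustrations, not the authors' actual pipeline.

```python
from collections import Counter

# Hypothetical annotations: word -> emotion labels from each source.
# Words, labels, and the agreement rule are illustrative, not the paper's.
llm_labels = {"開心": ["joy"], "嬲": ["anger"], "驚": ["fear", "surprise"]}
human_labels = {"開心": ["joy"], "嬲": ["anger", "disgust"], "驚": ["fear"]}

def merge_annotations(llm, human):
    """Keep a label when both sources assigned it; fall back to the
    human label for words with no overlap."""
    lexicon = {}
    for word in set(llm) | set(human):
        agreed = set(llm.get(word, [])) & set(human.get(word, []))
        lexicon[word] = sorted(agreed) or sorted(human.get(word, []))
    return lexicon

def extract_emotions(text, lexicon):
    """Count lexicon hits in a pre-segmented (space-separated) text."""
    counts = Counter()
    for token in text.split():
        counts.update(lexicon.get(token, []))
    return counts

lexicon = merge_annotations(llm_labels, human_labels)
print(extract_emotions("今日 好 開心 唔使 驚", lexicon))
# Counter({'joy': 1, 'fear': 1})
```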
Related papers
- BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages [93.92804151830744]
We present BRIGHTER, a collection of emotion-annotated datasets in 28 different languages.
We describe the data collection and annotation processes and the challenges of building these datasets.
We show that BRIGHTER datasets are a step towards bridging the gap in text-based emotion recognition.
arXiv Detail & Related papers (2025-02-17T15:39:50Z) - Akan Cinematic Emotions (ACE): A Multimodal Multi-party Dataset for Emotion Recognition in Movie Dialogues [4.894647740789939]
The Akan Conversation Emotion dataset is the first multimodal emotion dialogue dataset for an African language.
It contains 385 emotion-labeled dialogues and 6,162 utterances across audio, visual, and textual modalities.
The presence of prosodic labels in this dataset also makes it the first prosodically annotated African language dataset.
arXiv Detail & Related papers (2025-02-16T03:24:33Z) - Prompt and circumstance: A word-by-word LLM prompting approach to interlinear glossing for low-resource languages [6.4977738682502295]
We investigate the effectiveness of a retrieval-based LLM prompting approach to glossing, applied to the seven languages from the SIGMORPHON 2023 shared task.
Our system beats the BERT-based shared task baseline for every language in the morpheme-level score category.
In a case study on Tsez, we ask the LLM to automatically create and follow linguistic instructions, reducing errors on a confusing grammatical feature.
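The shared-task system's retrieval and prompt details are not given in this summary; the following is a minimal sketch of the general shape of retrieval-based, word-by-word prompting, in which similar glossed examples are fetched and assembled into a few-shot prompt. The corpus, similarity score, and template are hypothetical.

```python
# Hypothetical glossed examples; a real system would retrieve from the
# shared-task training data.
glossed_corpus = [
    ("besuro-bi", "fish-PL"),
    ("t'et'-si", "eat-PST"),
    ("aldu-bi", "house-PL"),
]

def retrieve_examples(word, corpus, k=2):
    """Toy similarity: number of position-wise matching characters."""
    def score(pair):
        return sum(a == b for a, b in zip(word, pair[0]))
    return sorted(corpus, key=score, reverse=True)[:k]

def build_prompt(word):
    shots = "\n".join(f"Word: {s}\nGloss: {g}"
                      for s, g in retrieve_examples(word, glossed_corpus))
    return f"{shots}\nWord: {word}\nGloss:"

print(build_prompt("besuro-si"))  # retrieved shots precede the query word
```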
arXiv Detail & Related papers (2025-02-13T21:23:16Z) - AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models [18.482881562645264]
This study is the first to explore the potential of Large Language Models (LLMs) in recognizing ambiguous emotions.
We design zero-shot and few-shot prompting and incorporate past dialogue as context information for ambiguous emotion recognition.
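As a rough illustration of this setup, the hypothetical prompt builder below folds prior dialogue turns into the context and asks for a probability over each label, so that ambiguity is preserved rather than collapsed to a single emotion. The label set and wording are illustrative, not the paper's.

```python
# Hypothetical prompt builder for ambiguity-aware emotion recognition:
# prior dialogue turns serve as context, and the model is asked for a
# distribution over labels instead of a single one.
EMOTIONS = ("joy", "sadness", "anger", "neutral")

def build_emotion_prompt(history, utterance, emotions=EMOTIONS):
    context = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    return (
        "Dialogue so far:\n"
        f"{context}\n"
        f"Assign a probability to each emotion in {list(emotions)} "
        "for the next utterance.\n"
        f"Utterance: {utterance}\nProbabilities:"
    )

print(build_emotion_prompt(
    [("A", "You're late again."), ("B", "Traffic was terrible...")],
    "Sure, whatever you say."))
```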
arXiv Detail & Related papers (2024-09-26T23:25:21Z) - Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.
We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs.
Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
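The summary names the method but not its exact formulation; a plausible reading is that LAPE is the entropy of a neuron's activation probabilities across languages, with low entropy marking a language-specific neuron. The sketch below assumes that reading; the matrix values and normalization are illustrative.

```python
import numpy as np

def lape(activation_prob):
    """activation_prob: (num_neurons, num_languages) matrix where entry
    (j, l) is how often neuron j activates on text in language l.
    Rows are normalized to a distribution; entropy is taken per neuron."""
    p = activation_prob / activation_prob.sum(axis=1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=1)  # low entropy => language-specific

probs = np.array([[0.90, 0.05, 0.05],   # fires almost only for language 0
                  [0.33, 0.33, 0.34]])  # fires uniformly across languages
print(lape(probs))  # the first neuron scores much lower (more language-specific)
```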
arXiv Detail & Related papers (2024-02-26T09:36:05Z) - BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z) - Multilingual Word Sense Disambiguation with Unified Sense Representation [55.3061179361177]
We propose building knowledge-based and supervision-based Multilingual Word Sense Disambiguation (MWSD) systems.
We build unified sense representations for multiple languages and address the annotation scarcity problem for MWSD by transferring annotations from resource-rich languages to resource-poor ones.
Evaluations on the SemEval-13 and SemEval-15 datasets demonstrate the effectiveness of our methodology.
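A minimal sketch of the annotation-transfer idea follows, assuming a shared sense inventory that links words across languages (real systems typically use a multilingual inventory such as BabelNet); the sense IDs and mappings are made up for illustration.

```python
# Illustrative annotation transfer through a shared sense inventory.
en_annotations = {"bank": "sense:financial_institution"}
shared_senses = {
    "sense:financial_institution": {"en": "bank", "es": "banco"},
}

def project_annotations(src_annotations, senses, target_lang):
    """Carry a source-language sense label over to the target-language
    word linked to the same unified sense."""
    projected = {}
    for _, sense_id in src_annotations.items():
        target_word = senses[sense_id].get(target_lang)
        if target_word:
            projected[target_word] = sense_id
    return projected

print(project_annotations(en_annotations, shared_senses, "es"))
# {'banco': 'sense:financial_institution'}
```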
arXiv Detail & Related papers (2022-10-14T01:24:03Z) - Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models, and compare the fine-tuned models against the original multilingual LMs.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z) - Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition, and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while composition is more crucial to the success of cross-lingual transfer.
arXiv Detail & Related papers (2022-03-16T07:09:35Z) - Cross Lingual Cross Corpus Speech Emotion Recognition [0.0]
This paper presents results for speech emotion recognition in four languages, in both single-corpus and cross-corpus settings.
Since multi-task learning (MTL) with gender, naturalness, and arousal has been shown to enhance the generalisation capabilities of emotion models, this paper introduces language ID as another auxiliary task in the MTL framework.
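The paper's architecture is not given in this summary; the PyTorch sketch below only illustrates the multi-task shape described, with a shared encoder, an emotion head, and an auxiliary language-ID head. The dimensions, auxiliary-loss weight, and random features are placeholders.

```python
import torch
import torch.nn as nn

class MTLEmotionModel(nn.Module):
    """Shared encoder with one head per task; the language-ID head is
    the auxiliary task added alongside emotion recognition."""
    def __init__(self, feat_dim=128, hidden=64, n_emotions=4, n_languages=4):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.emotion_head = nn.Linear(hidden, n_emotions)
        self.language_head = nn.Linear(hidden, n_languages)  # auxiliary

    def forward(self, x):
        h = self.shared(x)
        return self.emotion_head(h), self.language_head(h)

model = MTLEmotionModel()
features = torch.randn(8, 128)  # stand-in for pooled acoustic features
emotion_logits, language_logits = model(features)
# Joint loss: main task plus a down-weighted auxiliary term (0.3 is arbitrary).
loss = (nn.functional.cross_entropy(emotion_logits, torch.randint(0, 4, (8,)))
        + 0.3 * nn.functional.cross_entropy(language_logits, torch.randint(0, 4, (8,))))
```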
arXiv Detail & Related papers (2020-03-18T00:23:08Z)