Are Lexicon-Based Tools Still the Gold Standard for Valence Analysis in Low-Resource Flemish?
- URL: http://arxiv.org/abs/2506.04139v1
- Date: Wed, 04 Jun 2025 16:31:37 GMT
- Title: Are Lexicon-Based Tools Still the Gold Standard for Valence Analysis in Low-Resource Flemish?
- Authors: Ratna Kandala, Katie Hoemann,
- Abstract summary: Traditional lexicon-based tools such as LIWC and Pattern have long served as foundational instruments in this domain.<n>We first conducted a study involving approximately 25,000 textual responses from 102 Dutch-speaking participants.<n>We assessed the performance of three Dutch-specific LLMs in predicting these valence scores, and compared their outputs to those generated by LIWC and Pattern.<n>This study underscores the imperative for developing culturally and linguistically tailored models/tools that can adeptly handle the complexities of natural language use.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Understanding the nuances in everyday language is pivotal for advancements in computational linguistics & emotions research. Traditional lexicon-based tools such as LIWC and Pattern have long served as foundational instruments in this domain. LIWC is the most extensively validated word count based text analysis tool in the social sciences and Pattern is an open source Python library offering functionalities for NLP. However, everyday language is inherently spontaneous, richly expressive, & deeply context dependent. To explore the capabilities of LLMs in capturing the valences of daily narratives in Flemish, we first conducted a study involving approximately 25,000 textual responses from 102 Dutch-speaking participants. Each participant provided narratives prompted by the question, "What is happening right now and how do you feel about it?", accompanied by self-assessed valence ratings on a continuous scale from -50 to +50. We then assessed the performance of three Dutch-specific LLMs in predicting these valence scores, and compared their outputs to those generated by LIWC and Pattern. Our findings indicate that, despite advancements in LLM architectures, these Dutch tuned models currently fall short in accurately capturing the emotional valence present in spontaneous, real-world narratives. This study underscores the imperative for developing culturally and linguistically tailored models/tools that can adeptly handle the complexities of natural language use. Enhancing automated valence analysis is not only pivotal for advancing computational methodologies but also holds significant promise for psychological research with ecologically valid insights into human daily experiences. We advocate for increased efforts in creating comprehensive datasets & finetuning LLMs for low-resource languages like Flemish, aiming to bridge the gap between computational linguistics & emotion research.
Related papers
- Comparing LLM Text Annotation Skills: A Study on Human Rights Violations in Social Media Data [2.812898346527047]
This study investigates the capabilities of large language models (LLMs) for zero-shot and few-shot annotation of social media posts in Russian and Ukrainian.<n>To evaluate the effectiveness of these models, their annotations are compared against a gold standard set of human double-annotated labels.<n>The study explores the unique patterns of errors and disagreements exhibited by each model, offering insights into their strengths, limitations, and cross-linguistic adaptability.
arXiv Detail & Related papers (2025-05-15T13:10:47Z) - One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [68.33068005789116]
We present the first study aimed at objectively assessing the fairness and robustness of Large Language Models (LLMs) in handling dialects in canonical reasoning tasks.<n>We hire AAVE speakers, including experts with computer science backgrounds, to rewrite seven popular benchmarks, such as HumanEval and GSM8K.<n>Our findings reveal that textbfalmost all of these widely used models show significant brittleness and unfairness to queries in AAVE.
arXiv Detail & Related papers (2024-10-14T18:44:23Z) - Unraveling the Dominance of Large Language Models Over Transformer Models for Bangla Natural Language Inference: A Comprehensive Study [0.0]
Natural Language Inference (NLI) is a cornerstone of Natural Language Processing (NLP)
This study addresses the underexplored area of evaluating Large Language Models (LLMs) in low-resourced languages like Bengali.
arXiv Detail & Related papers (2024-05-05T13:57:05Z) - Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess language models (LMs) linguistic competence.
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z) - PhonologyBench: Evaluating Phonological Skills of Large Language Models [57.80997670335227]
Phonology, the study of speech's structure and pronunciation rules, is a critical yet often overlooked component in Large Language Model (LLM) research.
We present PhonologyBench, a novel benchmark consisting of three diagnostic tasks designed to explicitly test the phonological skills of LLMs.
We observe a significant gap of 17% and 45% on Rhyme Word Generation and Syllable counting, respectively, when compared to humans.
arXiv Detail & Related papers (2024-04-03T04:53:14Z) - Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study [3.0059120458540383]
We consider the evaluation of the lexical richness of the text generated by conversational Large Language Models (LLMs) and how it depends on the model parameters.
The results show how lexical richness depends on the version of ChatGPT and some of its parameters, such as the presence penalty, or on the role assigned to the model.
arXiv Detail & Related papers (2024-02-11T13:41:17Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.<n>This survey delves into an important attribute of these datasets: the dialect of a language.<n>Motivated by the performance degradation of NLP models for dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - Spoken Language Intelligence of Large Language Models for Language Learning [3.1964044595140217]
We focus on evaluating the efficacy of large language models (LLMs) in the realm of education.<n>We introduce a new multiple-choice question dataset to evaluate the effectiveness of LLMs in the aforementioned scenarios.<n>We also investigate the influence of various prompting techniques such as zero- and few-shot method.<n>We find that models of different sizes have good understanding of concepts in phonetics, phonology, and second language acquisition, but show limitations in reasoning for real-world problems.
arXiv Detail & Related papers (2023-08-28T12:47:41Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language
Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.