Beyond English: Evaluating Automated Measurement of Moral Foundations in Non-English Discourse with a Chinese Case Study
- URL: http://arxiv.org/abs/2502.02451v2
- Date: Wed, 05 Feb 2025 09:17:17 GMT
- Title: Beyond English: Evaluating Automated Measurement of Moral Foundations in Non-English Discourse with a Chinese Case Study
- Authors: Calvin Yixiang Cheng, Scott A Hale,
- Abstract summary: This study explores computational approaches for measuring moral foundations (MFs) in non-English corpora.
Using Chinese as a case study, this paper evaluates the effectiveness of applying English resources to machine translated text, local language lexicons, multilingual language models, and large language models (LLMs) in measuring MFs in non-English texts.
- Score: 8.068626035121875
- License:
- Abstract: This study explores computational approaches for measuring moral foundations (MFs) in non-English corpora. Since most resources are developed primarily for English, cross-linguistic applications of moral foundation theory remain limited. Using Chinese as a case study, this paper evaluates the effectiveness of applying English resources to machine translated text, local language lexicons, multilingual language models, and large language models (LLMs) in measuring MFs in non-English texts. The results indicate that machine translation and local lexicon approaches are insufficient for complex moral assessments, frequently resulting in a substantial loss of cultural information. In contrast, multilingual models and LLMs demonstrate reliable cross-language performance with transfer learning, with LLMs excelling in terms of data efficiency. Importantly, this study also underscores the need for human-in-the-loop validation of automated MF assessment, as the most advanced models may overlook cultural nuances in cross-language measurements. The findings highlight the potential of LLMs for cross-language MF measurements and other complex multilingual deductive coding tasks.
Related papers
- Exploring Large Language Models for Translating Romanian Computational Problems into English [0.0]
This study shows that robust large language models (LLMs) can maintain or even enhance their performance in translating less common languages when given well-structured prompts.
We evaluate several translation methods across multiple LLMs, including OpenRoLLM, Llama 3.1 8B, Llama 3.2 3B and GPT-4o.
arXiv Detail & Related papers (2025-01-09T22:17:44Z) - Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning [15.919493497867567]
This study aims to evaluate the performance of Multimodal Large Language Models (MLLMs) on the VALSE benchmark.
We conducted a comprehensive assessment of state-of-the-art MLLMs, varying in model size and pretraining datasets.
arXiv Detail & Related papers (2024-07-17T11:26:47Z) - Disce aut Deficere: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark [12.729687989535359]
evaluating Large Language Models (LLMs) in languages other than English is crucial for ensuring their linguistic versatility, cultural relevance, and applicability in diverse global contexts.
We tackle this challenge by introducing a structured benchmark using the INVALSI tests, a set of well-established assessments designed to measure educational competencies across Italy.
arXiv Detail & Related papers (2024-06-25T13:20:08Z) - Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages [48.40607157158246]
Large Language Models (LLMs) perform better on high-resource languages like English, German, and French, while their capabilities in low-resource languages remain inadequate.
We propose the Language Ranker, an intrinsic metric designed to benchmark and rank languages based on LLM performance using internal representations.
Our analysis reveals that high-resource languages exhibit higher similarity scores with English, demonstrating superior performance, while low-resource languages show lower similarity scores.
arXiv Detail & Related papers (2024-04-17T16:53:16Z) - Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation [64.5862977630713]
This study investigates how Large Language Models (LLMs) leverage source and reference data in machine translation evaluation task.
We find that reference information significantly enhances the evaluation accuracy, while surprisingly, source information sometimes is counterproductive.
arXiv Detail & Related papers (2024-01-12T13:23:21Z) - LLaMA Beyond English: An Empirical Study on Language Capability Transfer [49.298360366468934]
We focus on how to effectively transfer the capabilities of language generation and following instructions to a non-English language.
We analyze the impact of key factors such as vocabulary extension, further pretraining, and instruction tuning on transfer.
We employ four widely used standardized testing benchmarks: C-Eval, MMLU, AGI-Eval, and GAOKAO-Bench.
arXiv Detail & Related papers (2024-01-02T06:29:02Z) - Zero-Shot Cross-Lingual Reranking with Large Language Models for
Low-Resource Languages [51.301942056881146]
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba)
We examine cross-lingual reranking with queries in English and passages in the African languages.
arXiv Detail & Related papers (2023-12-26T18:38:54Z) - Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z) - CMMLU: Measuring massive multitask language understanding in Chinese [133.70911295934746]
This paper introduces a comprehensive Chinese benchmark that covers various subjects, including natural science, social sciences, engineering, and humanities.
CMMLU fills the gap in evaluating the knowledge and reasoning capabilities of large language models within the Chinese context.
arXiv Detail & Related papers (2023-06-15T15:49:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.