An Investigation into Value Misalignment in LLM-Generated Texts for Cultural Heritage
- URL: http://arxiv.org/abs/2501.02039v1
- Date: Fri, 03 Jan 2025 14:35:32 GMT
- Title: An Investigation into Value Misalignment in LLM-Generated Texts for Cultural Heritage
- Authors: Fan Bu, Zheng Wang, Siyi Wang, Ziyao Liu,
- Abstract summary: Large Language Models (LLMs) are increasingly prevalent in tasks related to cultural heritage.
They are used to generate descriptions of historical monuments, translate ancient texts, preserve oral traditions, and create educational content.
However, cultural value misalignments may exist in generated texts, such as the misrepresentation of historical facts, the erosion of cultural identity, and the oversimplification of complex cultural narratives.
- Score: 5.893281327912503
- Abstract: As Large Language Models (LLMs) become increasingly prevalent in tasks related to cultural heritage, such as generating descriptions of historical monuments, translating ancient texts, preserving oral traditions, and creating educational content, their ability to produce accurate and culturally aligned texts is being increasingly relied upon by users and researchers. However, cultural value misalignments may exist in generated texts, such as the misrepresentation of historical facts, the erosion of cultural identity, and the oversimplification of complex cultural narratives, which may lead to severe consequences. Therefore, investigating value misalignment in the context of LLM for cultural heritage is crucial for mitigating these risks, yet there has been a significant lack of systematic and comprehensive study and investigation in this area. To fill this gap, we systematically assess the reliability of LLMs in generating culturally aligned texts for cultural heritage-related tasks. We conduct a comprehensive evaluation by compiling an extensive set of 1066 query tasks covering 5 widely recognized categories with 17 aspects within the knowledge framework of cultural heritage across 5 open-source LLMs, and examine both the type and rate of cultural value misalignments in the generated texts. Using both automated and manual approaches, we effectively detect and analyze the cultural value misalignments in LLM-generated texts. Our findings are concerning: over 65% of the generated texts exhibit notable cultural misalignments, with certain tasks demonstrating almost complete misalignment with key cultural values. Beyond these findings, this paper introduces a benchmark dataset and a comprehensive evaluation workflow that can serve as a valuable resource for future research aimed at enhancing the cultural sensitivity and reliability of LLMs.
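To make the abstract's evaluation workflow concrete, here is a minimal sketch of the generate-then-detect loop it describes. Everything below is hypothetical scaffolding: the `generate` and `is_misaligned` callables and the task schema are stand-ins for the paper's actual benchmark and detection pipeline (which combines automated checks with manual review), not its released code.

```python
from collections import defaultdict

def evaluate(tasks, models, generate, is_misaligned):
    """Compute per-category cultural-misalignment rates.

    tasks: iterable of dicts with "category", "aspect", and "query" keys
        (the paper compiles 1066 such tasks over 5 categories / 17 aspects).
    models: the LLMs under test (the paper uses 5 open-source LLMs).
    generate(model, query) -> str: one generated text per query.
    is_misaligned(text, task) -> bool: stand-in for the paper's combined
        automated and manual misalignment detection.
    """
    hits = defaultdict(int)    # category -> misaligned generations
    totals = defaultdict(int)  # category -> total generations
    for model in models:
        for task in tasks:
            text = generate(model, task["query"])
            totals[task["category"]] += 1
            hits[task["category"]] += int(is_misaligned(text, task))
    # Misalignment rate per category; the paper reports >65% overall.
    return {cat: hits[cat] / totals[cat] for cat in totals}
```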
Related papers
- CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries [63.00147630084146]
Vision-language models (VLMs) have advanced human-AI interaction but struggle with cultural understanding.
CultureVerse is a large-scale multimodal benchmark covering 19,682 cultural concepts, 188 countries/regions, 15 cultural concepts, and 3 question types.
We propose CultureVLM, a series of VLMs fine-tuned on our dataset to achieve significant performance improvement in cultural understanding.
arXiv Detail & Related papers (2025-01-02T14:42:37Z)
- Survey of Cultural Awareness in Language Models: Text and Beyond [39.77033652289063]
Large-scale deployment of large language models (LLMs) in various applications requires LLMs to be culturally sensitive to the user to ensure inclusivity.
Culture has been widely studied in psychology and anthropology, and there has been a recent surge in research on making LLMs more culturally inclusive.
arXiv Detail & Related papers (2024-10-30T16:37:50Z)
- Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models [4.771099208181585]
As LLMs are increasingly deployed in global applications, it is essential to ensure that users from diverse backgrounds feel respected and understood.
Cultural harm can arise when these models fail to align with specific cultural norms, resulting in misrepresentations or violations of cultural values.
We present two key contributions: a cultural harm test dataset, created to assess model outputs across cultural contexts through scenarios that expose potential cultural insensitivities, and a culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning on feedback from diverse annotators.
arXiv Detail & Related papers (2024-10-15T18:13:10Z)
- Self-Alignment: Improving Alignment of Cultural Values in LLMs via In-Context Learning [13.034603322224548]
We present a simple and inexpensive method that uses a combination of in-context learning (ICL) and human survey data.
We show that our method is useful in languages other than English and can improve alignment with the cultural values of a range of culturally diverse countries.
arXiv Detail & Related papers (2024-08-29T12:18:04Z)
- Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks.
We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts.
We find weak correlations between the text similarity of outputs generated for different countries and those countries' cultural values (a minimal sketch of this perturbation procedure appears after this list).
arXiv Detail & Related papers (2024-06-17T14:03:27Z)
- CulturePark: Boosting Cross-cultural Understanding in Large Language Models [63.452948673344395]
This paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection.
It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs.
We evaluate these models across three downstream tasks: content moderation, cultural alignment, and cultural education.
arXiv Detail & Related papers (2024-05-24T01:49:02Z)
- Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense [98.09670425244462]
Large language models (LLMs) have demonstrated substantial commonsense understanding.
This paper examines the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks.
arXiv Detail & Related papers (2024-05-07T20:28:34Z)
- CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge [69.82940934994333]
We introduce CulturalTeaming, an interactive red-teaming system that leverages human-AI collaboration to build a challenging evaluation dataset.
Our study reveals that CulturalTeaming's various modes of AI assistance support annotators in creating cultural questions.
CULTURALBENCH-V0.1 is a compact yet high-quality evaluation dataset built from users' red-teaming attempts.
arXiv Detail & Related papers (2024-04-10T00:25:09Z)
- Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
- CDEval: A Benchmark for Measuring the Cultural Dimensions of Large Language Models [41.885600036131045]
CDEval is a benchmark aimed at evaluating the cultural dimensions of Large Language Models.
It is constructed by incorporating both GPT-4's automated generation and human verification, covering six cultural dimensions across seven domains.
arXiv Detail & Related papers (2023-11-28T02:01:25Z)
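As a companion to the extrinsic-evaluation entry above, the following sketch illustrates the nationality-perturbation procedure that paper summarizes: vary an explicit country cue in an otherwise fixed prompt, then relate output-text similarity to cultural-value similarity across country pairs. The `generate`, `text_similarity`, and `value_similarity` callables and the template are assumed placeholders, not that paper's released code.

```python
from itertools import combinations

def nationality_perturbation_study(template, countries, generate,
                                   text_similarity, value_similarity):
    """Sketch of the extrinsic evaluation: perturb an explicit nationality
    cue in a prompt, then pair up output-text similarity with
    cultural-value similarity for every pair of countries.

    template: prompt with a "{country}" placeholder, e.g.
        "Write a short story about a wedding in {country}."
    generate(prompt) -> str, text_similarity(a, b) -> float, and
    value_similarity(country_a, country_b) -> float are all placeholders.
    """
    outputs = {c: generate(template.format(country=c)) for c in countries}
    pairs = [(text_similarity(outputs[a], outputs[b]), value_similarity(a, b))
             for a, b in combinations(countries, 2)]
    # Correlating the two columns (e.g., Spearman) tests whether output
    # similarity tracks cultural values; the paper reports weak correlations.
    return pairs
```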