ValuesRAG: Enhancing Cultural Alignment Through Retrieval-Augmented Contextual Learning
- URL: http://arxiv.org/abs/2501.01031v2
- Date: Mon, 20 Jan 2025 06:42:48 GMT
- Title: ValuesRAG: Enhancing Cultural Alignment Through Retrieval-Augmented Contextual Learning
- Authors: Wonduk Seo, Zonghao Yuan, Yi Bu
- Abstract summary: We propose ValuesRAG to integrate cultural and demographic knowledge dynamically during text generation.
ValuesRAG consistently outperforms baseline methods, both in the main experiment and in the ablation study.
Notably, ValuesRAG achieves a 21% improvement in accuracy over other baseline methods.
- Score: 1.1343849658875087
- Abstract: Cultural values alignment in Large Language Models (LLMs) is a critical challenge due to their tendency to embed Western-centric biases from training data, leading to misrepresentations and fairness issues in cross-cultural contexts. Recent approaches, such as role-assignment and few-shot learning, often struggle with reliable cultural alignment as they heavily rely on pre-trained knowledge, lack scalability, and fail to capture nuanced cultural values effectively. To address these issues, we propose ValuesRAG, a novel and effective framework that applies Retrieval-Augmented Generation (RAG) with In-Context Learning (ICL) to integrate cultural and demographic knowledge dynamically during text generation. Leveraging the World Values Survey (WVS) dataset, ValuesRAG first generates summaries of values for each individual. Subsequently, we curate several representative regional datasets to serve as test datasets and retrieve relevant summaries of values based on demographic features, followed by a reranking step to select the top-k relevant summaries. ValuesRAG consistently outperforms baseline methods, both in the main experiment and in the ablation study where only the values summary was provided. Notably, ValuesRAG achieves a 21% improvement in accuracy over other baseline methods, highlighting its potential to foster culturally aligned AI systems and enhance the inclusivity of AI-driven applications.
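The retrieve-then-rerank flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ValueSummary` record, the demographic field names, the sample summaries, and both scoring functions are hypothetical stand-ins. In particular, exact-match counting and token-level Jaccard similarity are toy substitutes for whatever retriever and reranker the authors actually use, which this page does not specify.

```python
from dataclasses import dataclass


@dataclass
class ValueSummary:
    """Hypothetical record: one respondent's demographics plus a values summary."""
    demographics: dict
    summary: str


def demographic_overlap(query: dict, candidate: dict) -> int:
    """Count the demographic fields on which the two profiles agree."""
    return sum(1 for key, value in query.items() if candidate.get(key) == value)


def token_jaccard(a: str, b: str) -> float:
    """Toy relevance score: Jaccard similarity of lowercase whitespace tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def retrieve_and_rerank(query_demo, question, corpus, n_candidates=3, top_k=2):
    """Retrieve by demographic match, then rerank and keep the top-k summaries."""
    # Stage 1: keep the candidates whose demographics best match the query profile.
    candidates = sorted(
        corpus,
        key=lambda r: demographic_overlap(query_demo, r.demographics),
        reverse=True,
    )[:n_candidates]
    # Stage 2: rerank those candidates by (toy) relevance to the question.
    reranked = sorted(
        candidates,
        key=lambda r: token_jaccard(question, r.summary),
        reverse=True,
    )
    return reranked[:top_k]


def build_prompt(question, summaries):
    """Place the selected summaries in the prompt as in-context knowledge."""
    context = "\n".join(f"- {s.summary}" for s in summaries)
    return (
        "Values summaries of demographically similar respondents:\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Made-up example data; not drawn from the World Values Survey.
corpus = [
    ValueSummary({"country": "KR", "age_group": "30-49", "gender": "F"},
                 "Places high importance on family and work"),
    ValueSummary({"country": "KR", "age_group": "30-49", "gender": "M"},
                 "Values religion and tradition strongly"),
    ValueSummary({"country": "US", "age_group": "18-29", "gender": "F"},
                 "Prioritises leisure time and friends"),
    ValueSummary({"country": "KR", "age_group": "50+", "gender": "F"},
                 "Considers family very important in life"),
]
query_demo = {"country": "KR", "age_group": "30-49", "gender": "F"}
question = "How important is family in your life?"

top = retrieve_and_rerank(query_demo, question, corpus)
prompt = build_prompt(question, top)
print(prompt)
```

The two stages are kept separate so that a cheap demographic filter narrows the pool before a more expensive relevance model runs; in a real system the Jaccard scorer would likely be replaced by a learned reranker.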
Related papers
- Rethinking AI Cultural Evaluation [1.8434042562191815]
Current evaluation methods predominantly rely on multiple-choice question (MCQ) datasets.
Our findings highlight significant discrepancies between MCQ-based assessments and the values conveyed in unconstrained interactions.
We recommend moving beyond MCQs to adopt more open-ended, context-specific assessments.
arXiv Detail & Related papers (2025-01-13T23:42:37Z)
- An Investigation into Value Misalignment in LLM-Generated Texts for Cultural Heritage [5.893281327912503]
Large Language Models (LLMs) are increasingly prevalent in tasks related to cultural heritage.
They are used to generate descriptions of historical monuments, translate ancient texts, preserve oral traditions, and create educational content.
However, cultural value misalignments may exist in generated texts, such as the misrepresentation of historical facts, the erosion of cultural identity, and the oversimplification of complex cultural narratives.
arXiv Detail & Related papers (2025-01-03T14:35:32Z)
- Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation [71.59208664920452]
Cultural biases in multilingual datasets pose significant challenges for their effectiveness as global benchmarks.
We show that progress on MMLU depends heavily on learning Western-centric concepts, with 28% of all questions requiring culturally sensitive knowledge.
We release Global MMLU, an improved MMLU with evaluation coverage across 42 languages.
arXiv Detail & Related papers (2024-12-04T13:27:09Z)
- Bottom-Up and Top-Down Analysis of Values, Agendas, and Observations in Corpora and LLMs [1.3119775978504942]
Large language models (LLMs) generate diverse, situated, persuasive texts from a plurality of potential perspectives.
We seek to characterize socio-cultural values that they express, for reasons of safety, accuracy, inclusion, and cultural fidelity.
arXiv Detail & Related papers (2024-11-06T18:51:04Z)
- Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture [4.467334566487944]
This study introduces a benchmark designed to evaluate the performance of large language models (LLMs) in understanding and processing cultural knowledge.
The study develops a multi-dimensional framework that systematically assesses LLMs across six cognitive domains: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating.
The results highlight the effectiveness of RAG in improving accuracy across all cognitive domains, particularly in tasks requiring precise retrieval and application of cultural knowledge.
arXiv Detail & Related papers (2024-09-03T02:50:04Z)
- Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks.
We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts.
We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z)
- No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models [38.932610459192105]
We study cultural and socioeconomic diversity in contrastive vision-language models (VLMs).
Our work underscores the value of using diverse data to create more inclusive multimodal systems.
arXiv Detail & Related papers (2024-05-22T16:04:22Z)
- CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge [69.82940934994333]
We introduce CulturalTeaming, an interactive red-teaming system that leverages human-AI collaboration to build a challenging evaluation dataset.
Our study reveals that CulturalTeaming's various modes of AI assistance support annotators in creating cultural questions.
CULTURALBENCH-V0.1 is a compact yet high-quality evaluation dataset with users' red-teaming attempts.
arXiv Detail & Related papers (2024-04-10T00:25:09Z)
- Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Multi-Dimensional Evaluation of Text Summarization with In-Context Learning [79.02280189976562]
In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning.
Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization.
We then analyze the effects of factors such as the selection and number of in-context examples on performance.
arXiv Detail & Related papers (2023-06-01T23:27:49Z)