Related papers: Assessing Socio-Cultural Alignment and Technical Safety of Sovereign LLMs

URL: http://arxiv.org/abs/2510.14565v1
Date: Thu, 16 Oct 2025 11:17:44 GMT
Title: Assessing Socio-Cultural Alignment and Technical Safety of Sovereign LLMs
Authors: Kyubyung Chae, Gihoon Kim, Gyuseong Lee, Taesup Kim, Jaejin Lee, Heejin Kim
Abstract summary: Global debate over sovereign LLMs highlights the need for governments to develop their LLMs tailored to their unique socio-cultural and historical contexts. We introduce an analytic framework for extracting and evaluating the socio-cultural elements of sovereign LLMs. We show that while sovereign LLMs play a meaningful role in supporting low-resource languages, they do not always meet the popular claim that these models serve their target users well.
Score: 12.162590322796435
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent trends in LLMs development clearly show growing interest in the use and application of sovereign LLMs. The global debate over sovereign LLMs highlights the need for governments to develop their LLMs, tailored to their unique socio-cultural and historical contexts. However, there remains a shortage of frameworks and datasets to verify two critical questions: (1) how well these models align with users' socio-cultural backgrounds, and (2) whether they maintain safety and technical robustness without exposing users to potential harms and risks. To address this gap, we construct a new dataset and introduce an analytic framework for extracting and evaluating the socio-cultural elements of sovereign LLMs, alongside assessments of their technical robustness. Our experimental results demonstrate that while sovereign LLMs play a meaningful role in supporting low-resource languages, they do not always meet the popular claim that these models serve their target users well. We also show that pursuing this untested claim may lead to underestimating critical quality attributes such as safety. Our study suggests that advancing sovereign LLMs requires a more extensive evaluation that incorporates a broader range of well-grounded and practical criteria.

Related papers

A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy [31.839815402460918]
Large language models (LLMs) present significant potential for supporting numerous real-world applications. They still face significant challenges in terms of the inherent risk of privacy leakage, hallucinated outputs, and value misalignment.
arXiv Detail & Related papers (2025-01-16T09:59:45Z)
CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models [60.59638232596912]
We introduce CLAMBER, a benchmark for evaluating large language models (LLMs). Building upon the taxonomy, we construct 12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs. Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries.
arXiv Detail & Related papers (2024-05-20T14:34:01Z)
A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law [65.87885628115946]
Large language models (LLMs) are revolutionizing the landscapes of finance, healthcare, and law. We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies. We critically examine the ethics for LLM applications in these fields, pointing out the existing ethical concerns and the need for transparent, fair, and robust AI systems.
arXiv Detail & Related papers (2024-05-02T22:43:02Z)
Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress. Our investigation exposes a critical oversight in this belief. By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge [69.82940934994333]
We introduce CulturalTeaming, an interactive red-teaming system that leverages human-AI collaboration to build a challenging evaluation dataset. Our study reveals that CulturalTeaming's various modes of AI assistance support annotators in creating cultural questions. CULTURALBENCH-V0.1 is a compact yet high-quality evaluation dataset with users' red-teaming attempts.
arXiv Detail & Related papers (2024-04-10T00:25:09Z)
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety [37.07970624135514]
OpenEval is an evaluation testbed that benchmarks Chinese LLMs across capability, alignment and safety. For capability assessment, we include 12 benchmark datasets to evaluate Chinese LLMs from 4 sub-dimensions: NLP tasks, disciplinary knowledge, commonsense reasoning and mathematical reasoning. For alignment assessment, OpenEval contains 7 datasets that examine the bias, offensiveness and illegalness in the outputs yielded by Chinese LLMs.
arXiv Detail & Related papers (2024-03-18T23:21:37Z)
Evaluating Large Language Models: A Comprehensive Survey [41.64914110226901]
Large language models (LLMs) have demonstrated remarkable capabilities across a broad spectrum of tasks. They could suffer from private data leaks or yield inappropriate, harmful, or misleading content. To effectively capitalize on LLM capacities as well as ensure their safe and beneficial development, it is critical to conduct a rigorous and comprehensive evaluation.
arXiv Detail & Related papers (2023-10-30T17:00:52Z)
She had Cobalt Blue Eyes: Prompt Testing to Create Aligned and Sustainable Language Models [2.6089354079273512]
Recent events indicate ethical concerns around conventionally trained large language models (LLMs). We introduce a test suite of prompts to foster the development of aligned LLMs that are fair, safe, and robust. Our test suite evaluates outputs from four state-of-the-art language models: GPT-3.5, GPT-4, OPT, and LLaMA-2.
arXiv Detail & Related papers (2023-10-20T14:18:40Z)
A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry. This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z)
Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes. To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts including 100k augmented prompts and responses by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.