Related papers: OpenEthics: A Comprehensive Ethical Evaluation of Open-Source Generative Large Language Models

OpenEthics: A Comprehensive Ethical Evaluation of Open-Source Generative Large Language Models

URL: http://arxiv.org/abs/2505.16036v1
Date: Wed, 21 May 2025 21:31:35 GMT
Title: OpenEthics: A Comprehensive Ethical Evaluation of Open-Source Generative Large Language Models
Authors: Burak Erinç Çetin, Yıldırım Özen, Elif Naz Demiryılmaz, Kaan Engür, Cagri Toraman,
Abstract summary: Generative large language models present significant potential but also raise critical ethical concerns.<n>We conduct a broad ethical evaluation of 29 recent open-source large language models using a novel data collection.<n>We analyze model behavior in both a commonly used language, English, and a low-resource language, Turkish.
Score: 1.9223856107206057
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Generative large language models present significant potential but also raise critical ethical concerns. Most studies focus on narrow ethical dimensions, and also limited diversity of languages and models. To address these gaps, we conduct a broad ethical evaluation of 29 recent open-source large language models using a novel data collection including four ethical aspects: Robustness, reliability, safety, and fairness. We analyze model behavior in both a commonly used language, English, and a low-resource language, Turkish. Our aim is to provide a comprehensive ethical assessment and guide safer model development by filling existing gaps in evaluation breadth, language coverage, and model diversity. Our experimental results, based on LLM-as-a-Judge, reveal that optimization efforts for many open-source models appear to have prioritized safety and fairness, and demonstrated good robustness while reliability remains a concern. We demonstrate that ethical evaluation can be effectively conducted independently of the language used. In addition, models with larger parameter counts tend to exhibit better ethical performance, with Gemma and Qwen models demonstrating the most ethical behavior among those evaluated.

Related papers

Evaluation of AI Ethics Tools in Language Models: A Developers' Perspective Case Stud [2.659655189346942]
This paper presents a methodology for evaluating AIETs in language models.<n>We selected four AIETs: Model Cards, ALTAI, FactSheets, and Harms Modeling.<n>The evaluation considered the developers' perspective on the AIETs' use and quality in helping to identify ethical considerations about their model.
arXiv Detail & Related papers (2025-12-16T02:43:37Z)
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge [0.0]
Large Language Models (LLMs) have revolutionized artificial intelligence, driving advancements in machine translation, summarization, and conversational agents.<n>Recent studies indicate that LLMs remain vulnerable to adversarial attacks designed to elicit biased responses.<n>This work proposes a scalable benchmarking framework to evaluate LLM robustness against adversarial bias elicitation.
arXiv Detail & Related papers (2025-04-10T16:00:59Z)
Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings [51.65890794988425]
This study presents the first comprehensive safety evaluation of the DeepSeek models.<n>Our evaluation encompasses DeepSeek's latest generation of large language models, multimodal large language models, and text-to-image models.
arXiv Detail & Related papers (2025-03-19T10:44:37Z)
On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [49.60774626839712]
multimodal generative models have sparked critical discussions on their fairness, reliability, and potential for misuse. We propose an evaluation framework designed to assess model reliability through their responses to perturbations in the embedding space. Our method lays the groundwork for detecting unreliable, bias-injected models and retrieval of bias provenance.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning [84.94709351266557]
We focus on the trustworthiness of language models with respect to retrieval augmentation. We deem that retrieval-augmented language models have the inherent capabilities of supplying response according to both contextual and parametric knowledge. Inspired by aligning language models with human preference, we take the first step towards aligning retrieval-augmented language models to a status where it responds relying merely on the external evidence.
arXiv Detail & Related papers (2024-10-22T09:25:21Z)
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models [51.69735366140249]
We introduce Ethical-Lens, a framework designed to facilitate the value-aligned usage of text-to-image tools.<n>Ethical-Lens ensures value alignment in text-to-image models across toxicity and bias dimensions.<n>Our experiments reveal that Ethical-Lens enhances alignment capabilities to levels comparable with or superior to commercial models.
arXiv Detail & Related papers (2024-04-18T11:38:25Z)
CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models. The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control. Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z)
TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models [19.159479032207155]
Large Language Models (LLMs) have gained significant attention due to their impressive natural language processing capabilities. TrustGPT provides a comprehensive evaluation of LLMs in three crucial areas: toxicity, bias, and value-alignment. This research aims to enhance our understanding of the performance of conversation generation models and promote the development of language models that are more ethical and socially responsible.
arXiv Detail & Related papers (2023-06-20T12:53:39Z)
Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models [11.323961700172175]
This article investigates the challenges and risks associated with biases in large-scale language models like ChatGPT. We discuss the origins of biases, stemming from, among others, the nature of training data, model specifications, algorithmic constraints, product design, and policy decisions. We review the current approaches to identify, quantify, and mitigate biases in language models, emphasizing the need for a multi-disciplinary, collaborative effort to develop more equitable, transparent, and responsible AI systems.
arXiv Detail & Related papers (2023-04-07T17:14:00Z)
Democratizing Ethical Assessment of Natural Language Generation Models [0.0]
Natural language generation models are computer systems that generate coherent language when prompted with a sequence of words as context. Despite their ubiquity and many beneficial applications, language generation models also have the potential to inflict social harms. Ethical assessment of these models is therefore critical. This article introduces a new tool to democratize and standardize ethical assessment of natural language generation models.
arXiv Detail & Related papers (2022-06-30T12:20:31Z)
Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks. We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations. All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.