Related papers: Measuring Value Alignment

Related papers

Resource Rational Contractualism Should Guide AI Alignment [69.07915246220985]
Contractualist alignment proposes grounding decisions in agreements that diverse stakeholders would endorse.<n>We propose Resource-Rationalism: a framework where AI systems approximate the agreements rational parties would form.<n>An RRC-aligned agent would not only operate efficiently, but also be equipped to dynamically adapt to and interpret the ever-changing human social world.
arXiv Detail & Related papers (2025-06-20T18:57:13Z)
Towards an Approach for Evaluating the Impact of AI Standards [0.0]
The goal of AI standards is to promote innovation and public trust in systems that use AI.<n>There is a lack of a formal or shared method to measure the impact of these standardization activities on the goals of innovation and trust.<n>This concept paper proposes an analytical approach that could inform the evaluation of the impact of AI standards.
arXiv Detail & Related papers (2025-06-16T13:58:59Z)
Human-AI Governance (HAIG): A Trust-Utility Approach [0.0]
This paper introduces the HAIG framework for analysing trust dynamics across evolving human-AI relationships.<n>Our analysis reveals how technical advances in self-supervision, reasoning authority, and distributed decision-making drive non-uniform trust evolution.
arXiv Detail & Related papers (2025-05-03T01:57:08Z)
LLM Ethics Benchmark: A Three-Dimensional Assessment System for Evaluating Moral Reasoning in Large Language Models [8.018569128518187]
This study establishes a novel framework for systematically evaluating the moral reasoning capabilities of large language models (LLMs)<n>Our framework addresses this challenge by quantifying alignment with human ethical standards through three dimensions.<n>This approach enables precise identification of ethical strengths and weaknesses in LLMs, facilitating targeted improvements and stronger alignment with societal values.
arXiv Detail & Related papers (2025-05-01T20:36:19Z)
AI Ethics by Design: Implementing Customizable Guardrails for Responsible AI Development [0.0]
We propose a structure that integrates rules, policies, and AI assistants to ensure responsible AI behavior. Our approach accommodates ethical pluralism, offering a flexible and adaptable solution for the evolving landscape of AI governance.
arXiv Detail & Related papers (2024-11-05T18:38:30Z)
Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization. We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv Detail & Related papers (2024-10-25T07:53:32Z)
ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs [14.621675648356236]
We introduce Value, a framework of fundamental values, grounded in psychological theory and a systematic review. We apply Value to measure the value alignment of humans and large language models (LLMs) across four real-world scenarios.
arXiv Detail & Related papers (2024-09-15T02:13:03Z)
Beyond Preferences in AI Alignment [15.878773061188516]
We characterize and challenge the preferentist approach to AI alignment. We show how preferences fail to capture the thick semantic content of human values. We argue that AI systems should be aligned with normative standards appropriate to their social roles.
arXiv Detail & Related papers (2024-08-30T03:14:20Z)
Ethical AI Governance: Methods for Evaluating Trustworthy AI [0.552480439325792]
Trustworthy Artificial Intelligence (TAI) integrates ethics that align with human values. TAI evaluation aims to ensure ethical standards and safety in AI development and usage.
arXiv Detail & Related papers (2024-08-28T09:25:50Z)
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. The lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. We introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML)
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
Towards Responsible AI in Banking: Addressing Bias for Fair Decision-Making [69.44075077934914]
"Responsible AI" emphasizes the critical nature of addressing biases within the development of a corporate culture. This thesis is structured around three fundamental pillars: understanding bias, mitigating bias, and accounting for bias. In line with open-source principles, we have released Bias On Demand and FairView as accessible Python packages.
arXiv Detail & Related papers (2024-01-13T14:07:09Z)
Hybrid Approaches for Moral Value Alignment in AI Agents: a Manifesto [3.7414804164475983]
Increasing interest in ensuring the safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents. We provide a systematization of existing approaches to the problem of introducing morality in machines - modelled as a continuum. We argue that more hybrid solutions are needed to create adaptable and robust, yet controllable and interpretable agentic systems.
arXiv Detail & Related papers (2023-12-04T11:46:34Z)
Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values. Current language models (LMs) are trained to rigidly replicate their training corpus in isolation. This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z)
Human Values in Multiagent Systems [3.5027291542274357]
This paper presents a formal representation of values, grounded in the social sciences. We use this formal representation to articulate the key challenges for achieving value-aligned behaviour in multiagent systems.
arXiv Detail & Related papers (2023-05-04T11:23:59Z)
Metaethical Perspectives on 'Benchmarking' AI Ethics [81.65697003067841]
Benchmarks are seen as the cornerstone for measuring technical progress in Artificial Intelligence (AI) research. An increasingly prominent research area in AI is ethics, which currently has no set of benchmarks nor commonly accepted way for measuring the 'ethicality' of an AI system. We argue that it makes more sense to talk about 'values' rather than 'ethics' when considering the possible actions of present and future AI systems.
arXiv Detail & Related papers (2022-04-11T14:36:39Z)
Value alignment: a formal approach [2.8348950186890467]
principles that should govern autonomous AI systems. We first provide a formal model to represent values through preferences and ways to compute value aggregations. Value alignment is then defined, and computed, for a given norm with respect to a given value through the increase/decrease that it results in the preferences of future states of the world.
arXiv Detail & Related papers (2021-10-18T12:40:04Z)
An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence. The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)
AAAI FSS-19: Human-Centered AI: Trustworthiness of AI Models and Data Proceedings [8.445274192818825]
It is crucial for predictive models to be uncertainty-aware and yield trustworthy predictions. The focus of this symposium was on AI systems to improve data quality and technical robustness and safety. submissions from broadly defined areas also discussed approaches addressing requirements such as explainable models, human trust and ethical aspects of AI.
arXiv Detail & Related papers (2020-01-15T15:30:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.