Related papers: Concept Alignment

Concept Alignment

URL: http://arxiv.org/abs/2401.08672v1
Date: Tue, 9 Jan 2024 23:32:18 GMT
Title: Concept Alignment
Authors: Sunayana Rane, Polyphony J. Bruna, Ilia Sucholutsky, Christopher Kello, Thomas L. Griffiths
Abstract summary: We argue that before we can attempt to align values, it is imperative that AI systems and humans align the concepts they use to understand the world. We integrate ideas from philosophy, cognitive science, and deep learning to explain the need for concept alignment.
Score: 10.285482205152729
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Discussion of AI alignment (alignment between humans and AI systems) has focused on value alignment, broadly referring to creating AI systems that share human values. We argue that before we can even attempt to align values, it is imperative that AI systems and humans align the concepts they use to understand the world. We integrate ideas from philosophy, cognitive science, and deep learning to explain the need for concept alignment, not just value alignment, between humans and machines. We summarize existing accounts of how humans and machines currently learn concepts, and we outline opportunities and challenges in the path towards shared concepts. Finally, we explain how we can leverage the tools already being developed in cognitive science and AI research to accelerate progress towards concept alignment.

Related papers

Towards a Theory of AI Personhood [1.6317061277457001]
We outline necessary conditions for AI personhood, focusing on agency, theory-of-mind, and self-awareness. If AI systems can be considered persons, then typical framings of AI alignment may be incomplete.
arXiv Detail & Related papers (2025-01-23T10:31:26Z)
Aligning Generalisation Between Humans and Machines [74.120848518198]
AI technology can support humans in scientific discovery and forming decisions, but may also disrupt democracies and target individuals.<n>The responsible use of AI and its participation in human-AI teams increasingly shows the need for AI alignment.<n>A crucial yet often overlooked aspect of these interactions is the different ways in which humans and machines generalise.
arXiv Detail & Related papers (2024-11-23T18:36:07Z)
ValueCompass: A Framework of Fundamental Values for Human-AI Alignment [15.35489011078817]
We introduce Value, a framework of fundamental values, grounded in psychological theory and a systematic review. We apply Value to measure the value alignment of humans and language models (LMs) across four real-world vignettes. Our findings uncover risky misalignment between humans and LMs, such as LMs agreeing with values like "Choose Own Goals", which are largely disagreed by humans.
arXiv Detail & Related papers (2024-09-15T02:13:03Z)
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. The lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. We introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML)
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
Concept Alignment as a Prerequisite for Value Alignment [11.236150405125754]
Value alignment is essential for building AI systems that can safely and reliably interact with people. We show how concept alignment can lead to systematic value mis-alignment. We describe an approach that helps minimize such failure modes by jointly reasoning about a person's concepts and values.
arXiv Detail & Related papers (2023-10-30T22:23:15Z)
AI Alignment: A Comprehensive Survey [70.35693485015659]
AI alignment aims to make AI systems behave in line with human intentions and values. We identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality. We decompose current alignment research into two key components: forward alignment and backward alignment.
arXiv Detail & Related papers (2023-10-30T15:52:15Z)
Reflective Hybrid Intelligence for Meaningful Human Control in Decision-Support Systems [4.1454448964078585]
We introduce the notion of self-reflective AI systems for meaningful human control over AI systems. We propose a framework that integrates knowledge from psychology and philosophy with formal reasoning methods and machine learning approaches. We argue that self-reflective AI systems can lead to self-reflective hybrid systems (human + AI)
arXiv Detail & Related papers (2023-07-12T13:32:24Z)
Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being. For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z)
The Short Anthropological Guide to the Study of Ethical AI [91.3755431537592]
Short guide serves as both an introduction to AI ethics and social science and anthropological perspectives on the development of AI. Aims to provide those unfamiliar with the field with an insight into the societal impact of AI systems and how, in turn, these systems can lead us to rethink how our world operates.
arXiv Detail & Related papers (2020-10-07T12:25:03Z)
Aligning AI With Shared Human Values [85.2824609130584]
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. We find that current language models have a promising but incomplete ability to predict basic human ethical judgements. Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z)
Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense [142.53911271465344]
We argue that the next generation of AI must embrace "dark" humanlike common sense for solving novel tasks. We identify functionality, physics, intent, causality, and utility (FPICU) as the five core domains of cognitive AI with humanlike common sense.
arXiv Detail & Related papers (2020-04-20T04:07:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.