Concept Alignment
- URL: http://arxiv.org/abs/2401.08672v1
- Date: Tue, 9 Jan 2024 23:32:18 GMT
- Title: Concept Alignment
- Authors: Sunayana Rane, Polyphony J. Bruna, Ilia Sucholutsky, Christopher
Kello, Thomas L. Griffiths
- Abstract summary: We argue that before we can attempt to align values, it is imperative that AI systems and humans align the concepts they use to understand the world.
We integrate ideas from philosophy, cognitive science, and deep learning to explain the need for concept alignment.
- Score: 10.285482205152729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discussion of AI alignment (alignment between humans and AI systems) has
focused on value alignment, broadly referring to creating AI systems that share
human values. We argue that before we can even attempt to align values, it is
imperative that AI systems and humans align the concepts they use to understand
the world. We integrate ideas from philosophy, cognitive science, and deep
learning to explain the need for concept alignment, not just value alignment,
between humans and machines. We summarize existing accounts of how humans and
machines currently learn concepts, and we outline opportunities and challenges
in the path towards shared concepts. Finally, we explain how we can leverage
the tools already being developed in cognitive science and AI research to
accelerate progress towards concept alignment.
Related papers
- Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment.
The lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment.
arXiv Detail & Related papers (2024-06-13T16:03:25Z) - Making AI Intelligible: Philosophical Foundations [0.0]
'Making AI Intelligible' shows that philosophical work on the metaphysics of meaning can help answer these questions.
Author: The questions addressed in the book are not only theoretically interesting, but the answers have pressing practical implications.
arXiv Detail & Related papers (2024-06-12T12:25:04Z) - Concept Alignment as a Prerequisite for Value Alignment [11.236150405125754]
Value alignment is essential for building AI systems that can safely and reliably interact with people.
We show how concept alignment can lead to systematic value mis-alignment.
We describe an approach that helps minimize such failure modes by jointly reasoning about a person's concepts and values.
arXiv Detail & Related papers (2023-10-30T22:23:15Z) - AI Alignment: A Comprehensive Survey [70.35693485015659]
AI alignment aims to make AI systems behave in line with human intentions and values.
We identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality.
We decompose current alignment research into two key components: forward alignment and backward alignment.
arXiv Detail & Related papers (2023-10-30T15:52:15Z) - Reflective Hybrid Intelligence for Meaningful Human Control in
Decision-Support Systems [4.1454448964078585]
We introduce the notion of self-reflective AI systems for meaningful human control over AI systems.
We propose a framework that integrates knowledge from psychology and philosophy with formal reasoning methods and machine learning approaches.
We argue that self-reflective AI systems can lead to self-reflective hybrid systems (human + AI)
arXiv Detail & Related papers (2023-07-12T13:32:24Z) - Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z) - The Short Anthropological Guide to the Study of Ethical AI [91.3755431537592]
Short guide serves as both an introduction to AI ethics and social science and anthropological perspectives on the development of AI.
Aims to provide those unfamiliar with the field with an insight into the societal impact of AI systems and how, in turn, these systems can lead us to rethink how our world operates.
arXiv Detail & Related papers (2020-10-07T12:25:03Z) - Aligning AI With Shared Human Values [85.2824609130584]
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z) - Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike
Common Sense [142.53911271465344]
We argue that the next generation of AI must embrace "dark" humanlike common sense for solving novel tasks.
We identify functionality, physics, intent, causality, and utility (FPICU) as the five core domains of cognitive AI with humanlike common sense.
arXiv Detail & Related papers (2020-04-20T04:07:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.