Concept Alignment
- URL: http://arxiv.org/abs/2401.08672v1
- Date: Tue, 9 Jan 2024 23:32:18 GMT
- Title: Concept Alignment
- Authors: Sunayana Rane, Polyphony J. Bruna, Ilia Sucholutsky, Christopher
Kello, Thomas L. Griffiths
- Abstract summary: We argue that before we can attempt to align values, it is imperative that AI systems and humans align the concepts they use to understand the world.
We integrate ideas from philosophy, cognitive science, and deep learning to explain the need for concept alignment.
- Score: 10.285482205152729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discussion of AI alignment (alignment between humans and AI systems) has
focused on value alignment, broadly referring to creating AI systems that share
human values. We argue that before we can even attempt to align values, it is
imperative that AI systems and humans align the concepts they use to understand
the world. We integrate ideas from philosophy, cognitive science, and deep
learning to explain the need for concept alignment, not just value alignment,
between humans and machines. We summarize existing accounts of how humans and
machines currently learn concepts, and we outline opportunities and challenges
in the path towards shared concepts. Finally, we explain how we can leverage
the tools already being developed in cognitive science and AI research to
accelerate progress towards concept alignment.
Related papers
- ValueCompass: A Framework of Fundamental Values for Human-AI Alignment [15.35489011078817]
We introduce ValueCompass, a framework of fundamental values grounded in psychological theory and a systematic review.
We apply ValueCompass to measure the value alignment of humans and language models (LMs) across four real-world vignettes.
Our findings uncover risky misalignment between humans and LMs, such as LMs agreeing with values like "Choose Own Goals" that humans largely reject.
arXiv Detail & Related papers (2024-09-15T02:13:03Z)
- Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment.
The lack of clear definitions and scope for human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment.
We introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), and Machine Learning (ML).
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
- Concept Alignment as a Prerequisite for Value Alignment [11.236150405125754]
Value alignment is essential for building AI systems that can safely and reliably interact with people.
We show how concept misalignment can lead to systematic value misalignment.
We describe an approach that helps minimize such failure modes by jointly reasoning about a person's concepts and values.
arXiv Detail & Related papers (2023-10-30T22:23:15Z)
- AI Alignment: A Comprehensive Survey [70.35693485015659]
AI alignment aims to make AI systems behave in line with human intentions and values.
We identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality.
We decompose current alignment research into two key components: forward alignment and backward alignment.
arXiv Detail & Related papers (2023-10-30T15:52:15Z)
- Reflective Hybrid Intelligence for Meaningful Human Control in Decision-Support Systems [4.1454448964078585]
We introduce the notion of self-reflective AI systems as a means of achieving meaningful human control over AI.
We propose a framework that integrates knowledge from psychology and philosophy with formal reasoning methods and machine learning approaches.
We argue that self-reflective AI systems can lead to self-reflective hybrid systems (human + AI).
arXiv Detail & Related papers (2023-07-12T13:32:24Z)
- Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z)
- The Short Anthropological Guide to the Study of Ethical AI [91.3755431537592]
This short guide serves as both an introduction to AI ethics and an overview of social science and anthropological perspectives on the development of AI.
It aims to give those unfamiliar with the field insight into the societal impact of AI systems and how, in turn, these systems can lead us to rethink how our world operates.
arXiv Detail & Related papers (2020-10-07T12:25:03Z)
- Aligning AI With Shared Human Values [85.2824609130584]
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z)
- Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense [142.53911271465344]
We argue that the next generation of AI must embrace "dark" humanlike common sense for solving novel tasks.
We identify functionality, physics, intent, causality, and utility (FPICU) as the five core domains of cognitive AI with humanlike common sense.
arXiv Detail & Related papers (2020-04-20T04:07:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all information) and is not responsible for any consequences of its use.