Scopes of Alignment
- URL: http://arxiv.org/abs/2501.12405v1
- Date: Wed, 15 Jan 2025 03:06:59 GMT
- Title: Scopes of Alignment
- Authors: Kush R. Varshney, Zahra Ashktorab, Djallel Bouneffouf, Matthew Riemer, Justin D. Weisz,
- Abstract summary: Much of the research focus on AI alignment seeks to align large language models to generic values of helpfulness, harmlessness, and honesty. In this paper, we motivate why we need to move beyond such a limited conception and propose three dimensions for doing so.
- Score: 38.65920343856857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much of the research focus on AI alignment seeks to align large language models and other foundation models to the context-less and generic values of helpfulness, harmlessness, and honesty. Frontier model providers also strive to align their models with these values. In this paper, we motivate why we need to move beyond such a limited conception and propose three dimensions for doing so. The first scope of alignment is competence: knowledge, skills, or behaviors the model must possess to be useful for its intended purpose. The second scope of alignment is transience: either semantic or episodic depending on the context of use. The third scope of alignment is audience: either mass, public, small-group, or dyadic. At the end of the paper, we use the proposed framework to position some technologies and workflows that go beyond prevailing notions of alignment.
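To make the three-dimensional framework concrete, here is a minimal Python sketch that encodes the scopes as a data structure. The names (`AlignmentScope`, `Transience`, `Audience`) and the example at the end are our own illustration of the taxonomy, not code or terminology from the paper itself.

```python
from dataclasses import dataclass
from enum import Enum


class Transience(Enum):
    # Second scope: whether the aligned behavior is durable or tied
    # to a particular context of use.
    SEMANTIC = "semantic"
    EPISODIC = "episodic"


class Audience(Enum):
    # Third scope: who the aligned behavior is for.
    MASS = "mass"
    PUBLIC = "public"
    SMALL_GROUP = "small-group"
    DYADIC = "dyadic"


@dataclass
class AlignmentScope:
    # First scope, competence, is kept free-form because the paper
    # defines it as whatever knowledge, skills, or behaviors the
    # model must possess for its intended purpose.
    competence: str
    transience: Transience
    audience: Audience


# Positioning the prevailing helpful/harmless/honest paradigm, which
# the abstract characterizes as generic and context-less; the scope
# values chosen here are our own reading, not the paper's.
hhh = AlignmentScope(
    competence="helpfulness, harmlessness, honesty",
    transience=Transience.SEMANTIC,
    audience=Audience.MASS,
)
print(hhh)
```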
Related papers
- Text Embeddings Should Capture Implicit Semantics, Not Just Surface Meaning [17.00358234728804]
We argue that the text embedding research community should move beyond surface meaning and embrace implicit semantics as a central modeling goal. Current embedding models are typically trained on data that lacks such depth and evaluated on benchmarks that reward the capture of surface meaning. Our pilot study highlights this gap, showing that even state-of-the-art models perform only marginally better than simplistic baselines on implicit semantics tasks.
arXiv Detail & Related papers (2025-06-10T02:11:42Z)
- Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment [0.0]
We frame "alignment" as the problem of expressing human goals and values in a manner that artificial systems can follow without producing unwanted adversarial effects.
This work addresses alignment as a technical-philosophical problem that requires solid philosophical foundations and practical implementations that bring normative theory to AI system development.
arXiv Detail & Related papers (2024-06-16T18:37:31Z)
- Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations [61.141986747544024]
We present an approach that empowers application developers to tune a model to their particular values, social norms, laws and other regulations.
We lay out three main components of such an Alignment Studio architecture: Framers, Instructors, and Auditors.
arXiv Detail & Related papers (2024-03-08T21:26:49Z)
- On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models [77.86952307745763]
Big models have achieved revolutionary breakthroughs in the field of AI, but they also raise potential concerns.
To address such concerns, alignment technologies were introduced to make these models conform to human preferences and values.
Despite considerable advancements in the past year, various challenges lie in establishing the optimal alignment strategy.
arXiv Detail & Related papers (2024-03-07T04:19:13Z)
- Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [103.12563033438715]
Alignment in artificial intelligence seeks consistency between model responses and human preferences and values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
arXiv Detail & Related papers (2024-02-29T12:12:30Z)
- AI Alignment: A Comprehensive Survey [69.61425542486275]
AI alignment aims to make AI systems behave in line with human intentions and values.
We identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality.
We decompose current alignment research into two key components: forward alignment and backward alignment.
arXiv Detail & Related papers (2023-10-30T15:52:15Z)
- The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models [18.16062736448993]
We address the concept of "alignment" in large language models (LLMs) through the lens of post-structuralist socio-political theory.
We propose a framework that demarcates 1) which dimensions of model behaviour are considered important and 2) how meanings and definitions are ascribed to these dimensions.
We aim to foster a culture of transparency and critical evaluation, aiding the community in navigating the complexities of aligning LLMs with human populations.
arXiv Detail & Related papers (2023-10-03T22:02:17Z)
- Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation [23.94546957057613]
Cross-modal alignment is one key challenge for Vision-and-Language Navigation (VLN).
We propose a novel Grounded Entity-Landmark Adaptive (GELA) pre-training paradigm for VLN tasks.
arXiv Detail & Related papers (2023-08-24T06:25:20Z)
- From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models [48.326660953180145]
We conduct a survey of different alignment goals in existing work and trace their evolution paths to help identify the most essential goal.
Our analysis reveals a goal transformation from fundamental abilities to value orientation, indicating the potential of intrinsic human values as the alignment goal for enhanced LLMs.
arXiv Detail & Related papers (2023-08-23T09:11:13Z)