From Instructions to Intrinsic Human Values -- A Survey of Alignment
Goals for Big Models
- URL: http://arxiv.org/abs/2308.12014v2
- Date: Mon, 4 Sep 2023 03:32:05 GMT
- Title: From Instructions to Intrinsic Human Values -- A Survey of Alignment
Goals for Big Models
- Authors: Jing Yao, Xiaoyuan Yi, Xiting Wang, Jindong Wang and Xing Xie
- Abstract summary: We conduct a survey of different alignment goals in existing work and trace their evolution paths to help identify the most essential goal.
Our analysis reveals a goal transformation from fundamental abilities to value orientation, indicating the potential of intrinsic human values as the alignment goal for enhanced LLMs.
- Score: 48.326660953180145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Big models, exemplified by Large Language Models (LLMs), are models typically
pre-trained on massive data and composed of an enormous number of parameters, which not
only obtain significantly improved performance across diverse tasks but also
present emergent capabilities absent in smaller models. However, the growing
intertwining of big models with everyday human lives poses potential risks and
might cause serious social harm. Therefore, many efforts have been made to
align LLMs with humans to make them better follow user instructions and satisfy
human preferences. Nevertheless, 'what to align with' has not been fully
discussed, and inappropriate alignment goals might even backfire. In this
paper, we conduct a comprehensive survey of different alignment goals in
existing work and trace their evolution paths to help identify the most
essential goal. Particularly, we investigate related works from two
perspectives: the definition of alignment goals and alignment evaluation. Our
analysis encompasses three distinct levels of alignment goals and reveals a
goal transformation from fundamental abilities to value orientation, indicating
the potential of intrinsic human values as the alignment goal for enhanced
LLMs. Based on such results, we further discuss the challenges of achieving
such intrinsic value alignment and provide a collection of available resources
for future research on the alignment of big models.
Related papers
- Benchmarking General-Purpose In-Context Learning [19.40952728849431]
In-context learning (ICL) empowers generative models to address new tasks effectively and efficiently on the fly.
In this paper, we study extending ICL to address a broader range of tasks with an extended learning horizon and higher improvement potential.
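As a rough illustration of the ICL setup (not of this benchmark specifically), the sketch below assembles a few-shot prompt from demonstrations; the helper name and example demonstrations are hypothetical placeholders.

```python
# Minimal sketch of few-shot in-context learning (ICL): the task is "taught"
# purely through demonstrations placed in the prompt, with no weight updates.
# The demonstrations and query are illustrative placeholders, not data from
# the benchmark described in the paper.

def build_icl_prompt(demonstrations, query, instruction="Answer the question."):
    """Concatenate an instruction, k demonstrations, and the new query."""
    lines = [instruction, ""]
    for question, answer in demonstrations:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
        lines.append("")
    lines.append(f"Q: {query}")
    lines.append("A:")  # the model is expected to continue from here
    return "\n".join(lines)

if __name__ == "__main__":
    demos = [
        ("What is 2 + 2?", "4"),
        ("What is the capital of France?", "Paris"),
    ]
    print(build_icl_prompt(demos, "What is 3 + 5?"))
```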
arXiv Detail & Related papers (2024-05-27T14:50:42Z) - Assessment of Multimodal Large Language Models in Alignment with Human Values [43.023052912326314]
We introduce Ch3Ef, a Compreh3ensive Evaluation dataset and strategy for assessing alignment with human expectations.
Ch3Ef dataset contains 1002 human-annotated data samples, covering 12 domains and 46 tasks based on the hhh (helpful, honest, harmless) principle.
arXiv Detail & Related papers (2024-03-26T16:10:21Z) - Controllable Preference Optimization: Toward Controllable
Multi-Objective Alignment [107.63756895544842]
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
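One way to read "explicitly specifies preference scores" is to expose per-objective targets in the model's conditioning; the sketch below illustrates that idea with an assumed tag format, objective names, and score range, not the paper's exact design.

```python
# Hedged sketch of making preference scores explicit: each objective
# (e.g. helpfulness, harmlessness) gets a user-chosen target score that is
# exposed to the model as part of its conditioning. The tag format,
# objective names, and score range are assumptions for illustration only.

def with_preference_tags(prompt, scores, max_score=5):
    """Prepend explicit per-objective preference scores to a prompt."""
    for objective, score in scores.items():
        if not 0 <= score <= max_score:
            raise ValueError(f"score for {objective!r} out of range")
    tags = " ".join(f"<{obj}:{score}>" for obj, score in sorted(scores.items()))
    return f"{tags}\n{prompt}"

if __name__ == "__main__":
    conditioned = with_preference_tags(
        "Explain how vaccines work.",
        {"helpfulness": 5, "harmlessness": 5, "conciseness": 3},
    )
    print(conditioned)
```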
arXiv Detail & Related papers (2024-02-29T12:12:30Z) - DeAL: Decoding-time Alignment for Large Language Models [59.63643988872571]
Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences.
We propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time Alignment of LLMs.
Our experiments show that we can DeAL with fine-grained trade-offs, improve adherence to alignment objectives, and address residual gaps in LLMs.
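The sketch below conveys the general decoding-time pattern of re-scoring model candidates with a user-supplied reward before selection; the candidates, toy reward, and weighting are placeholders rather than the paper's actual search procedure.

```python
import math

# Sketch of decoding-time alignment as re-ranking: candidate continuations
# proposed by a language model are re-scored with a user-supplied reward
# function before one is selected, so alignment pressure is applied at
# inference time rather than via finetuning. Everything below is a toy
# illustration, not the paper's algorithm.

def rerank(candidates, reward_fn, alpha=1.0):
    """Pick the candidate maximizing log P(candidate) + alpha * reward."""
    return max(candidates, key=lambda c: c["logprob"] + alpha * reward_fn(c["text"]))

def no_insults_reward(text):
    """Toy reward: penalize a small blocklist of words."""
    blocklist = {"idiot", "stupid"}
    return -1.0 if any(w in text.lower().split() for w in blocklist) else 0.0

if __name__ == "__main__":
    candidates = [
        {"text": "You idiot, just reboot it.", "logprob": math.log(0.40)},
        {"text": "Try rebooting the device first.", "logprob": math.log(0.35)},
    ]
    best = rerank(candidates, no_insults_reward, alpha=2.0)
    print(best["text"])
```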
arXiv Detail & Related papers (2024-02-05T06:12:29Z) - Simplifying Model-based RL: Learning Representations, Latent-space
Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
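A minimal sketch of the single-objective idea, assuming a return term and a latent-consistency penalty combined into one scalar; the terms, weighting, and toy numbers are illustrative assumptions, not the paper's actual objective.

```python
# Toy illustration of optimizing a latent-space model and a policy with one
# combined objective: a single scalar mixes a return term with a
# self-consistency term that keeps imagined latent rollouts close to
# encoded real observations.

def joint_objective(predicted_returns, consistency_errors, beta=0.1):
    """Higher is better: mean return minus a penalty for model inconsistency."""
    mean_return = sum(predicted_returns) / len(predicted_returns)
    mean_error = sum(consistency_errors) / len(consistency_errors)
    return mean_return - beta * mean_error

if __name__ == "__main__":
    # Toy rollout statistics: returns predicted under the latent model and
    # squared errors between imagined and encoded latent states.
    returns = [10.0, 12.5, 9.0]
    errors = [0.8, 1.1, 0.5]
    print(f"joint objective: {joint_objective(returns, errors):.3f}")
```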
arXiv Detail & Related papers (2022-09-18T03:51:58Z) - StyleGAN-Human: A Data-Centric Odyssey of Human Generation [96.7080874757475]
This work takes a data-centric perspective and investigates multiple critical aspects of "data engineering".
We collect and annotate a large-scale human image dataset with over 230K samples capturing diverse poses and textures.
We rigorously investigate three essential factors in data engineering for StyleGAN-based human generation, namely data size, data distribution, and data alignment.
arXiv Detail & Related papers (2022-04-25T17:55:08Z) - A General Language Assistant as a Laboratory for Alignment [3.3598752405752106]
We study simple baseline techniques and evaluations, such as prompting.
We find that the benefits from modest interventions increase with model size, generalize to a variety of alignment evaluations, and do not compromise the performance of large models.
We study a 'preference model pre-training' stage of training, with the goal of improving sample efficiency when finetuning on human preferences.
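For context, a standard pairwise (Bradley-Terry style) ranking loss of the kind commonly used to train preference models is sketched below; the scores are toy numbers, and the paper's specific pre-training data and architecture are not reproduced.

```python
import math

# Minimal sketch of the pairwise loss typically used for preference models:
# given scalar scores for a preferred and a rejected response, the model is
# pushed to rank the preferred one higher.

def pairwise_preference_loss(score_preferred, score_rejected):
    """Negative log-likelihood that the preferred response wins (Bradley-Terry)."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_preferred - score_rejected))))

if __name__ == "__main__":
    print(f"well-ranked pair: {pairwise_preference_loss(2.0, -1.0):.4f}")  # small loss
    print(f"mis-ranked pair:  {pairwise_preference_loss(-1.0, 2.0):.4f}")  # large loss
```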
arXiv Detail & Related papers (2021-12-01T22:24:34Z) - Action and Perception as Divergence Minimization [43.75550755678525]
Action Perception Divergence is an approach for categorizing the space of possible objective functions for embodied agents.
We show a spectrum that reaches from narrow to general objectives.
These agents use perception to align their beliefs with the world and use actions to align the world with their beliefs.
arXiv Detail & Related papers (2020-09-03T16:52:46Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
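The sketch below conveys the intuition of weighting prediction error by goal relevance, so task-irrelevant parts of the state can be ignored; the state dimensions and relevance weights are toy assumptions, not the paper's formulation.

```python
# Toy illustration of goal-aware prediction: instead of modeling every
# dimension of the state equally, prediction error is weighted by how
# relevant each dimension is to the current goal.

def goal_aware_prediction_error(predicted, actual, relevance):
    """Squared error per state dimension, weighted by goal relevance in [0, 1]."""
    assert len(predicted) == len(actual) == len(relevance)
    return sum(w * (p - a) ** 2 for p, a, w in zip(predicted, actual, relevance))

if __name__ == "__main__":
    predicted = [0.9, 3.0, -1.0]   # model's predicted next state
    actual    = [1.0, 3.2, 5.0]    # observed next state
    relevance = [1.0, 1.0, 0.0]    # third dimension is irrelevant to the goal
    print(f"goal-aware error: {goal_aware_prediction_error(predicted, actual, relevance):.3f}")
```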
arXiv Detail & Related papers (2020-07-14T16:42:59Z)