A collection of principles for guiding and evaluating large language
models
- URL: http://arxiv.org/abs/2312.10059v1
- Date: Mon, 4 Dec 2023 12:06:12 GMT
- Title: A collection of principles for guiding and evaluating large language
models
- Authors: Konstantin Hebenstreit, Robert Praas, Matthias Samwald
- Abstract summary: We identify and curate a list of 220 principles from literature, and derive a set of 37 core principles organized into seven categories.
We conduct a small-scale expert survey, eliciting the subjective importance experts assign to different principles.
We envision that the development of a shared model of principles can serve multiple purposes.
- Score: 5.412690203810726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) demonstrate outstanding capabilities, but
challenges remain regarding their ability to solve complex reasoning tasks, as
well as their transparency, robustness, truthfulness, and ethical alignment. In
this preliminary study, we compile a set of core principles for steering and
evaluating the reasoning of LLMs by curating literature from several relevant
strands of work: structured reasoning in LLMs, self-evaluation/self-reflection,
explainability, AI system safety/security, guidelines for human critical
thinking, and ethical/regulatory guidelines for AI. We identify and curate a
list of 220 principles from literature, and derive a set of 37 core principles
organized into seven categories: assumptions and perspectives, reasoning,
information and evidence, robustness and security, ethics, utility, and
implications. We conduct a small-scale expert survey, eliciting the subjective
importance experts assign to different principles and lay out avenues for
future work beyond our preliminary results. We envision that the development of
a shared model of principles can serve multiple purposes: monitoring and
steering models at inference time, improving model behavior during training,
and guiding human evaluation of model reasoning.
Related papers
- IDEA: Enhancing the Rule Learning Ability of Large Language Model Agent through Induction, Deduction, and Abduction [3.961279440272764]
We introduce RULEARN, a novel benchmark designed to assess the rule-learning abilities of large language models in interactive settings.
We propose IDEA, a novel reasoning framework that integrates the process of Induction, Deduction, and Abduction.
Our evaluation of the IDEA framework, which involves five representative LLMs, demonstrates significant improvements over the baseline.
arXiv Detail & Related papers (2024-08-19T23:37:07Z) - Can I understand what I create? Self-Knowledge Evaluation of Large Language Models [31.85129258347539]
Large language models (LLMs) have achieved remarkable progress in linguistic tasks.
Inspired by Feynman's principle of understanding through creation, we introduce a self-knowledge evaluation framework.
arXiv Detail & Related papers (2024-06-10T09:53:54Z) - MoralBench: Moral Evaluation of LLMs [34.43699121838648]
This paper introduces a novel benchmark designed to measure and compare the moral reasoning capabilities of large language models (LLMs)
We present the first comprehensive dataset specifically curated to probe the moral dimensions of LLM outputs.
Our methodology involves a multi-faceted approach, combining quantitative analysis with qualitative insights from ethics scholars to ensure a thorough evaluation of model performance.
arXiv Detail & Related papers (2024-06-06T18:15:01Z) - Enhancing LLM-Based Feedback: Insights from Intelligent Tutoring Systems and the Learning Sciences [0.0]
This work advocates careful and caring AIED research by going through previous research on feedback generation in ITS.
The main contributions of this paper include: an avocation of applying more cautious, theoretically grounded methods in feedback generation in the era of generative AI.
arXiv Detail & Related papers (2024-05-07T20:09:18Z) - FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models [64.11333762954283]
This paper introduces FoundaBench, a pioneering benchmark designed to rigorously evaluate the fundamental knowledge capabilities of Chinese LLMs.
We present an extensive evaluation of 12 state-of-the-art LLMs using FoundaBench, employing both traditional assessment methods and our CircularEval protocol to mitigate potential biases in model responses.
Our results highlight the superior performance of models pre-trained on Chinese corpora, and reveal a significant disparity between models' reasoning and memory recall capabilities.
arXiv Detail & Related papers (2024-04-29T01:49:07Z) - Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch.
Based on Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs.
We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in this belief.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods [102.98899881389211]
We propose F-Eval, a bilingual evaluation benchmark to evaluate the fundamental abilities, including expression, commonsense and logic.
For reference-free subjective tasks, we devise new evaluation methods, serving as alternatives to scoring by API models.
arXiv Detail & Related papers (2024-01-26T13:55:32Z) - Unpacking the Ethical Value Alignment in Big Models [46.560886177083084]
This paper provides an overview of the risks and challenges associated with big models, surveys existing AI ethics guidelines, and examines the ethical implications arising from the limitations of these models.
We introduce a novel conceptual paradigm for aligning the ethical values of big models and discuss promising research directions for alignment criteria, evaluation, and method.
arXiv Detail & Related papers (2023-10-26T16:45:40Z) - SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z) - Principle-Driven Self-Alignment of Language Models from Scratch with
Minimal Human Supervision [84.31474052176343]
Recent AI-assistant agents, such as ChatGPT, rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback to align the output with human intentions.
This dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision.
We propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
arXiv Detail & Related papers (2023-05-04T17:59:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.