A collection of principles for guiding and evaluating large language models
- URL: http://arxiv.org/abs/2312.10059v1
- Date: Mon, 4 Dec 2023 12:06:12 GMT
- Title: A collection of principles for guiding and evaluating large language models
- Authors: Konstantin Hebenstreit, Robert Praas, Matthias Samwald
- Abstract summary: We identify and curate a list of 220 principles from literature, and derive a set of 37 core principles organized into seven categories.
We conduct a small-scale expert survey, eliciting the subjective importance experts assign to different principles.
We envision that the development of a shared model of principles can serve multiple purposes.
- Score: 5.412690203810726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) demonstrate outstanding capabilities, but
challenges remain regarding their ability to solve complex reasoning tasks, as
well as their transparency, robustness, truthfulness, and ethical alignment. In
this preliminary study, we compile a set of core principles for steering and
evaluating the reasoning of LLMs by curating literature from several relevant
strands of work: structured reasoning in LLMs, self-evaluation/self-reflection,
explainability, AI system safety/security, guidelines for human critical
thinking, and ethical/regulatory guidelines for AI. We identify and curate a
list of 220 principles from literature, and derive a set of 37 core principles
organized into seven categories: assumptions and perspectives, reasoning,
information and evidence, robustness and security, ethics, utility, and
implications. We conduct a small-scale expert survey, eliciting the subjective
importance experts assign to different principles and lay out avenues for
future work beyond our preliminary results. We envision that the development of
a shared model of principles can serve multiple purposes: monitoring and
steering models at inference time, improving model behavior during training,
and guiding human evaluation of model reasoning.
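One of the envisioned uses, monitoring and steering models at inference time against a shared set of principles, can be illustrated with a small sketch. The following is hypothetical and not the paper's implementation: the principle texts, the `ask_llm` helper, and the PASS/FAIL scoring scheme are all illustrative assumptions.

```python
# Hypothetical sketch: critique a model response against a short principle
# checklist at inference time. The principles, prompts, and ask_llm() helper
# are illustrative assumptions, not the paper's method.

PRINCIPLES = [
    "State assumptions and acknowledge alternative perspectives.",
    "Support claims with evidence and cite sources where possible.",
    "Flag uncertainty instead of presenting guesses as facts.",
]

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion model."""
    raise NotImplementedError("Plug in your own LLM client here.")

def critique(response: str) -> list[tuple[str, str]]:
    """Ask the model to rate a response against each principle."""
    results = []
    for principle in PRINCIPLES:
        prompt = (
            f"Principle: {principle}\n"
            f"Response to evaluate:\n{response}\n\n"
            "Does the response satisfy the principle? Answer PASS or FAIL "
            "with a one-sentence justification."
        )
        results.append((principle, ask_llm(prompt)))
    return results
```

In practice the verdicts could be aggregated into a score, used to reject or revise a draft response, or logged for human review.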
Related papers
- Advancing Reasoning in Large Language Models: Promising Methods and Approaches [0.0]
Large Language Models (LLMs) have succeeded remarkably in various natural language processing (NLP) tasks.
Their ability to perform complex reasoning (spanning logical deduction, mathematical problem-solving, commonsense inference, and multi-step reasoning) often falls short of human expectations.
This survey provides a comprehensive review of emerging techniques enhancing reasoning in LLMs.
arXiv Detail & Related papers (2025-02-05T23:31:39Z) - A Unified Understanding and Evaluation of Steering Methods [17.420727709895736]
Steering methods provide a practical approach to controlling large language models by applying steering vectors to intermediate activations.
Despite their growing importance, the field lacks a unified understanding and consistent evaluation across tasks and datasets.
This paper introduces a unified framework for analyzing and evaluating steering methods, formalizing their core principles and offering theoretical insights into their effectiveness.
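The shared mechanism behind such methods, adding a steering vector to intermediate activations during the forward pass, can be sketched with a forward hook. The toy model, the random steering vector, and the scale factor below are illustrative assumptions, not any specific method evaluated in that paper.

```python
# Minimal sketch of activation steering: add a fixed "steering vector" to the
# output of one intermediate layer via a forward hook. Toy model for clarity.
import torch
import torch.nn as nn

hidden = 64
model = nn.Sequential(
    nn.Linear(hidden, hidden),
    nn.ReLU(),
    nn.Linear(hidden, hidden),  # intermediate layer whose output we steer
    nn.ReLU(),
    nn.Linear(hidden, 10),
)

# In real steering methods the vector is typically derived from data
# (e.g., contrastive prompt pairs); here it is random for illustration.
steering_vector = torch.randn(hidden)
scale = 4.0

def add_steering(module, inputs, output):
    # Returning a new tensor from a forward hook replaces the layer's output.
    return output + scale * steering_vector

handle = model[2].register_forward_hook(add_steering)
x = torch.randn(1, hidden)
steered_logits = model(x)
handle.remove()                 # disable steering
baseline_logits = model(x)
print(torch.norm(steered_logits - baseline_logits))
```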
arXiv Detail & Related papers (2025-02-04T20:55:24Z) - IDEA: Enhancing the Rule Learning Ability of Large Language Model Agent through Induction, Deduction, and Abduction [3.961279440272764]
We introduce RULEARN to assess the rule-learning abilities of large language models in interactive settings.
We propose IDEA, a novel reasoning framework that integrates the process of Induction, Deduction, and Abduction.
Our evaluation of the IDEA framework, which involves five representative LLMs, demonstrates significant improvements over the baseline.
arXiv Detail & Related papers (2024-08-19T23:37:07Z) - Can I understand what I create? Self-Knowledge Evaluation of Large Language Models [31.85129258347539]
Large language models (LLMs) have achieved remarkable progress in linguistic tasks.
Inspired by Feynman's principle of understanding through creation, we introduce a self-knowledge evaluation framework.
arXiv Detail & Related papers (2024-06-10T09:53:54Z) - Enhancing LLM-Based Feedback: Insights from Intelligent Tutoring Systems and the Learning Sciences [0.0]
This work advocates careful and caring AIED research by reviewing previous research on feedback generation in intelligent tutoring systems (ITS).
The main contributions of this paper include an advocacy for applying more cautious, theoretically grounded methods to feedback generation in the era of generative AI.
arXiv Detail & Related papers (2024-05-07T20:09:18Z) - Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch.
Based on the Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs.
We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the prevailing belief that base LLMs are inherently less risky than their aligned counterparts.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) are used to automate decision-making tasks.
In this paper, we evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types.
These benchmarks allow us to isolate the ability of LLMs to accurately predict changes resulting from interventions, separate from their ability to memorize facts or find other shortcuts.
arXiv Detail & Related papers (2024-04-08T14:15:56Z) - F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods [102.98899881389211]
We propose F-Eval, a bilingual evaluation benchmark to evaluate fundamental abilities, including expression, commonsense, and logic.
For reference-free subjective tasks, we devise new evaluation methods, serving as alternatives to scoring by API models.
arXiv Detail & Related papers (2024-01-26T13:55:32Z) - SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z) - Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision [84.31474052176343]
Recent AI-assistant agents, such as ChatGPT, rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback to align the output with human intentions.
This dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision.
We propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
arXiv Detail & Related papers (2023-05-04T17:59:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.