A collection of principles for guiding and evaluating large language models
- URL: http://arxiv.org/abs/2312.10059v1
- Date: Mon, 4 Dec 2023 12:06:12 GMT
- Title: A collection of principles for guiding and evaluating large language models
- Authors: Konstantin Hebenstreit, Robert Praas, Matthias Samwald
- Abstract summary: We identify and curate a list of 220 principles from literature, and derive a set of 37 core principles organized into seven categories.
We conduct a small-scale expert survey, eliciting the subjective importance experts assign to different principles.
We envision that the development of a shared model of principles can serve multiple purposes.
- Score: 5.412690203810726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) demonstrate outstanding capabilities, but
challenges remain regarding their ability to solve complex reasoning tasks, as
well as their transparency, robustness, truthfulness, and ethical alignment. In
this preliminary study, we compile a set of core principles for steering and
evaluating the reasoning of LLMs by curating literature from several relevant
strands of work: structured reasoning in LLMs, self-evaluation/self-reflection,
explainability, AI system safety/security, guidelines for human critical
thinking, and ethical/regulatory guidelines for AI. We identify and curate a
list of 220 principles from literature, and derive a set of 37 core principles
organized into seven categories: assumptions and perspectives, reasoning,
information and evidence, robustness and security, ethics, utility, and
implications. We conduct a small-scale expert survey, eliciting the subjective
importance experts assign to different principles and lay out avenues for
future work beyond our preliminary results. We envision that the development of
a shared model of principles can serve multiple purposes: monitoring and
steering models at inference time, improving model behavior during training,
and guiding human evaluation of model reasoning.
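One of the envisioned uses, monitoring and steering models at inference time against a shared set of principles, can be illustrated with a small sketch. The following is hypothetical and not the paper's implementation: the principle texts, the `ask_llm` helper, and the PASS/FAIL scoring scheme are all illustrative assumptions.

```python
# Hypothetical sketch: critique a model response against a short principle
# checklist at inference time. The principles, prompts, and ask_llm() helper
# are illustrative assumptions, not the paper's method.

PRINCIPLES = [
    "State assumptions and acknowledge alternative perspectives.",
    "Support claims with evidence and cite sources where possible.",
    "Flag uncertainty instead of presenting guesses as facts.",
]

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion model."""
    raise NotImplementedError("Plug in your own LLM client here.")

def critique(response: str) -> list[tuple[str, str]]:
    """Ask the model to rate a response against each principle."""
    results = []
    for principle in PRINCIPLES:
        prompt = (
            f"Principle: {principle}\n"
            f"Response to evaluate:\n{response}\n\n"
            "Does the response satisfy the principle? Answer PASS or FAIL "
            "with a one-sentence justification."
        )
        results.append((principle, ask_llm(prompt)))
    return results
```

In practice the verdicts could be aggregated into a score, used to reject or revise a draft response, or logged for human review.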
Related papers
- Advancing Reasoning in Large Language Models: Promising Methods and Approaches [0.0]
Large Language Models (LLMs) have succeeded remarkably in various natural language processing (NLP) tasks.
Their ability to perform complex reasoning (spanning logical deduction, mathematical problem-solving, commonsense inference, and multi-step reasoning) often falls short of human expectations.
This survey provides a comprehensive review of emerging techniques enhancing reasoning in LLMs.
arXiv Detail & Related papers (2025-02-05T23:31:39Z) - A Unified Understanding and Evaluation of Steering Methods [17.420727709895736]
Steering methods provide a practical approach to controlling large language models by applying steering vectors to intermediate activations.
Despite their growing importance, the field lacks a unified understanding and consistent evaluation across tasks and datasets.
This paper introduces a unified framework for analyzing and evaluating steering methods, formalizing their core principles and offering theoretical insights into their effectiveness.
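The shared mechanism behind such methods, adding a steering vector to intermediate activations during the forward pass, can be sketched with a forward hook. The toy model, the random steering vector, and the scale factor below are illustrative assumptions, not any specific method evaluated in that paper.

```python
# Minimal sketch of activation steering: add a fixed "steering vector" to the
# output of one intermediate layer via a forward hook. Toy model for clarity.
import torch
import torch.nn as nn

hidden = 64
model = nn.Sequential(
    nn.Linear(hidden, hidden),
    nn.ReLU(),
    nn.Linear(hidden, hidden),  # intermediate layer whose output we steer
    nn.ReLU(),
    nn.Linear(hidden, 10),
)

# In real steering methods the vector is typically derived from data
# (e.g., contrastive prompt pairs); here it is random for illustration.
steering_vector = torch.randn(hidden)
scale = 4.0

def add_steering(module, inputs, output):
    # Returning a new tensor from a forward hook replaces the layer's output.
    return output + scale * steering_vector

handle = model[2].register_forward_hook(add_steering)
x = torch.randn(1, hidden)
steered_logits = model(x)
handle.remove()                 # disable steering
baseline_logits = model(x)
print(torch.norm(steered_logits - baseline_logits))
```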
arXiv Detail & Related papers (2025-02-04T20:55:24Z) - IDEA: Enhancing the Rule Learning Ability of Large Language Model Agent through Induction, Deduction, and Abduction [3.961279440272764]
We introduce RULEARN to assess the rule-learning abilities of large language models in interactive settings.
We propose IDEA, a novel reasoning framework that integrates the process of Induction, Deduction, and Abduction.
Our evaluation of the IDEA framework, which involves five representative LLMs, demonstrates significant improvements over the baseline.
arXiv Detail & Related papers (2024-08-19T23:37:07Z) - Can I understand what I create? Self-Knowledge Evaluation of Large Language Models [31.85129258347539]
Large language models (LLMs) have achieved remarkable progress in linguistic tasks.
Inspired by Feynman's principle of understanding through creation, we introduce a self-knowledge evaluation framework.
arXiv Detail & Related papers (2024-06-10T09:53:54Z) - Enhancing LLM-Based Feedback: Insights from Intelligent Tutoring Systems and the Learning Sciences [0.0]
This work advocates careful and caring AIED research by reviewing previous research on feedback generation in intelligent tutoring systems (ITS).
The main contributions of this paper include an advocacy for applying more cautious, theoretically grounded methods to feedback generation in the era of generative AI.
arXiv Detail & Related papers (2024-05-07T20:09:18Z) - Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch.
Based on the Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs.
We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the prevailing belief that base LLMs are inherently less risky than their aligned counterparts.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) are used to automate decision-making tasks.
In this paper, we evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types.
These benchmarks allow us to isolate the ability of LLMs to accurately predict changes resulting from interventions, separate from their ability to memorize facts or find other shortcuts.
arXiv Detail & Related papers (2024-04-08T14:15:56Z) - F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods [102.98899881389211]
We propose F-Eval, a bilingual evaluation benchmark to evaluate fundamental abilities, including expression, commonsense, and logic.
For reference-free subjective tasks, we devise new evaluation methods, serving as alternatives to scoring by API models.
arXiv Detail & Related papers (2024-01-26T13:55:32Z) - SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z) - Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision [84.31474052176343]
Recent AI-assistant agents, such as ChatGPT, rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback to align the output with human intentions.
This dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision.
We propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
arXiv Detail & Related papers (2023-05-04T17:59:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.