Related papers: Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach

URL: http://arxiv.org/abs/2507.20796v1
Date: Mon, 28 Jul 2025 13:05:04 GMT
Title: Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach
Authors: Wei Lu, Daniel L. Chen, Christian B. Hansen
Abstract summary: We evaluate large language model (LLM) preferences using canonical economic games. Models like GPT-4o show excessive cooperation and limited incentive sensitivity, while reasoning models, such as o3-mini, align more consistently with payoff-maximizing strategies. We propose a supervised fine-tuning pipeline that uses synthetic datasets derived from economic reasoning to align LLM agents with economic preferences.
Score: 4.389938747401259
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Understanding how large language model (LLM) agents behave in strategic interactions is essential as these systems increasingly participate autonomously in economically and morally consequential decisions. We evaluate LLM preferences using canonical economic games, finding substantial deviations from human behavior. Models like GPT-4o show excessive cooperation and limited incentive sensitivity, while reasoning models, such as o3-mini, align more consistently with payoff-maximizing strategies. We propose a supervised fine-tuning pipeline that uses synthetic datasets derived from economic reasoning to align LLM agents with economic preferences, focusing on two stylized preference structures. In the first, utility depends only on individual payoffs (homo economicus), while utility also depends on a notion of Kantian universalizability in the second preference structure (homo moralis). We find that fine-tuning based on small datasets shifts LLM agent behavior toward the corresponding economic agent. We further assess the fine-tuned agents' behavior in two applications: Moral dilemmas involving autonomous vehicles and algorithmic pricing in competitive markets. These examples illustrate how different normative objectives embedded via realizations from structured preference structures can influence market and moral outcomes. This work contributes a replicable, cost-efficient, and economically grounded pipeline to align AI preferences using moral-economic principles.
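The two preference structures named in the abstract can be illustrated with a minimal sketch. It assumes the homo moralis utility takes the Alger and Weibull form, u(x, y) = (1 - kappa) * pi(x, y) + kappa * pi(x, x), where kappa is the weight on Kantian universalizability and kappa = 0 recovers homo economicus. Whether the paper uses exactly this form is an assumption, and the prisoner's dilemma payoffs and the names `PAYOFF`, `utility`, and `best_response` are hypothetical, chosen only for illustration.

```python
# Hypothetical prisoner's dilemma material payoffs pi(own, other).
# These numbers are illustrative and are not taken from the paper.
PAYOFF = {
    ("C", "C"): 3.0,
    ("C", "D"): 0.0,
    ("D", "C"): 5.0,
    ("D", "D"): 1.0,
}
ACTIONS = ("C", "D")


def utility(own: str, other: str, kappa: float) -> float:
    """Assumed homo moralis utility with morality weight kappa in [0, 1].

    kappa = 0 gives homo economicus (own material payoff only);
    kappa = 1 gives a fully Kantian agent who evaluates an action
    as if everyone chose it.
    """
    material = PAYOFF[(own, other)]      # pi(x, y)
    universalized = PAYOFF[(own, own)]   # pi(x, x)
    return (1.0 - kappa) * material + kappa * universalized


def best_response(kappa: float, other: str) -> str:
    """Return the action that maximizes utility against a fixed opponent action."""
    return max(ACTIONS, key=lambda action: utility(action, other, kappa))


if __name__ == "__main__":
    for kappa in (0.0, 0.4, 1.0):
        responses = {other: best_response(kappa, other) for other in ACTIONS}
        print(f"kappa={kappa}: {responses}")
```

With these payoffs, a purely self-interested agent (kappa = 0) defects against either opponent action, while a fully Kantian agent (kappa = 1) cooperates against either. Intermediate weights can produce mixed patterns, which is one way to picture how a preference parameter shifts behavior in the games the abstract describes.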

Related papers

Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs [25.067282214293904]
This paper explores whether post-training techniques, specifically Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR), can effectively generalize to multi-agent scenarios. We use economic reasoning as a testbed, leveraging its strong foundations in mathematics and game theory. Comprehensive evaluation on economic reasoning benchmarks and multi-agent games reveals clear improvements in structured reasoning and economic rationality.
arXiv Detail & Related papers (2025-05-31T14:22:40Z)
The Moral Mind(s) of Large Language Models [0.0]
We show that large language models (LLMs) exhibit a consistent structure of moral preferences guiding their decisions. Using a probabilistic rationality test, we found that at least one model from each major provider exhibited behavior consistent with approximately stable moral preferences. We then estimated these utility functions and found that most models cluster around neutral moral stances.
arXiv Detail & Related papers (2024-11-19T15:40:16Z)
GLEE: A Unified Framework and Benchmark for Language-based Economic Environments [19.366120861935105]
Large Language Models (LLMs) show significant potential in economic and strategic interactions. These questions become crucial concerning the economic and societal implications of integrating LLM-based agents into real-world data-driven systems. We introduce a benchmark for standardizing research on two-player, sequential, language-based games.
arXiv Detail & Related papers (2024-10-07T17:55:35Z)
Moral Alignment for LLM Agents [3.7414804164475983]
We introduce the design of reward functions that explicitly and transparently encode core human values. We evaluate our approach using the traditional philosophical frameworks of Deontological Ethics and Utilitarianism. We show how moral fine-tuning can be deployed to enable an agent to unlearn a previously developed selfish strategy.
arXiv Detail & Related papers (2024-10-02T15:09:36Z)
LLM economicus? Mapping the Behavioral Biases of LLMs via Utility Theory [20.79199807796242]
Utility theory is an approach to evaluate the economic biases of large language models. We find that the economic behavior of current LLMs is neither entirely human-like nor entirely economicus-like.
arXiv Detail & Related papers (2024-08-05T19:00:43Z)
Exploring and steering the moral compass of Large Language Models [55.2480439325792]
Large Language Models (LLMs) have become central to advancing automation and decision-making across various sectors. This study proposes a comprehensive comparative analysis of the most advanced LLMs to assess their moral profiles.
arXiv Detail & Related papers (2024-05-27T16:49:22Z)
Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
Large Language Models (LLMs) have made it crucial to align their values with those of humans. We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)
Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model [50.06663781566795]
We consider a dynamic model with the consumers' preferences as well as price sensitivity varying over time. We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. Our regret analysis results not only demonstrate optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information.
arXiv Detail & Related papers (2023-03-28T00:23:23Z)
Finding General Equilibria in Many-Agent Economic Simulations Using Deep Reinforcement Learning [72.23843557783533]
We show that deep reinforcement learning can discover stable solutions that are epsilon-Nash equilibria for a meta-game over agent types. Our approach is more flexible and does not need unrealistic assumptions, e.g., market clearing. We demonstrate our approach in real-business-cycle models, a representative family of DGE models, with 100 worker-consumers, 10 firms, and a government who taxes and redistributes.
arXiv Detail & Related papers (2022-01-03T17:00:17Z)
The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning [126.37520136341094]
We show that machine-learning-based economic simulation is a powerful policy and mechanism design framework. The AI Economist is a two-level, deep RL framework that trains both agents and a social planner who co-adapt. In simple one-step economies, the AI Economist recovers the optimal tax policy of economic theory.
arXiv Detail & Related papers (2021-08-05T17:42:35Z)
Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions [80.49176924360499]
We establish a framework for directing a society of simple, specialized, self-interested agents to solve sequential decision problems. We derive a class of decentralized reinforcement learning algorithms. We demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
arXiv Detail & Related papers (2020-07-05T16:41:09Z)
The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies [119.07163415116686]
We train social planners that discover tax policies that can effectively trade off economic equality and productivity. We present an economic simulation environment that features competitive pressures and market dynamics. We show that AI-driven tax policies improve the trade-off between equality and productivity by 16% over baseline policies.
arXiv Detail & Related papers (2020-04-28T06:57:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.