LLMs for Explainable Business Decision-Making: A Reinforcement Learning Fine-Tuning Approach
- URL: http://arxiv.org/abs/2601.04208v1
- Date: Wed, 10 Dec 2025 04:16:31 GMT
- Title: LLMs for Explainable Business Decision-Making: A Reinforcement Learning Fine-Tuning Approach
- Authors: Xiang Cheng, Wen Wang, Anindya Ghose
- Abstract summary: Large language models (LLMs) present an opportunity to generate natural-language explanations. We introduce LEXMA, a reinforcement-learning-based fine-tuning framework that produces audience-appropriate explanations. LEXMA yields significant improvements in predictive performance compared with other LLM baselines.
- Score: 9.942833203981069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial Intelligence (AI) models increasingly drive high-stakes consumer interactions, yet their decision logic often remains opaque. Prevailing explainable AI techniques rely on post hoc numerical feature attributions, which fail to provide coherent narratives behind model decisions. Large language models (LLMs) present an opportunity to generate natural-language explanations, but three design challenges remain unresolved: explanations must be both decision-correct and faithful to the factors that drive the prediction; they should be able to serve multiple audiences without shifting the underlying decision rule; and they should be trained in a label-efficient way that does not depend on large corpora of human-scored explanations. To address these challenges, we introduce LEXMA (LLM-based EXplanations for Multi-Audience decisions), a reinforcement-learning-based fine-tuning framework that produces narrative-driven, audience-appropriate explanations. LEXMA combines reflection-augmented supervised fine-tuning with two stages of Group Relative Policy Optimization (GRPO). Specifically, it fine-tunes two separate parameter sets to improve decision correctness and satisfy stylistic requirements for different audiences, using reward signals that do not rely on human-annotated explanations. We instantiate LEXMA in the context of mortgage approval decisions. Results demonstrate that LEXMA yields significant improvements in predictive performance compared with other LLM baselines. Moreover, human evaluations show that expert-facing explanations generated by our approach are more risk-focused, and consumer-facing explanations are clearer, more actionable, and more polite. Our study contributes a cost-efficient, systematic LLM fine-tuning approach to enhance explanation quality for business decisions, offering strong potential for scalable deployment of transparent AI systems.
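The abstract describes LEXMA's recipe only at a high level: rule-based reward signals that do not depend on human-scored explanations, combined with Group Relative Policy Optimization (GRPO). As a rough illustration of that general pattern, the sketch below shows a decision-correctness reward, a toy audience-style reward, and GRPO-style group-relative advantages. This is not the paper's implementation; every name, reward weight, and parsing rule here (e.g. `correctness_reward`, the `approve`/`deny` format, the jargon list) is a hypothetical assumption.

```python
# Minimal sketch of a GRPO-style reward/advantage computation for
# decision-correct, audience-aware explanations. Illustrative only;
# not the LEXMA implementation.

from dataclasses import dataclass
from typing import List
import re
import statistics


@dataclass
class Sample:
    prompt: str          # applicant features rendered as text (hypothetical format)
    label: str           # ground-truth decision, e.g. "approve" or "deny"
    completion: str      # model-generated decision plus explanation


def extract_decision(completion: str) -> str:
    """Parse the decision token from a generated explanation (assumed format)."""
    match = re.search(r"\b(approve|deny)\b", completion.lower())
    return match.group(1) if match else ""


def correctness_reward(sample: Sample) -> float:
    """Label-efficient reward: compare the parsed decision with the recorded outcome.

    No human-scored explanations are needed; only the decision label is used.
    """
    return 1.0 if extract_decision(sample.completion) == sample.label else 0.0


def style_reward(completion: str, audience: str) -> float:
    """Toy stand-in for an audience-style reward (e.g. consumer-facing clarity).

    A real system would use a rubric or a learned scorer; here we simply
    penalize jargon for consumer-facing text as a placeholder.
    """
    jargon = ("debt-to-income", "LTV", "underwriting")
    if audience == "consumer":
        return 1.0 - 0.2 * sum(term.lower() in completion.lower() for term in jargon)
    return 1.0


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantages: standardize each reward against its sampled group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]


# Example: one prompt, a group of sampled completions, two reward terms combined.
group = [
    Sample("income=..., dti=...", "approve", "Approve: stable income and low DTI."),
    Sample("income=..., dti=...", "approve", "Deny: the ratios look risky."),
]
rewards = [correctness_reward(s) + style_reward(s.completion, "expert") for s in group]
advantages = group_relative_advantages(rewards)
print(advantages)  # positive for completions that beat the group average
```

In an actual GRPO fine-tuning loop, these advantages would weight the policy-gradient update for each sampled completion; the paper's two-stage, two-parameter-set design is not reproduced here.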
Related papers
- From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization [62.07990937720985]
Dimension-level Reward Model (DRM) is a new supervision framework for Large Language Models. DRM evaluates the quality of a reasoning process along three fundamental, complementary, and interpretable dimensions. Experimental results show that DRM provides effective supervision signals, guides the optimization of LLMs, and enhances their reasoning ability.
arXiv Detail & Related papers (2025-10-13T14:29:15Z)
- ReaLM: Reflection-Enhanced Autonomous Reasoning with Small Language Models [76.28894983518164]
Small Language Models (SLMs) are a cost-effective alternative to Large Language Models (LLMs). They often struggle with complex reasoning due to their limited capacity and a tendency to produce mistakes or inconsistent answers. We introduce ReaLM, a reinforcement learning framework for robust and self-sufficient reasoning in vertical domains.
arXiv Detail & Related papers (2025-08-17T14:50:23Z)
- Are Today's LLMs Ready to Explain Well-Being Concepts? [17.02052397388858]
We construct a large-scale dataset comprising 43,880 explanations of 2,194 well-being concepts. We introduce a principle-guided LLM-as-a-judge evaluation framework, employing dual judges to assess explanation quality (a generic sketch of this dual-judge pattern appears after this list). We show that fine-tuning an open-source LLM using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) can significantly enhance the quality of generated explanations.
arXiv Detail & Related papers (2025-08-06T00:45:02Z)
- Feedback-Induced Performance Decline in LLM-Based Decision-Making [6.5990946334144756]
Large Language Models (LLMs) can extract context from natural language problem descriptions. This paper studies the behaviour of these models within a Markov Decision Process (MDP).
arXiv Detail & Related papers (2025-07-20T10:38:56Z)
- DecisionFlow: Advancing Large Language Model as Principled Decision Maker [49.088778182807395]
DecisionFlow is a novel decision modeling framework that guides models to reason over structured representations of actions, attributes, and constraints. Rather than predicting answers directly from prompts, DecisionFlow builds a semantically grounded decision space and infers a latent utility function. Empirical results show that DecisionFlow achieves up to 30% accuracy gains over strong prompting baselines.
arXiv Detail & Related papers (2025-05-27T16:23:53Z)
- LLMs for Explainable AI: A Comprehensive Survey [0.7373617024876725]
Large Language Models (LLMs) offer a promising approach to enhancing Explainable AI (XAI). LLMs transform complex machine learning outputs into easy-to-understand narratives, bridging the gap between sophisticated model behavior and human interpretability.
arXiv Detail & Related papers (2025-03-31T18:19:41Z)
- Decision Information Meets Large Language Models: The Future of Explainable Operations Research [8.228707152775964]
We propose a comprehensive framework, Explainable Operations Research (EOR), emphasizing actionable and understandable explanations. The core of EOR is the concept of Decision Information, which emerges from what-if analysis and focuses on evaluating the impact of complex constraints. We introduce the first industrial benchmark to rigorously evaluate the effectiveness of explanations and analyses in Operations Research.
arXiv Detail & Related papers (2025-02-14T08:25:06Z)
- Reinforcing Thinking through Reasoning-Enhanced Reward Models [6.636512424910708]
Large Language Models (LLMs) exhibit great potential in complex multi-step reasoning through inference-time thinking. However, LLMs struggle with deciding when to stop thinking due to limited self-awareness about their knowledge boundaries. This work addresses these challenges by distilling the LLM's own reasoning processes into synthetic behavioral data.
arXiv Detail & Related papers (2024-12-31T04:50:15Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present a process-based benchmark, MR-Ben, that demands a meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning [77.72128397088409]
We show that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question. We also propose a novel reinforcement learning paradigm to encourage the model to actively perform composite deduction.
arXiv Detail & Related papers (2024-04-19T15:53:27Z)
- Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z)
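Several entries above rely on LLM-as-a-judge scoring of generated explanations, most explicitly the dual-judge setup in the well-being-concepts paper. The sketch below shows one common way such dual-judge, rubric-based grading is wired up; it is a generic illustration, not that paper's actual protocol, and the rubric dimensions, judge callables, and 0-5 scale are assumptions introduced here.

```python
# Hypothetical sketch of dual-judge, principle-guided scoring for generated
# explanations. The judge callables stand in for any LLM backend; the rubric
# dimensions and aggregation rule are illustrative, not from the cited paper.

from typing import Callable, Dict, List

Judge = Callable[[str], float]  # maps a grading prompt to a score in [0, 5]

RUBRIC: List[str] = ["accuracy", "clarity", "actionability"]  # assumed dimensions


def grade(explanation: str, concept: str, judges: List[Judge]) -> Dict[str, float]:
    """Ask each judge to score every rubric dimension, then average across judges."""
    scores: Dict[str, float] = {}
    for dim in RUBRIC:
        prompt = (
            f"Rate the following explanation of '{concept}' for {dim} "
            f"on a 0-5 scale. Reply with a number only.\n\n{explanation}"
        )
        scores[dim] = sum(judge(prompt) for judge in judges) / len(judges)
    return scores


# Usage with stubbed judges (a real setup would call two different LLMs).
stub_a: Judge = lambda prompt: 4.0
stub_b: Judge = lambda prompt: 3.0
print(grade("Well-being is ...", "well-being", [stub_a, stub_b]))
# {'accuracy': 3.5, 'clarity': 3.5, 'actionability': 3.5}
```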
This list is automatically generated from the titles and abstracts of the papers on this site.