Related papers: The Ethical Compass of the Machine: Evaluating Large Language Models for Decision Support in Construction Project Management

URL: http://arxiv.org/abs/2509.04505v1
Date: Tue, 02 Sep 2025 13:50:36 GMT
Title: The Ethical Compass of the Machine: Evaluating Large Language Models for Decision Support in Construction Project Management
Authors: Somtochukwu Azie, Yiping Meng
Abstract summary: This study aims to critically evaluate the ethical viability and reliability of Large Language Models (LLMs). It is one of the first studies to empirically test the ethical reasoning of LLMs within the construction domain.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The integration of Artificial Intelligence (AI) into construction project management (CPM) is accelerating, with Large Language Models (LLMs) emerging as accessible decision-support tools. This study aims to critically evaluate the ethical viability and reliability of LLMs when applied to the ethically sensitive, high-risk decision-making contexts inherent in CPM. A mixed-methods research design was employed, involving the quantitative performance testing of two leading LLMs against twelve real-world ethical scenarios using a novel Ethical Decision Support Assessment Checklist (EDSAC), and qualitative analysis of semi-structured interviews with 12 industry experts to capture professional perceptions. The findings reveal that while LLMs demonstrate adequate performance in structured domains such as legal compliance, they exhibit significant deficiencies in handling contextual nuance, ensuring accountability, and providing transparent reasoning. Stakeholders expressed considerable reservations regarding the autonomous use of AI for ethical judgments, strongly advocating for robust human-in-the-loop oversight. To our knowledge, this is one of the first studies to empirically test the ethical reasoning of LLMs within the construction domain. It introduces the EDSAC framework as a replicable methodology and provides actionable recommendations, emphasising that LLMs are currently best positioned as decision-support aids rather than autonomous ethical agents.

Related papers

Mirror: A Multi-Agent System for AI-Assisted Ethics Review
Mirror is an agentic framework for AI-assisted ethical review. It integrates ethical reasoning, structured rule interpretation, and multi-agent deliberation within a unified architecture.
arXiv (2026-02-09T03:38:55Z)
Advancing Automated Ethical Profiling in SE: a Zero-Shot Evaluation of LLM Reasoning
Large Language Models (LLMs) are increasingly integrated into software engineering (SE) tools for tasks that extend beyond code synthesis. We present a fully automated framework for assessing ethical reasoning capabilities across 16 LLMs in a zero-shot setting.
arXiv (2025-10-01T13:28:26Z)
How Good are Foundation Models in Step-by-Step Embodied Reasoning?
Embodied agents must make decisions that are safe, spatially coherent, and grounded in context. Recent advances in large multimodal models have shown promising capabilities in visual understanding and language generation. Our benchmark includes over 1.1k samples with detailed step-by-step reasoning across 10 tasks and 8 embodiments.
arXiv (2025-09-18T17:56:30Z)
Using Large Language Models for Legal Decision-Making in Austrian Value-Added Tax Law: An Experimental Study
This paper provides an experimental evaluation of the capability of large language models (LLMs) to assist in legal decision-making within the framework of Austrian and European Union value-added tax (VAT) law.
arXiv (2025-07-11T10:19:56Z)
The AI Imperative: Scaling High-Quality Peer Review in Machine Learning
We argue that AI-assisted peer review must become an urgent research and infrastructure priority. We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting area chairs (ACs) in decision-making.
arXiv (2025-06-09T18:37:14Z)
LLM-based HSE Compliance Assessment: Benchmark, Performance, and Advancements
HSE-Bench is the first benchmark dataset designed to evaluate the HSE compliance assessment capabilities of large language models. It comprises over 1,000 manually curated questions drawn from regulations, court cases, safety exams, and fieldwork videos. We conduct evaluations on different prompting strategies and more than 10 LLMs, including foundation models, reasoning models, and multimodal vision models.
arXiv (2025-05-29T01:02:53Z)
Development of Application-Specific Large Language Models to Facilitate Research Ethics Review
We propose application-specific large language models (LLMs) to facilitate IRB review processes. These IRB-specific LLMs would be fine-tuned on IRB-specific literature and institutional datasets. We outline potential applications, including pre-review screening, preliminary analysis, consistency checking, and decision support.
arXiv (2025-01-18T12:05:05Z)
Can We Trust AI Agents? A Case Study of an LLM-Based Multi-Agent System for Ethical AI
AI-based systems impact millions by supporting diverse tasks but face issues such as misinformation, bias, and misuse. This study examines the use of Large Language Models (LLMs) for AI ethics in practice. We design a prototype in which agents engage in structured discussions on real-world AI ethics issues from the AI Incident Database.
arXiv (2024-10-25T20:17:59Z)
Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making. Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations. Our experiments on novel Design for Manufacturing tasks show both improved task performance and improved grounded decision-making capability.
arXiv (2024-08-17T11:49:53Z)
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present MR-Ben, a process-based benchmark that demands meta-reasoning skills. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv (2024-06-20T03:50:23Z)
Rational Decision-Making Agent with Internalized Utility Judgment
Large language models (LLMs) have demonstrated remarkable advancements and have attracted significant efforts to develop LLMs into agents capable of executing intricate multi-step decision-making tasks beyond traditional NLP applications. This paper proposes RadAgent, which fosters the development of its rationality through an iterative framework involving Experience Exploration and Utility Learning. Experimental results on the ToolBench dataset demonstrate RadAgent's superiority over baselines, achieving over 10% improvement in Pass Rate on diverse tasks.
arXiv (2023-08-24T03:11:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.