Machine Learning based Enterprise Financial Audit Framework and High Risk Identification
- URL: http://arxiv.org/abs/2507.06266v1
- Date: Tue, 08 Jul 2025 00:22:49 GMT
- Title: Machine Learning based Enterprise Financial Audit Framework and High Risk Identification
- Authors: Tingyu Yuan, Xi Zhang, Xuanjing Chen,
- Abstract summary: This study proposes an AI-driven framework for enterprise financial audits and high-risk identification.<n>Using a dataset from the Big Four accounting firms (EY, PwC, Deloitte, KPMG) from 2020 to 2025, the research examines trends in risk assessment, compliance violations, and fraud detection.<n>The study recommends adopting Random Forest as a core model, enhancing features via engineering, and implementing real-time risk monitoring.
- Score: 6.433444278723668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the face of global economic uncertainty, financial auditing has become essential for regulatory compliance and risk mitigation. Traditional manual auditing methods are increasingly limited by large data volumes, complex business structures, and evolving fraud tactics. This study proposes an AI-driven framework for enterprise financial audits and high-risk identification, leveraging machine learning to improve efficiency and accuracy. Using a dataset from the Big Four accounting firms (EY, PwC, Deloitte, KPMG) from 2020 to 2025, the research examines trends in risk assessment, compliance violations, and fraud detection. The dataset includes key indicators such as audit project counts, high-risk cases, fraud instances, compliance breaches, employee workload, and client satisfaction, capturing both audit behaviors and AI's impact on operations. To build a robust risk prediction model, three algorithms - Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN) - are evaluated. SVM uses hyperplane optimization for complex classification, RF combines decision trees to manage high-dimensional, nonlinear data with resistance to overfitting, and KNN applies distance-based learning for flexible performance. Through hierarchical K-fold cross-validation and evaluation using F1-score, accuracy, and recall, Random Forest achieves the best performance, with an F1-score of 0.9012, excelling in identifying fraud and compliance anomalies. Feature importance analysis reveals audit frequency, past violations, employee workload, and client ratings as key predictors. The study recommends adopting Random Forest as a core model, enhancing features via engineering, and implementing real-time risk monitoring. This research contributes valuable insights into using machine learning for intelligent auditing and risk management in modern enterprises.
Related papers
- Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs [58.24692529185971]
We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods.<n>We evaluate the effectiveness and robustness of different unlearning strategies.
arXiv Detail & Related papers (2025-05-29T09:19:07Z) - Evaluating Supervised Learning Models for Fraud Detection: A Comparative Study of Classical and Deep Architectures on Imbalanced Transaction Data [2.5670390559986442]
Fraud detection remains a critical task in high-stakes domains such as finance and e-commerce.<n>We systematically compare the performance of four supervised learning models on a large-scale, highly imbalanced online transaction dataset.
arXiv Detail & Related papers (2025-05-28T16:08:04Z) - PredictaBoard: Benchmarking LLM Score Predictability [50.47497036981544]
Large Language Models (LLMs) often fail unpredictably.<n>This poses a significant challenge to ensuring their safe deployment.<n>We present PredictaBoard, a novel collaborative benchmarking framework.
arXiv Detail & Related papers (2025-02-20T10:52:38Z) - AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability.<n>The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z) - The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility.<n>This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance.<n>We analyze experimental results obtained from testing DeepSeek-R1 on our benchmark and reveal the critical ethical concerns raised by this highly acclaimed model.
arXiv Detail & Related papers (2025-01-20T06:35:01Z) - Explainable AI for Fraud Detection: An Attention-Based Ensemble of CNNs, GNNs, and A Confidence-Driven Gating Mechanism [5.486205584465161]
This study presents a new stacking-based approach for CCF detection by adding two extra layers to the usual classification process.<n>In the attention layer, we combine soft outputs from a convolutional neural network (CNN) and a recurrent neural network (RNN) using the dependent ordered weighted averaging (DOWA) operator.<n>In the confidence-based layer, we select whichever aggregate (DOWA or IOWA) shows lower uncertainty to feed into a meta-learner.<n>Experiments on three datasets show that our method achieves high accuracy and robust generalization, making it effective for CCF detection.
arXiv Detail & Related papers (2024-10-01T09:56:23Z) - EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.<n>Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.<n>However, the deployment of these agents in physical environments presents significant safety challenges.<n>This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z) - A Customer Level Fraudulent Activity Detection Benchmark for Enhancing Machine Learning Model Research and Evaluation [0.4681661603096334]
This study introduces a benchmark that contains structured datasets specifically designed for customer-level fraud detection.
The benchmark not only adheres to strict privacy guidelines to ensure user confidentiality but also provides a rich source of information by encapsulating customer-centric features.
arXiv Detail & Related papers (2024-04-23T04:57:44Z) - A machine learning workflow to address credit default prediction [0.44943951389724796]
Credit default prediction (CDP) plays a crucial role in assessing the creditworthiness of individuals and businesses.
We propose a workflow-based approach to improve CDP, which refers to the task of assessing the probability that a borrower will default on his or her credit obligations.
arXiv Detail & Related papers (2024-03-06T15:30:41Z) - TrustFed: A Reliable Federated Learning Framework with Malicious-Attack
Resistance [8.924352407824566]
Federated learning (FL) enables collaborative learning among multiple clients while ensuring individual data privacy.
In this paper, we propose a hierarchical audit-based FL (HiAudit-FL) framework to enhance the reliability and security of the learning process.
Our simulation results demonstrate that HiAudit-FL can effectively identify and handle potential malicious users accurately, with small system overhead.
arXiv Detail & Related papers (2023-12-06T13:56:45Z) - Empirical study of Machine Learning Classifier Evaluation Metrics
behavior in Massively Imbalanced and Noisy data [0.0]
We develop a theoretical foundation to model human annotation errors and extreme imbalance typical in real world fraud detection data sets.
We demonstrate that a combined F1 score and g-mean, in that specific order, is the best evaluation metric for typical imbalanced fraud detection model classification.
arXiv Detail & Related papers (2022-08-25T07:30:31Z) - Explanations of Machine Learning predictions: a mandatory step for its
application to Operational Processes [61.20223338508952]
Credit Risk Modelling plays a paramount role.
Recent machine and deep learning techniques have been applied to the task.
We suggest to use LIME technique to tackle the explainability problem in this field.
arXiv Detail & Related papers (2020-12-30T10:27:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.