Machine Learning based Enterprise Financial Audit Framework and High Risk Identification
- URL: http://arxiv.org/abs/2507.06266v1
- Date: Tue, 08 Jul 2025 00:22:49 GMT
- Title: Machine Learning based Enterprise Financial Audit Framework and High Risk Identification
- Authors: Tingyu Yuan, Xi Zhang, Xuanjing Chen,
- Abstract summary: This study proposes an AI-driven framework for enterprise financial audits and high-risk identification.<n>Using a dataset from the Big Four accounting firms (EY, PwC, Deloitte, KPMG) from 2020 to 2025, the research examines trends in risk assessment, compliance violations, and fraud detection.<n>The study recommends adopting Random Forest as a core model, enhancing features via engineering, and implementing real-time risk monitoring.
- Score: 6.433444278723668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the face of global economic uncertainty, financial auditing has become essential for regulatory compliance and risk mitigation. Traditional manual auditing methods are increasingly limited by large data volumes, complex business structures, and evolving fraud tactics. This study proposes an AI-driven framework for enterprise financial audits and high-risk identification, leveraging machine learning to improve efficiency and accuracy. Using a dataset from the Big Four accounting firms (EY, PwC, Deloitte, KPMG) from 2020 to 2025, the research examines trends in risk assessment, compliance violations, and fraud detection. The dataset includes key indicators such as audit project counts, high-risk cases, fraud instances, compliance breaches, employee workload, and client satisfaction, capturing both audit behaviors and AI's impact on operations. To build a robust risk prediction model, three algorithms - Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN) - are evaluated. SVM uses hyperplane optimization for complex classification, RF combines decision trees to manage high-dimensional, nonlinear data with resistance to overfitting, and KNN applies distance-based learning for flexible performance. Through hierarchical K-fold cross-validation and evaluation using F1-score, accuracy, and recall, Random Forest achieves the best performance, with an F1-score of 0.9012, excelling in identifying fraud and compliance anomalies. Feature importance analysis reveals audit frequency, past violations, employee workload, and client ratings as key predictors. The study recommends adopting Random Forest as a core model, enhancing features via engineering, and implementing real-time risk monitoring. This research contributes valuable insights into using machine learning for intelligent auditing and risk management in modern enterprises.
Related papers
- Conformal Thinking: Risk Control for Reasoning on a Compute Budget [60.65072883773352]
Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases.<n>We re-frame the budget setting problem as risk control, limiting the error rate while minimizing compute.<n>Our framework introduces an upper threshold that stops reasoning when the model is confident and a novel lower threshold that preemptively stops unsolvable instances.
arXiv Detail & Related papers (2026-02-03T18:17:22Z) - Enhancing Credit Risk Prediction: A Meta-Learning Framework Integrating Baseline Models, LASSO, and ECOC for Superior Accuracy [7.254744067646655]
This research proposes a comprehensive meta-learning framework that synthesizes multiple complementary models.<n>We implement Permutation Feature Importance analysis for each prediction class across all constituent models.<n>Results demonstrate that our framework significantly enhances the accuracy of financial entity classification.
arXiv Detail & Related papers (2025-09-26T14:09:04Z) - FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making [58.04602111184477]
FinHEAR is a framework for Human Expertise and Adaptive Risk-aware reasoning.<n>It orchestrates specialized agents to analyze historical trends, interpret current events, and retrieve expert-informed precedents.<n> Empirical results on financial datasets show that FinHEAR consistently outperforms strong baselines across trend prediction and trading tasks.
arXiv Detail & Related papers (2025-06-10T04:06:51Z) - Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs [58.24692529185971]
We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods.<n>We evaluate the effectiveness and robustness of different unlearning strategies.
arXiv Detail & Related papers (2025-05-29T09:19:07Z) - Evaluating Supervised Learning Models for Fraud Detection: A Comparative Study of Classical and Deep Architectures on Imbalanced Transaction Data [2.5670390559986442]
Fraud detection remains a critical task in high-stakes domains such as finance and e-commerce.<n>We systematically compare the performance of four supervised learning models on a large-scale, highly imbalanced online transaction dataset.
arXiv Detail & Related papers (2025-05-28T16:08:04Z) - From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning [41.02508512078575]
We take mathematical reasoning as a case study and conduct a comprehensive analysis of various verifiers in both static evaluation and RL training scenarios.<n>First, we find that current open-source rule-based verifiers often fail to recognize equivalent answers presented in different formats, resulting in non-negligible false negative rates.<n>We investigate model-based verifiers as a potential solution to address these limitations.
arXiv Detail & Related papers (2025-05-28T10:28:41Z) - PredictaBoard: Benchmarking LLM Score Predictability [50.47497036981544]
Large Language Models (LLMs) often fail unpredictably.<n>This poses a significant challenge to ensuring their safe deployment.<n>We present PredictaBoard, a novel collaborative benchmarking framework.
arXiv Detail & Related papers (2025-02-20T10:52:38Z) - AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability.<n>The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z) - The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility.<n>This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance.<n>We analyze experimental results obtained from testing DeepSeek-R1 on our benchmark and reveal the critical ethical concerns raised by this highly acclaimed model.
arXiv Detail & Related papers (2025-01-20T06:35:01Z) - Explainable AI for Fraud Detection: An Attention-Based Ensemble of CNNs, GNNs, and A Confidence-Driven Gating Mechanism [5.486205584465161]
This study presents a new stacking-based approach for CCF detection by adding two extra layers to the usual classification process.<n>In the attention layer, we combine soft outputs from a convolutional neural network (CNN) and a recurrent neural network (RNN) using the dependent ordered weighted averaging (DOWA) operator.<n>In the confidence-based layer, we select whichever aggregate (DOWA or IOWA) shows lower uncertainty to feed into a meta-learner.<n>Experiments on three datasets show that our method achieves high accuracy and robust generalization, making it effective for CCF detection.
arXiv Detail & Related papers (2024-10-01T09:56:23Z) - EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.<n>Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.<n>However, the deployment of these agents in physical environments presents significant safety challenges.<n>This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z) - A Customer Level Fraudulent Activity Detection Benchmark for Enhancing Machine Learning Model Research and Evaluation [0.4681661603096334]
This study introduces a benchmark that contains structured datasets specifically designed for customer-level fraud detection.
The benchmark not only adheres to strict privacy guidelines to ensure user confidentiality but also provides a rich source of information by encapsulating customer-centric features.
arXiv Detail & Related papers (2024-04-23T04:57:44Z) - A machine learning workflow to address credit default prediction [0.44943951389724796]
Credit default prediction (CDP) plays a crucial role in assessing the creditworthiness of individuals and businesses.
We propose a workflow-based approach to improve CDP, which refers to the task of assessing the probability that a borrower will default on his or her credit obligations.
arXiv Detail & Related papers (2024-03-06T15:30:41Z) - TrustFed: A Reliable Federated Learning Framework with Malicious-Attack
Resistance [8.924352407824566]
Federated learning (FL) enables collaborative learning among multiple clients while ensuring individual data privacy.
In this paper, we propose a hierarchical audit-based FL (HiAudit-FL) framework to enhance the reliability and security of the learning process.
Our simulation results demonstrate that HiAudit-FL can effectively identify and handle potential malicious users accurately, with small system overhead.
arXiv Detail & Related papers (2023-12-06T13:56:45Z) - Empirical study of Machine Learning Classifier Evaluation Metrics
behavior in Massively Imbalanced and Noisy data [0.0]
We develop a theoretical foundation to model human annotation errors and extreme imbalance typical in real world fraud detection data sets.
We demonstrate that a combined F1 score and g-mean, in that specific order, is the best evaluation metric for typical imbalanced fraud detection model classification.
arXiv Detail & Related papers (2022-08-25T07:30:31Z) - Explanations of Machine Learning predictions: a mandatory step for its
application to Operational Processes [61.20223338508952]
Credit Risk Modelling plays a paramount role.
Recent machine and deep learning techniques have been applied to the task.
We suggest to use LIME technique to tackle the explainability problem in this field.
arXiv Detail & Related papers (2020-12-30T10:27:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.