Related papers: Smart Audit System Empowered by LLM

URL: http://arxiv.org/abs/2410.07677v1
Date: Thu, 10 Oct 2024 07:36:15 GMT
Title: Smart Audit System Empowered by LLM
Authors: Xu Yao, Xiaoxu Wu, Xi Li, Huan Xu, Chenlei Li, Ping Huang, Si Li, Xiaoning Ma, Jiulong Shan
Abstract summary: We propose a smart audit system empowered by large language models (LLMs). Our approach introduces three innovations: a dynamic risk assessment model that streamlines audit procedures; a manufacturing compliance copilot that enhances data processing, retrieval, and evaluation; and a Re-act framework commonality analysis agent that provides real-time, customized analysis. These enhancements elevate audit efficiency and effectiveness, with testing scenarios demonstrating an improvement of over 24%.
Score: 25.2545519709246
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Manufacturing quality audits are pivotal for ensuring high product standards in mass production environments. Traditional auditing processes, however, are labor-intensive and reliant on human expertise, posing challenges in maintaining transparency, accountability, and continuous improvement across complex global supply chains. To address these challenges, we propose a smart audit system empowered by large language models (LLMs). Our approach introduces three innovations: a dynamic risk assessment model that streamlines audit procedures and optimizes resource allocation; a manufacturing compliance copilot that enhances data processing, retrieval, and evaluation for a self-evolving manufacturing knowledge base; and a Re-act framework commonality analysis agent that provides real-time, customized analysis to empower engineers with insights for supplier improvement. These enhancements elevate audit efficiency and effectiveness, with testing scenarios demonstrating an improvement of over 24%.

Related papers

Sustainability via LLM Right-sizing [21.17523328451591]
Large language models (LLMs) have become increasingly embedded in organizational workflows. This study offers an empirical answer by evaluating eleven proprietary and open-weight LLMs across ten everyday occupational tasks. Results show that GPT-4o delivers consistently superior performance but at a significantly higher cost and environmental footprint.
arXiv Detail & Related papers (2025-04-17T04:00:40Z)
Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models [75.4890331763196]
Recent breakthroughs in Large Language Models (LLMs) have led to the emergence of agentic AI systems. LLM-based Agentic RS (LLM-ARS) can offer more interactive, context-aware, and proactive recommendations.
arXiv Detail & Related papers (2025-03-20T22:37:15Z)
FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models [79.41859481668618]
Large Language Models (LLMs) have significantly advanced fact-checking studies. Existing automated fact-checking evaluation methods rely on static datasets and classification metrics. We introduce FACT-AUDIT, an agent-driven framework that adaptively and dynamically assesses LLMs' fact-checking capabilities.
arXiv Detail & Related papers (2025-02-25T07:44:22Z)
The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility. This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance. We analyze experimental results obtained from testing DeepSeek-R1 on our benchmark and reveal the critical ethical concerns raised by this highly acclaimed model.
arXiv Detail & Related papers (2025-01-20T06:35:01Z)
Addressing Quality Challenges in Deep Learning: The Role of MLOps and Domain Knowledge [5.190998244098203]
Deep learning (DL) systems present unique challenges in software engineering, especially concerning quality attributes like correctness and resource efficiency. This experience paper explores the role of MLOps practices in creating transparent and reproducible experimentation environments. We report on experiences addressing the quality challenges by embedding domain knowledge into the design of a DL model and its integration within a larger system.
arXiv Detail & Related papers (2025-01-14T19:37:08Z)
Powering LLM Regulation through Data: Bridging the Gap from Compute Thresholds to Customer Experiences [0.0]
This paper argues that current regulatory approaches, which focus on compute-level thresholds and generalized model evaluations, are insufficient to ensure the safety and effectiveness of specific LLM-based user experiences. We propose a shift towards a certification process centered on actual user-facing experiences and the curation of high-quality datasets for evaluation.
arXiv Detail & Related papers (2025-01-12T16:20:40Z)
On the Adversarial Robustness of Instruction-Tuned Large Language Models for Code [4.286327408435937]
We assess the impact of diverse input challenges on the functionality and correctness of generated code using rigorous metrics and established benchmarks. Open-source models demonstrate an increased susceptibility to input perturbations, resulting in declines in functional correctness ranging from 12% to 34%. In contrast, commercial models demonstrate relatively greater resilience, with performance degradation ranging from 3% to 24%.
arXiv Detail & Related papers (2024-11-29T07:00:47Z)
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments [90.29937153770835]
We introduce CRMArena, a benchmark designed to evaluate AI agents on realistic tasks grounded in professional work environments. We show that state-of-the-art LLM agents succeed in less than 40% of the tasks with ReAct prompting, and less than 55% even with function-calling abilities. Our findings highlight the need for enhanced agent capabilities in function-calling and rule-following to be deployed in real-world work environments.
arXiv Detail & Related papers (2024-11-04T17:30:51Z)
Large Language Models for Manufacturing [41.12098478080648]
Large Language Models (LLMs) have the potential to transform the manufacturing industry, offering new opportunities to optimize processes, improve efficiency, and drive innovation. This paper examines the integration of LLMs into the manufacturing domain, focusing on their potential to automate and enhance various aspects of manufacturing.
arXiv Detail & Related papers (2024-10-28T18:13:47Z)
Trustworthiness in Retrieval-Augmented Generation Systems: A Survey [59.26328612791924]
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs). We propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy.
arXiv Detail & Related papers (2024-09-16T09:06:44Z)
AI based Multiagent Approach for Requirements Elicitation and Analysis [3.9422957660677476]
This study empirically investigates the effectiveness of utilizing Large Language Models (LLMs) to automate requirements analysis tasks. We deployed four models, namely GPT-3.5, GPT-4 Omni, LLaMA3-70, and Mixtral-8B, and conducted experiments to analyze requirements on four real-world projects. Preliminary results indicate notable variations in task completion among the models.
arXiv Detail & Related papers (2024-08-18T07:23:12Z)
VERA: Validation and Evaluation of Retrieval-Augmented Systems [5.709401805125129]
VERA is a framework designed to enhance the transparency and reliability of outputs from large language models (LLMs). We show how VERA can strengthen decision-making processes and trust in AI applications.
arXiv Detail & Related papers (2024-08-16T21:59:59Z)
Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs). The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation. We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, so that they become better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents [76.95062553043607]
Evaluating large language models (LLMs) is essential for understanding their capabilities and facilitating their integration into practical applications. We introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to analytical evaluation of LLM agents.
arXiv Detail & Related papers (2024-01-24T01:51:00Z)
TrustFed: A Reliable Federated Learning Framework with Malicious-Attack Resistance [8.924352407824566]
Federated learning (FL) enables collaborative learning among multiple clients while ensuring individual data privacy. In this paper, we propose a hierarchical audit-based FL (HiAudit-FL) framework to enhance the reliability and security of the learning process. Our simulation results demonstrate that HiAudit-FL can effectively identify and handle potential malicious users accurately, with small system overhead.
arXiv Detail & Related papers (2023-12-06T13:56:45Z)
QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement. QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights. We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models [61.28463542324576]
Vision-language models (VLMs) have recently demonstrated strong efficacy as visual assistants that can generate human-like outputs. We evaluate existing state-of-the-art VLMs and find that even the best-performing model is unable to demonstrate strong visual reasoning capabilities and consistency. We propose a two-stage training framework aimed at improving both the reasoning performance and consistency of VLMs.
arXiv Detail & Related papers (2023-09-08T17:49:44Z)
Trustworthy Artificial Intelligence and Process Mining: Challenges and Opportunities [0.8602553195689513]
We show that process mining can provide a useful framework for gaining fact-based visibility into the execution of AI compliance processes. We provide an automated approach to analyze, remediate, and monitor uncertainty in AI regulatory compliance processes.
arXiv Detail & Related papers (2021-10-06T12:50:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.