NLP-based Regulatory Compliance -- Using GPT 4.0 to Decode Regulatory Documents
- URL: http://arxiv.org/abs/2412.20602v1
- Date: Sun, 29 Dec 2024 22:14:59 GMT
- Title: NLP-based Regulatory Compliance -- Using GPT 4.0 to Decode Regulatory Documents
- Authors: Bimal Kumar, Dmitri Roussinov
- Abstract summary: This study evaluates GPT-4.0's ability to identify conflicts within regulatory requirements.
Using metrics such as precision, recall, and F1 score, the experiment demonstrates GPT-4.0's effectiveness in detecting inconsistencies.
- Abstract: Large Language Models (LLMs) such as GPT-4.0 have shown significant promise in addressing the semantic complexities of regulatory documents, particularly in detecting inconsistencies and contradictions. This study evaluates GPT-4.0's ability to identify conflicts within regulatory requirements by analyzing a curated corpus with artificially injected ambiguities and contradictions, designed in collaboration with architects and compliance engineers. Using metrics such as precision, recall, and F1 score, the experiment demonstrates GPT-4.0's effectiveness in detecting inconsistencies, with findings validated by human experts. The results highlight the potential of LLMs to enhance regulatory compliance processes, though further testing with larger datasets and domain-specific fine-tuning is needed to maximize accuracy and practical applicability. Future work will explore automated conflict resolution and real-world implementation through pilot projects with industry partners.
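The abstract's evaluation compares GPT-4.0's flagged conflicts against expert annotations using precision, recall, and F1. A minimal sketch of how such scoring works (the paper does not publish its evaluation code; the requirement IDs and conflict pairs below are hypothetical examples, not data from the study):

```python
# Illustrative sketch (not from the paper): scoring a model's conflict-detection
# output against expert labels with precision, recall, and F1.

def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 over sets of flagged conflict pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # conflicts correctly flagged (true positives)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical expert-annotated conflicts vs. model-flagged conflicts
gold_conflicts = {("req_1", "req_4"), ("req_2", "req_7"), ("req_3", "req_9")}
model_flags = {("req_1", "req_4"), ("req_2", "req_7"), ("req_5", "req_6")}

p, r, f1 = precision_recall_f1(model_flags, gold_conflicts)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Here the model recovers two of the three annotated conflicts and raises one false alarm, so precision, recall, and F1 each come out to 2/3.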
Related papers
- Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs [54.309127753635366]
We present the results of a replication study in which we investigate GPT-4's effectiveness in recommending and suggesting idiomatic actions.
Our findings underscore the potential of LLMs to achieve tasks where, in the past, implementing recommenders based on complex code analyses was required.
arXiv Detail & Related papers (2025-01-28T15:41:54Z) - Patent-CR: A Dataset for Patent Claim Revision [0.0]
This paper presents Patent-CR, the first dataset created for the patent claim revision task in English.
It includes both initial patent applications rejected by patent examiners and the final granted versions.
arXiv Detail & Related papers (2024-12-03T16:43:42Z) - GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering [0.0]
Retrieval-Augmented Generation (RAG) has emerged as a common paradigm to use Large Language Models (LLMs) alongside private and up-to-date knowledge bases.
We address the challenges of using LLM-as-a-Judge when evaluating grounded answers generated by RAG systems.
arXiv Detail & Related papers (2024-09-10T15:39:32Z) - Granting GPT-4 License and Opportunity: Enhancing Accuracy and Confidence Estimation for Few-Shot Event Detection [6.718542027371254]
Large Language Models (LLMs) have shown enough promise in few-shot learning contexts to suggest their use in the generation of "silver" data.
Confidence estimation is a documented weakness of models such as GPT-4.
The present effort explores methods for effective confidence estimation with GPT-4 in few-shot learning for event detection, using BETTER as a vehicle.
arXiv Detail & Related papers (2024-08-01T21:08:07Z) - Enhancing Legal Compliance and Regulation Analysis with Large Language Models [0.0]
This research explores the application of Large Language Models (LLMs) to accurately classify legal provisions and automate compliance checks.
Our findings demonstrate promising results, indicating LLMs' significant potential to enhance legal compliance and regulatory analysis efficiency, notably by reducing manual workload and improving accuracy within reasonable time and financial constraints.
arXiv Detail & Related papers (2024-04-26T16:40:49Z) - DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models [92.6951708781736]
This work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5.
We find that GPT models can be easily misled to generate toxic and biased outputs and leak private information.
Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
arXiv Detail & Related papers (2023-06-20T17:24:23Z) - Is GPT-4 a Good Data Analyst? [67.35956981748699]
We consider GPT-4 as a data analyst to perform end-to-end data analysis with databases from a wide range of domains.
We design several task-specific evaluation metrics to systematically compare the performance between several professional human data analysts and GPT-4.
Experimental results show that GPT-4 can achieve comparable performance to humans.
arXiv Detail & Related papers (2023-05-24T11:26:59Z) - A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers [17.075558137261986]
We evaluate the generalisability of Transformers to out-of-distribution mathematical reasoning problems.
We compare the capabilities of GPT-4, GPT-3.5, and a canon of fine-tuned BERT models.
Surprisingly, our evaluation reveals that the average in-distribution performance of fine-tuned models surpasses GPT-3.5, and rivals GPT-4.
arXiv Detail & Related papers (2023-05-21T20:40:37Z) - GPT-4 Technical Report [116.90398195245983]
GPT-4 is a large-scale, multimodal model which can accept image and text inputs and produce text outputs.
It exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers.
arXiv Detail & Related papers (2023-03-15T17:15:04Z) - Consistency Analysis of ChatGPT [65.268245109828]
This paper investigates the trustworthiness of ChatGPT and GPT-4 regarding logically consistent behaviour.
Our findings suggest that while both models appear to show an enhanced language understanding and reasoning ability, they still frequently fall short of generating logically consistent predictions.
arXiv Detail & Related papers (2023-03-11T01:19:01Z) - Prompting GPT-3 To Be Reliable [117.23966502293796]
This work decomposes reliability into four facets: generalizability, fairness, calibration, and factuality.
We find that GPT-3 outperforms smaller-scale supervised models by large margins on all these facets.
arXiv Detail & Related papers (2022-10-17T14:52:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.