Related papers: Compliance-to-Code: Enhancing Financial Compliance Checking via Code Generation

URL: http://arxiv.org/abs/2505.19804v2
Date: Mon, 09 Jun 2025 07:23:25 GMT
Title: Compliance-to-Code: Enhancing Financial Compliance Checking via Code Generation
Authors: Siyuan Li, Jian Chen, Rui Yao, Xuming Hu, Peilin Zhou, Weihua Qiu, Simin Zhang, Chucheng Dong, Zhiyao Li, Qipeng Xie, Zixuan Yuan,
Abstract summary: We present Compliance-to-Code, the first large-scale Chinese dataset dedicated to financial regulatory compliance. Covering 1,159 annotated clauses from 361 regulations across ten categories, each clause is modularly structured with four logical elements (subject, condition, constraint, and contextual information), along with regulation relations. We provide deterministic Python code mappings, detailed code reasoning, and code explanations to facilitate automated auditing.
Score: 36.166087396386445
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Regulatory compliance has become a cornerstone of corporate governance, ensuring adherence to systematic legal frameworks. Financial regulations often comprise highly intricate provisions, layered logical structures, and numerous exceptions, which make compliance checking labor-intensive and difficult to comprehend. To mitigate this, Regulatory Technology (RegTech) and Large Language Models (LLMs) have recently gained significant attention for automating the conversion of regulatory text into executable compliance logic. However, their performance remains suboptimal, particularly when applied to Chinese-language financial regulations, due to three key limitations: (1) incomplete domain-specific knowledge representation, (2) insufficient hierarchical reasoning capabilities, and (3) failure to maintain temporal and logical coherence. One promising solution is to develop domain-specific, code-oriented datasets for model training. Existing datasets such as LexGLUE, LegalBench, and CODE-ACCORD are often English-focused, domain-mismatched, or too coarse-grained for compliance code generation. To fill these gaps, we present Compliance-to-Code, the first large-scale Chinese dataset dedicated to financial regulatory compliance. Covering 1,159 annotated clauses from 361 regulations across ten categories, each clause is modularly structured with four logical elements (subject, condition, constraint, and contextual information), along with regulation relations. We provide deterministic Python code mappings, detailed code reasoning, and code explanations to facilitate automated auditing. To demonstrate utility, we present FinCheck: a pipeline for regulation structuring, code generation, and report generation.
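To make the clause structure described in the abstract concrete, the following is a minimal sketch of how a clause with the four logical elements might be paired with a deterministic Python check. It is not taken from the dataset or from FinCheck: the `Clause` class, the `check_demo_001` function, the field names, and the example rule with its 1% and 90-day thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Clause:
    """A regulation clause split into the four elements named in the abstract."""
    clause_id: str
    subject: str          # who the clause applies to
    condition: str        # when the clause applies
    constraint: str       # what must hold when it applies
    context: str = ""     # additional contextual information
    relations: List[str] = field(default_factory=list)  # ids of related clauses


# Hypothetical clause, invented for illustration (not a dataset entry).
EXAMPLE = Clause(
    clause_id="demo-001",
    subject="controlling shareholder",
    condition="reduces holdings through centralized bidding",
    constraint="sells at most 1% of total shares within any 90 days",
)


def check_demo_001(case: Dict[str, Any]) -> bool:
    """Deterministic check for EXAMPLE; returns True when the case is compliant."""
    if case["role"] != EXAMPLE.subject:
        return True  # subject does not match, so the clause does not apply
    if not case["sells_via_bidding"]:
        return True  # condition is not triggered
    # Constraint: shares sold in the window must not exceed 1% of total shares.
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return case["shares_sold_90d"] * 100 <= case["total_shares"]
```

Separating subject, condition, and constraint in this way keeps "not applicable" cases distinct from genuine violations, which is one plausible reason for the modular structure the abstract describes.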

Related papers

Reflections on the design, applications and implementations of the normative specification language eFLINT [0.764671395172401]
Legal practices involve subjective processes such as interpretation and qualification. Computational reasoning with laws requires a cross-disciplinary process involving both legal and software expertise. This paper reflects on the domain-specific software language eFLINT developed to experiment with novel solutions.
arXiv Detail & Related papers (2025-11-15T16:09:31Z)
Judging by the Rules: Compliance-Aligned Framework for Modern Slavery Statement Monitoring [24.13989765643719]
Modern slavery affects millions of people worldwide, and regulatory frameworks such as Modern Slavery Acts now require companies to publish detailed disclosures. These statements are often vague and inconsistent, making manual review time-consuming and difficult to scale. We propose a novel framework that harnesses AI for rule-level compliance verification while preserving expert oversight.
arXiv Detail & Related papers (2025-11-11T03:41:44Z)
Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning [65.20602712957725]
Caco is a novel framework that automates the synthesis of high-quality, verifiable, and diverse instruction-CoT reasoning data. Our work establishes a paradigm for building self-sustaining, trustworthy reasoning systems without human intervention.
arXiv Detail & Related papers (2025-10-05T07:59:24Z)
Statutory Construction and Interpretation for Artificial Intelligence [19.65776192762091]
We show how different interpretations of the same rule can lead to inconsistent or unstable model behavior. We propose a computational framework that mirrors two legal mechanisms. Our approach offers a first step toward systematically managing interpretive ambiguity.
arXiv Detail & Related papers (2025-09-01T07:10:22Z)
Data Dependency-Aware Code Generation from Enhanced UML Sequence Diagrams [54.528185120850274]
We propose a novel step-by-step code generation framework named API2Dep. First, we introduce an enhanced Unified Modeling Language (UML) API diagram tailored for service-oriented architectures. Second, recognizing the critical role of data flow, we introduce a dedicated data dependency inference task.
arXiv Detail & Related papers (2025-08-05T12:28:23Z)
ARCEAK: An Automated Rule Checking Framework Enhanced with Architectural Knowledge [2.0159170788984024]
Automated Rule Checking (ARC) plays a crucial role in advancing the construction industry by addressing the laborious, inconsistent, and error-prone nature of traditional model review conducted by industry professionals. Our study introduces a novel approach that decomposes ARC into two distinct tasks: rule information extraction and verification code generation.
arXiv Detail & Related papers (2024-12-10T10:37:11Z)
Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows code-form plans: pseudocode that outlines high-level, structured reasoning processes. CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks. It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z)
RIRAG: Regulatory Information Retrieval and Answer Generation [51.998738311700095]
We introduce a task of generating question-passage pairs, where questions are automatically created and paired with relevant regulatory passages. We create the ObliQA dataset, containing 27,869 questions derived from the collection of Abu Dhabi Global Markets (ADGM) financial regulation documents. We design a baseline Regulatory Information Retrieval and Answer Generation (RIRAG) system and evaluate it with RePASs, a novel evaluation metric.
arXiv Detail & Related papers (2024-09-09T14:44:19Z)
Using Large Language Models for the Interpretation of Building Regulations [7.013802453969655]
Large language models (LLMs) can generate logically coherent text and source code responding to user prompts. This paper evaluates the performance of LLMs in translating building regulations into LegalRuleML in a few-shot learning setup.
arXiv Detail & Related papers (2024-07-26T08:30:47Z)
Learnable Item Tokenization for Generative Recommendation [78.30417863309061]
We propose LETTER (a LEarnable Tokenizer for generaTivE Recommendation), which integrates hierarchical semantics, collaborative signals, and code assignment diversity. LETTER incorporates Residual Quantized VAE for semantic regularization, a contrastive alignment loss for collaborative regularization, and a diversity loss to mitigate code assignment bias.
arXiv Detail & Related papers (2024-05-12T15:49:38Z)
CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking [1.9950441865030422]
CODE-ACCORD is a dataset of 862 sentences from the building regulations of England and Finland. It supports a range of ML and Natural Language Processing (NLP) tasks, including text classification, entity recognition, and relation extraction.
arXiv Detail & Related papers (2024-03-04T17:21:19Z)
Bridging between LegalRuleML and TPTP for Automated Normative Reasoning (extended version) [77.34726150561087]
LegalRuleML is an XML-based representation framework for modeling and exchanging normative rules. The TPTP input and output formats are general-purpose standards for the interaction with automated reasoning systems. We provide a bridge between the two communities by defining a logic-pluralistic normative reasoning language based on the TPTP format.
arXiv Detail & Related papers (2022-09-12T08:42:34Z)
Tag-based regulation of modules in genetic programming improves context-dependent problem solving [62.997667081978825]
We introduce and experimentally demonstrate tag-based genetic regulation. Tag-based genetic regulation extends existing tag-based naming schemes. We find that tag-based regulation improves problem-solving performance on context-dependent problems.
arXiv Detail & Related papers (2020-12-16T19:49:28Z)
Institutional Grammar 2.0 Codebook [0.0]
This codebook provides coding guidelines for a revised version of the Institutional Grammar, the Institutional Grammar 2.0 (IG 2.0). IG 2.0 is a specification that aims at facilitating the encoding of policy to meet varying analytical objectives.
arXiv Detail & Related papers (2020-08-20T12:38:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.