Compliance-to-Code: Enhancing Financial Compliance Checking via Code Generation
- URL: http://arxiv.org/abs/2505.19804v2
- Date: Mon, 09 Jun 2025 07:23:25 GMT
- Title: Compliance-to-Code: Enhancing Financial Compliance Checking via Code Generation
- Authors: Siyuan Li, Jian Chen, Rui Yao, Xuming Hu, Peilin Zhou, Weihua Qiu, Simin Zhang, Chucheng Dong, Zhiyao Li, Qipeng Xie, Zixuan Yuan,
- Abstract summary: We present Compliance-to-Code, the first large-scale Chinese dataset dedicated to financial regulatory compliance. It covers 1,159 annotated clauses from 361 regulations across ten categories; each clause is modularly structured with four logical elements (subject, condition, constraint, and contextual information) along with regulation relations. We provide deterministic Python code mappings, detailed code reasoning, and code explanations to facilitate automated auditing.
- Score: 36.166087396386445
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Regulatory compliance has become a cornerstone of corporate governance, ensuring adherence to systematic legal frameworks. Financial regulations often comprise highly intricate provisions, layered logical structures, and numerous exceptions, which make them labor-intensive to apply and difficult to comprehend. To mitigate this, recent Regulatory Technology (RegTech) and Large Language Models (LLMs) have gained significant attention for automating the conversion of regulatory text into executable compliance logic. However, their performance remains suboptimal, particularly when applied to Chinese-language financial regulations, due to three key limitations: (1) incomplete domain-specific knowledge representation, (2) insufficient hierarchical reasoning capabilities, and (3) failure to maintain temporal and logical coherence. One promising solution is to develop domain-specific, code-oriented datasets for model training. Existing datasets such as LexGLUE, LegalBench, and CODE-ACCORD are often English-focused, domain-mismatched, or too coarse-grained for compliance code generation. To fill these gaps, we present Compliance-to-Code, the first large-scale Chinese dataset dedicated to financial regulatory compliance. It covers 1,159 annotated clauses from 361 regulations across ten categories; each clause is modularly structured with four logical elements (subject, condition, constraint, and contextual information) along with regulation relations. We provide deterministic Python code mappings, detailed code reasoning, and code explanations to facilitate automated auditing. To demonstrate utility, we present FinCheck: a pipeline for regulation structuring, code generation, and report generation.
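To make the clause structure described in the abstract concrete, here is a minimal Python sketch of one clause carrying the four logical elements plus a deterministic check. Everything specific in it is an assumption for illustration only: the `Clause` class, the field names, the `reduction_within_limit` rule, and the 1% threshold are hypothetical and are not taken from the dataset or from FinCheck.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Clause:
    """One regulatory clause, split into the four elements named in the abstract."""
    subject: str       # who the clause applies to
    condition: str     # when the clause applies
    constraint: str    # what must hold when it applies
    context: str       # contextual information (scope, time window, ...)
    check: Callable[[dict], bool]  # deterministic code mapping for the clause


def reduction_within_limit(case: dict) -> bool:
    """Hypothetical rule: a major shareholder may sell at most 1% of total shares."""
    if not case["is_major_shareholder"]:
        return True  # condition not met, so the constraint does not apply
    # Integer arithmetic keeps the comparison exact and deterministic.
    return case["shares_sold"] * 100 <= case["total_shares"]


# Hypothetical clause instance; the wording is invented for this sketch.
clause = Clause(
    subject="major shareholder",
    condition="reduces holdings through centralized bidding",
    constraint="shares sold must not exceed 1% of total shares",
    context="hypothetical 90-day window",
    check=reduction_within_limit,
)


def audit(clause: Clause, case: dict) -> str:
    """Produce a one-line report entry for a single case."""
    verdict = "compliant" if clause.check(case) else "non-compliant"
    return f"{clause.subject}: {verdict} ({clause.constraint})"


if __name__ == "__main__":
    case = {"is_major_shareholder": True, "shares_sold": 20, "total_shares": 1000}
    print(audit(clause, case))
```

Keeping the check a pure function of the case data is what makes the mapping deterministic: the same inputs always yield the same verdict, which is the property an automated audit needs.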
Related papers
- ARCEAK: An Automated Rule Checking Framework Enhanced with Architectural Knowledge [2.0159170788984024]
Automated Rule Checking (ARC) plays a crucial role in advancing the construction industry by addressing the laborious, inconsistent, and error-prone nature of traditional model review conducted by industry professionals. Our study introduces a novel approach that decomposes ARC into two distinct tasks: rule information extraction and verification code generation.
arXiv Detail & Related papers (2024-12-10T10:37:11Z)
- Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows code-form plans: pseudocode that outlines high-level, structured reasoning processes.
CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks.
It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z)
- RIRAG: Regulatory Information Retrieval and Answer Generation [51.998738311700095]
We introduce a task of generating question-passage pairs, where questions are automatically created and paired with relevant regulatory passages. We create the ObliQA dataset, containing 27,869 questions derived from the collection of Abu Dhabi Global Markets (ADGM) financial regulation documents. We design a baseline Regulatory Information Retrieval and Answer Generation (RIRAG) system and evaluate it with RePASs, a novel evaluation metric.
arXiv Detail & Related papers (2024-09-09T14:44:19Z)
- Using Large Language Models for the Interpretation of Building Regulations [7.013802453969655]
Large language models (LLMs) can generate logically coherent text and source code responding to user prompts.
This paper evaluates the performance of LLMs in translating building regulations into LegalRuleML in a few-shot learning setup.
arXiv Detail & Related papers (2024-07-26T08:30:47Z)
- Learnable Item Tokenization for Generative Recommendation [78.30417863309061]
We propose LETTER (a LEarnable Tokenizer for generaTivE Recommendation), which integrates hierarchical semantics, collaborative signals, and code assignment diversity.
LETTER incorporates Residual Quantized VAE for semantic regularization, a contrastive alignment loss for collaborative regularization, and a diversity loss to mitigate code assignment bias.
arXiv Detail & Related papers (2024-05-12T15:49:38Z)
- CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking [1.9950441865030422]
CODE-ACCORD is a dataset of 862 sentences from the building regulations of England and Finland. It supports a range of ML and Natural Language Processing (NLP) tasks, including text classification, entity recognition, and relation extraction.
arXiv Detail & Related papers (2024-03-04T17:21:19Z)
- Bridging between LegalRuleML and TPTP for Automated Normative Reasoning (extended version) [77.34726150561087]
LegalRuleML is an XML-based representation framework for modeling and exchanging normative rules.
The TPTP input and output formats are general-purpose standards for the interaction with automated reasoning systems.
We provide a bridge between the two communities by defining a logic-pluralistic normative reasoning language based on the TPTP format.
arXiv Detail & Related papers (2022-09-12T08:42:34Z)
- Tag-based regulation of modules in genetic programming improves context-dependent problem solving [62.997667081978825]
We introduce and experimentally demonstrate tag-based genetic regulation.
Tag-based genetic regulation extends existing tag-based naming schemes.
We find that tag-based regulation improves problem-solving performance on context-dependent problems.
arXiv Detail & Related papers (2020-12-16T19:49:28Z)
- Institutional Grammar 2.0 Codebook [0.0]
This codebook provides coding guidelines for a revised version of the Institutional Grammar, the Institutional Grammar 2.0 (IG 2.0).
IG 2.0 is a specification that aims at facilitating the encoding of policy to meet varying analytical objectives.
arXiv Detail & Related papers (2020-08-20T12:38:55Z)