Legal Requirements Translation from Law
- URL: http://arxiv.org/abs/2507.02846v1
- Date: Thu, 03 Jul 2025 17:53:48 GMT
- Title: Legal Requirements Translation from Law
- Authors: Anmol Singhal, Travis Breaux
- Abstract summary: We introduce an approach based on textual entailment and in-context learning for automatically generating a canonical representation of legal text. We evaluate our approach on 13 U.S. state data breach notification laws, demonstrating that our generated representations pass approximately 89.4% of test cases and achieve a precision and recall of 82.2 and 88.7, respectively.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software systems must comply with legal regulations, which is a resource-intensive task, particularly for small organizations and startups lacking dedicated legal expertise. Extracting metadata from regulations to elicit legal requirements for software is a critical step to ensure compliance. However, it is a cumbersome task due to the length and complex nature of legal text. Although prior work has pursued automated methods for extracting structural and semantic metadata from legal text, key limitations remain: they do not consider the interplay and interrelationships among attributes associated with these metadata types, and they rely on manual labeling or heuristic-driven machine learning, which does not generalize well to new documents. In this paper, we introduce an approach based on textual entailment and in-context learning for automatically generating a canonical representation of legal text, encodable and executable as Python code. Our representation is instantiated from a manually designed Python class structure that serves as a domain-specific metamodel, capturing both structural and semantic legal metadata and their interrelationships. This design choice reduces the need for large, manually labeled datasets and enhances applicability to unseen legislation. We evaluate our approach on 13 U.S. state data breach notification laws, demonstrating that our generated representations pass approximately 89.4% of test cases and achieve a precision and recall of 82.2 and 88.7, respectively.
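The abstract describes a manually designed Python class structure that serves as a domain-specific metamodel, with generated representations that are executable and checkable via test cases. The exact classes are not given in the abstract, so the sketch below is a hypothetical illustration (all class and attribute names are assumptions, not the authors' actual metamodel) of how a simplified data breach notification provision might be encoded and tested:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical metamodel sketch: class and attribute names are illustrative,
# not the paper's actual Python class structure.

@dataclass
class Entity:
    name: str  # e.g., "covered entity", "affected resident"

@dataclass
class Condition:
    text: str  # triggering circumstance, e.g., "breach of security"

@dataclass
class Obligation:
    actor: Entity                   # who must act
    action: str                     # required action, e.g., "notify"
    recipient: Optional[Entity] = None
    conditions: List[Condition] = field(default_factory=list)
    deadline: Optional[str] = None  # e.g., "without unreasonable delay"

# Encode one simplified provision from a data breach notification law
provision = Obligation(
    actor=Entity("covered entity"),
    action="notify",
    recipient=Entity("affected resident"),
    conditions=[Condition("unauthorized acquisition of personal information")],
    deadline="without unreasonable delay",
)

# Because the representation is executable Python, properties of the encoded
# provision can be checked as test-case assertions:
assert provision.actor.name == "covered entity"
assert provision.deadline is not None
```

Representing provisions as plain dataclasses keeps the structural metadata (actor, action, recipient) and semantic metadata (conditions, deadlines) in one object whose interrelationships can be validated programmatically.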
Related papers
- Transformer-Based Extraction of Statutory Definitions from the U.S. Code
We present an advanced NLP system to automatically extract defined terms, their definitions, and their scope from the United States Code (U.S.C.). Our best model achieves 96.8% precision and 98.9% recall (98.2% F1-score). This work contributes to improving accessibility and understanding of legal information while establishing a foundation for downstream legal reasoning tasks.
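The paper uses a transformer model; by way of contrast, the much simpler rule-based sketch below shows what extracting defined terms of the common statutory form '"term" means ...' could look like (the pattern and sample text are illustrative assumptions, not the paper's method or data):

```python
import re

# Simplified rule-based sketch (not the paper's transformer approach):
# extract defined terms of the form '"term" means ...' from statutory text.
text = (
    'The term "personal information" means an individual\'s first name '
    'and last name. The term "breach" means unauthorized acquisition of data.'
)

# Capture the quoted term and the definition up to the sentence-ending period.
pattern = re.compile(r'"([^"]+)" means ([^.]+)\.')
definitions = {term: defn.strip() for term, defn in pattern.findall(text)}
# definitions → {'personal information': "an individual's first name and last name",
#                'breach': 'unauthorized acquisition of data'}
```

A pattern like this illustrates the task but misses cross-references and scoped definitions, which is precisely the gap a learned model addresses.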
arXiv Detail & Related papers (2025-04-23T02:09:53Z)
- Improving the Accuracy and Efficiency of Legal Document Tagging with Large Language Models and Instruction Prompts
Legal-LLM is a novel approach that leverages the instruction-following capabilities of Large Language Models (LLMs) through fine-tuning. We evaluate our method on two benchmark datasets, POSTURE50K and EURLEX57K, using micro-F1 and macro-F1 scores.
arXiv Detail & Related papers (2025-04-12T18:57:04Z)
- Design and implementation of tools to build an ontology of Security Requirements for Internet of Medical Things
In the Internet of Medical Things (IoMT) world, manufacturers or third parties must be aware of the security requirements expressed by both laws and specifications. An ontology charting the relevant laws and specifications (for the European context) is very useful. Due to the very high number and size of the considered specification documents, we have put in place a methodology and tools to simplify the transition from natural text to an ontology.
arXiv Detail & Related papers (2025-01-06T15:04:45Z)
- DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability.
Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
- From Text to Structure: Using Large Language Models to Support the Development of Legal Expert Systems
Rule-based expert systems focused on legislation can support laypeople in understanding how legislation applies to them and provide them with helpful context and information.
Here, we investigate to what degree large language models (LLMs), such as GPT-4, are able to automatically extract structured representations from legislation.
We use LLMs to create pathways from legislation, according to the JusticeBot methodology for legal decision support systems, evaluate the pathways and compare them to manually created pathways.
arXiv Detail & Related papers (2023-11-01T18:31:02Z)
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
- Using Document Similarity Methods to create Parallel Datasets for Code Translation
Translating source code from one programming language to another is a critical, time-consuming task.
We propose to use document similarity methods to create noisy parallel datasets of code.
We show that these models perform comparably to models trained on ground truth for reasonable levels of noise.
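The abstract does not specify which similarity measure is used, so the sketch below illustrates the general idea with Python's standard-library `difflib.SequenceMatcher`: each source-language snippet is paired with its most similar target-language snippet to build a noisy parallel dataset (the snippets and the matching criterion are illustrative assumptions):

```python
from difflib import SequenceMatcher

# Hypothetical sketch of similarity-based pairing for a noisy parallel corpus.
# The real paper's similarity method is not specified in this summary.
python_snippets = [
    "def add(a, b): return a + b",
    "def greet(name): print('hello ' + name)",
]
java_snippets = [
    'void greet(String name) { System.out.println("hello " + name); }',
    "int add(int a, int b) { return a + b; }",
]

def best_match(src: str, candidates: list[str]) -> str:
    """Return the candidate with the highest character-level similarity."""
    return max(candidates, key=lambda c: SequenceMatcher(None, src, c).ratio())

# Pair every Python snippet with its closest Java counterpart.
pairs = [(s, best_match(s, java_snippets)) for s in python_snippets]
```

Pairings produced this way are noisy by construction, which is why the paper measures how translation models degrade as noise increases.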
arXiv Detail & Related papers (2021-10-11T17:07:58Z)
- Automatic Extraction of Rules Governing Morphological Agreement
We develop an automated framework for extracting a first-pass grammatical specification from raw text.
We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages.
We apply our framework to all languages included in the Universal Dependencies project, with promising results.
arXiv Detail & Related papers (2020-10-02T18:31:45Z)
- SPECTER: Document-level Representation Learning using Citation-informed Transformers
SPECTER generates document-level embeddings of scientific documents by pretraining a Transformer language model.
We introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation.
arXiv Detail & Related papers (2020-04-15T16:05:51Z)
- Procedural Reading Comprehension with Attribute-Aware Context Flow
Procedural texts often describe processes that happen over entities.
We introduce an algorithm for procedural reading comprehension by translating the text into a general formalism.
arXiv Detail & Related papers (2020-03-31T00:06:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.