Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej
- URL: http://arxiv.org/abs/2504.03486v1
- Date: Fri, 04 Apr 2025 14:41:50 GMT
- Title: Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej
- Authors: Shubham Kumar Nigam, Balaramamahanthi Deepak Patnaik, Ajay Varghese Thomas, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya,
- Abstract summary: We introduce VidhikDastaavej, a novel, anonymized dataset of private legal documents.<n>We develop NyayaShilp, a fine-tuned legal document generation model specifically adapted to Indian legal texts.
- Score: 5.790242888372048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automating legal document drafting can significantly enhance efficiency, reduce manual effort, and streamline legal workflows. While prior research has explored tasks such as judgment prediction and case summarization, the structured generation of private legal documents in the Indian legal domain remains largely unaddressed. To bridge this gap, we introduce VidhikDastaavej, a novel, anonymized dataset of private legal documents, and develop NyayaShilp, a fine-tuned legal document generation model specifically adapted to Indian legal texts. We propose a Model-Agnostic Wrapper (MAW), a two-step framework that first generates structured section titles and then iteratively produces content while leveraging retrieval-based mechanisms to ensure coherence and factual accuracy. We benchmark multiple open-source LLMs, including instruction-tuned and domain-adapted versions, alongside proprietary models for comparison. Our findings indicate that while direct fine-tuning on small datasets does not always yield improvements, our structured wrapper significantly enhances coherence, factual adherence, and overall document quality while mitigating hallucinations. To ensure real-world applicability, we developed a Human-in-the-Loop (HITL) Document Generation System, an interactive user interface that enables users to specify document types, refine section details, and generate structured legal drafts. This tool allows legal professionals and researchers to generate, validate, and refine AI-generated legal documents efficiently. Extensive evaluations, including expert assessments, confirm that our framework achieves high reliability in structured legal drafting. This research establishes a scalable and adaptable foundation for AI-assisted legal drafting in India, offering an effective approach to structured legal document generation.
Related papers
- JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System [12.256518096712334]
JuDGE (Judgment Document Generation Evaluation) is a novel benchmark for evaluating the performance of judgment document generation in the Chinese legal system.<n>We construct a comprehensive dataset consisting of factual descriptions from real legal cases, paired with their corresponding full judgment documents.<n>In collaboration with legal professionals, we establish a comprehensive automated evaluation framework to assess the quality of generated judgment documents.
arXiv Detail & Related papers (2025-03-18T13:48:18Z) - CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation [22.98779736851499]
We introduce CaseGen, the benchmark for multi-stage legal case documents generation in the Chinese legal domain.<n>CaseGen is based on 500 real case samples annotated by legal experts and covers seven essential case sections.<n>It supports four key tasks: drafting defense statements, writing trial facts, composing legal reasoning, and generating judgment results.
arXiv Detail & Related papers (2025-02-25T08:03:32Z) - Named entity recognition for Serbian legal documents: Design, methodology and dataset development [0.0]
We present one solution for Named Entity Recognition (NER) in the case of legal documents written in Serbian language.<n>It leverages on the pre-trained bidirectional encoder representations from transformers (BERT), which had been carefully adapted to the specific task of identifying and classifying specific data points from textual content.
arXiv Detail & Related papers (2025-02-14T22:23:39Z) - DocMIA: Document-Level Membership Inference Attacks against DocVQA Models [52.13818827581981]
We introduce two novel membership inference attacks tailored specifically to DocVQA models.<n>Our methods outperform existing state-of-the-art membership inference attacks across a variety of DocVQA models and datasets.
arXiv Detail & Related papers (2025-02-06T00:58:21Z) - Improving Legal Entity Recognition Using a Hybrid Transformer Model and Semantic Filtering Approach [0.0]
This paper proposes a novel hybrid model that enhances the accuracy and precision of Legal-BERT, a transformer model fine-tuned for legal text processing.
We evaluate the model on a dataset of 15,000 annotated legal documents, achieving an F1 score of 93.4%, demonstrating significant improvements in precision and recall over previous methods.
arXiv Detail & Related papers (2024-10-11T04:51:28Z) - Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z) - Re3: A Holistic Framework and Dataset for Modeling Collaborative Document Revision [62.12545440385489]
We introduce Re3, a framework for joint analysis of collaborative document revision.
We present Re3-Sci, a large corpus of aligned scientific paper revisions manually labeled according to their action and intent.
We use the new data to provide first empirical insights into collaborative document revision in the academic domain.
arXiv Detail & Related papers (2024-05-31T21:19:09Z) - Enhancing Pre-Trained Language Models with Sentence Position Embeddings
for Rhetorical Roles Recognition in Legal Opinions [0.16385815610837165]
The size of legal opinions continues to grow, making it increasingly challenging to develop a model that can accurately predict the rhetorical roles of legal opinions.
We propose a novel model architecture for automatically predicting rhetorical roles using pre-trained language models (PLMs) enhanced with knowledge of sentence position information.
Based on an annotated corpus from the LegalEval@SemEval2023 competition, we demonstrate that our approach requires fewer parameters, resulting in lower computational costs.
arXiv Detail & Related papers (2023-10-08T20:33:55Z) - SAILER: Structure-aware Pre-trained Language Model for Legal Case
Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z) - WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z) - GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidences in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.