Improving Legal Entity Recognition Using a Hybrid Transformer Model and Semantic Filtering Approach
- URL: http://arxiv.org/abs/2410.08521v1
- Date: Fri, 11 Oct 2024 04:51:28 GMT
- Authors: Duraimurugan Rajamanickam
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Legal Entity Recognition (LER) is critical in automating legal workflows such as contract analysis, compliance monitoring, and litigation support. Existing approaches, including rule-based systems and classical machine learning models, struggle with the complexity of legal documents and domain specificity, particularly in handling ambiguities and nested entity structures. This paper proposes a novel hybrid model that enhances the accuracy and precision of Legal-BERT, a transformer model fine-tuned for legal text processing, by introducing a semantic similarity-based filtering mechanism. We evaluate the model on a dataset of 15,000 annotated legal documents, achieving an F1 score of 93.4%, demonstrating significant improvements in precision and recall over previous methods.
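The listing does not spell out how the semantic similarity filter works, but the general idea, comparing each candidate entity's embedding against a prototype embedding for its predicted type and discarding low-similarity spans, can be sketched as follows. The function names, type labels, 0.75 threshold, and toy 3-d vectors are illustrative assumptions; in the actual pipeline the embeddings would come from Legal-BERT, not hand-written lists.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def filter_entities(candidates, prototypes, threshold=0.75):
    """Keep candidates whose embedding is close to the prototype of
    their predicted type; drop likely spurious or mislabeled spans."""
    kept = []
    for text, label, emb in candidates:
        if cosine(emb, prototypes[label]) >= threshold:
            kept.append((text, label))
    return kept

# Toy 3-d vectors standing in for Legal-BERT contextual embeddings.
prototypes = {
    "COURT":   [1.0, 0.1, 0.0],
    "STATUTE": [0.0, 1.0, 0.1],
}
candidates = [
    ("Supreme Court", "COURT", [0.9, 0.2, 0.1]),  # consistent with its type
    ("Section 230",   "COURT", [0.1, 0.9, 0.2]),  # mislabeled, filtered out
]
print(filter_entities(candidates, prototypes))  # → [('Supreme Court', 'COURT')]
```

A filter of this shape trades a small amount of recall (it can reject unusual but valid mentions) for higher precision, which is consistent with the precision gains the abstract reports.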
Related papers
- Transformer-Based Extraction of Statutory Definitions from the U.S. Code
We present an advanced NLP system to automatically extract defined terms, their definitions, and their scope from the United States Code (U.S.C.)
Our best model achieves 96.8% precision and 98.9% recall (98.2% F1-score).
This work contributes to improving accessibility and understanding of legal information while establishing a foundation for downstream legal reasoning tasks.
arXiv Detail & Related papers (2025-04-23T02:09:53Z)
- Improving the Accuracy and Efficiency of Legal Document Tagging with Large Language Models and Instruction Prompts
Legal-LLM is a novel approach that leverages the instruction-following capabilities of Large Language Models (LLMs) through fine-tuning.
We evaluate our method on two benchmark datasets, POSTURE50K and EURLEX57K, using micro-F1 and macro-F1 scores.
arXiv Detail & Related papers (2025-04-12T18:57:04Z)
- Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej
We introduce VidhikDastaavej, a novel, anonymized dataset of private legal documents.
We develop NyayaShilp, a fine-tuned legal document generation model specifically adapted to Indian legal texts.
arXiv Detail & Related papers (2025-04-04T14:41:50Z)
- Localizing Factual Inconsistencies in Attributable Text Generation
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z)
- SparseCL: Sparse Contrastive Learning for Contradiction Retrieval
Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query.
Existing methods, such as similarity search and cross-encoder models, exhibit significant limitations.
We introduce SparseCL that leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences.
arXiv Detail & Related papers (2024-06-15T21:57:03Z)
- Empowering Prior to Court Legal Analysis: A Transparent and Accessible Dataset for Defensive Statement Classification and Interpretation
This paper introduces a novel dataset tailored for classification of statements made during police interviews, prior to court proceedings.
We introduce a fine-tuned DistilBERT model that achieves state-of-the-art performance in distinguishing truthful from deceptive statements.
We also present an XAI interface that empowers both legal professionals and non-specialists to interact with and benefit from our system.
arXiv Detail & Related papers (2024-05-17T11:22:27Z)
- Rethinking Legal Compliance Automation: Opportunities with Large Language Models
We argue that the examination of (textual) legal artifacts should first employ broader context than sentences.
We present a compliance analysis approach designed to address these limitations.
arXiv Detail & Related papers (2024-04-22T17:10:27Z)
- LegalPro-BERT: Classification of Legal Provisions by fine-tuning BERT Large Language Model
Contract analysis requires the identification and classification of key provisions and paragraphs within an agreement.
LegalPro-BERT is a BERT transformer architecture model that we fine-tune to efficiently handle the classification task for legal provisions.
arXiv Detail & Related papers (2024-04-15T19:08:48Z)
- Enhancing Pre-Trained Language Models with Sentence Position Embeddings for Rhetorical Roles Recognition in Legal Opinions
The size of legal opinions continues to grow, making it increasingly challenging to develop a model that can accurately predict the rhetorical roles of legal opinions.
We propose a novel model architecture for automatically predicting rhetorical roles using pre-trained language models (PLMs) enhanced with knowledge of sentence position information.
Based on an annotated corpus from the LegalEval@SemEval2023 competition, we demonstrate that our approach requires fewer parameters, resulting in lower computational costs.
arXiv Detail & Related papers (2023-10-08T20:33:55Z)
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique f-divergence from a family we call the PR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
- Federated Conformal Predictors for Distributed Uncertainty Quantification
Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning.
In this paper, we extend conformal prediction to the federated learning setting.
We propose a weaker notion of partial exchangeability, better suited to the FL setting, and use it to develop the Federated Conformal Prediction framework.
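For background, the non-federated baseline that this framework extends, standard split conformal prediction, can be sketched in a few lines: calibrate a nonconformity-score threshold on held-out data, then include in each prediction set every label whose score falls below it. The scores, labels, and 90% coverage target below are toy values chosen for illustration, and this sketch does not reflect the partial-exchangeability machinery of the federated extension.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(scores)[min(k, n) - 1]

def prediction_set(probs, qhat):
    """All labels whose nonconformity score (1 - probability) is within the threshold."""
    return [label for label, p in probs.items() if 1 - p <= qhat]

# Calibration: nonconformity score = 1 - model probability of the true label.
cal_scores = [0.05, 0.10, 0.15, 0.30, 0.50, 0.20, 0.12, 0.08, 0.25, 0.40]
qhat = conformal_quantile(cal_scores, alpha=0.1)  # target 90% coverage

# New example: softmax probabilities from some classifier (made up here).
probs = {"contract": 0.70, "statute": 0.25, "opinion": 0.05}
print(prediction_set(probs, qhat))  # → ['contract']
```

Under exchangeability of calibration and test data, sets built this way contain the true label with probability at least 1 - alpha; the federated setting relaxes that exchangeability assumption across clients.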
arXiv Detail & Related papers (2023-05-27T19:57:27Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Slimmable Domain Adaptation
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank.
Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z)
- Transformer-based Approaches for Legal Text Processing
We introduce our approaches using Transformer-based models for different problems of the COLIEE 2021 automatic legal text processing competition.
We find that Transformer-based pretrained language models perform well on automated legal text processing problems when paired with appropriate approaches.
arXiv Detail & Related papers (2022-02-13T19:59:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.