Clustering-based Automatic Construction of Legal Entity Knowledge Base from Contracts
- URL: http://arxiv.org/abs/2012.01942v2
- Date: Mon, 7 Dec 2020 09:49:26 GMT
- Title: Clustering-based Automatic Construction of Legal Entity Knowledge Base from Contracts
- Authors: Fuqi Song and Éric de la Clergerie
- Abstract summary: We propose a clustering-based approach to automatically generate a reliable knowledge base of legal entities from given contracts.
The proposed method is robust to different types of errors introduced by pre-processing steps such as OCR and NER.
Compared to the collected ground-truth data, our method is able to recall 84% of the knowledge.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In contract analysis and contract automation, a knowledge base (KB) of legal
entities is fundamental for performing tasks such as contract verification,
contract generation and contract analytics. However, such a KB does not always
exist, nor can it be produced in a short time. In this paper, we propose a
clustering-based approach to automatically generate a reliable knowledge base
of legal entities from given contracts without any supplemental references. The
proposed method is robust to different types of errors introduced by
pre-processing steps such as Optical Character Recognition (OCR) and Named Entity
Recognition (NER), as well as editing errors such as typos. We evaluate our
method on a dataset that consists of 800 real contracts of varying quality
from 15 clients. Compared to the collected ground-truth data, our method is
able to recall 84% of the knowledge.
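The abstract does not spell out the clustering procedure, so the following is only a rough illustration of the general idea, not the authors' method. It groups noisy entity mentions (OCR or formatting variants of one company name) by character-level similarity, keeps the most frequent surface form of each cluster as a KB entry, and computes recall against a ground-truth list. The normalization, the `difflib`-based similarity, single-linkage clustering, the 0.85 threshold, exact-match recall, and the sample company names are all assumptions made for this sketch.

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and map punctuation to spaces so 'S.A.R.L.' and 'SARL' look closer."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; tolerant of single-character OCR errors."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_mentions(mentions: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Single-linkage clustering (assumed): any pair at or above `threshold` shares a cluster."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if similarity(mentions[i], mentions[j]) >= threshold:
                parent[find(i)] = find(j)

    # Group mentions by root; clusters come out in order of their first mention.
    groups: dict[int, list[str]] = {}
    for i, mention in enumerate(mentions):
        groups.setdefault(find(i), []).append(mention)
    return list(groups.values())


def build_kb(mentions: list[str], threshold: float = 0.85) -> list[str]:
    """One KB entry per cluster: its most frequent surface form (ties: first seen)."""
    return [Counter(c).most_common(1)[0][0] for c in cluster_mentions(mentions, threshold)]


def recall(kb: list[str], ground_truth: list[str]) -> float:
    """Share of ground-truth entities recovered, using exact match after normalization."""
    truth = {normalize(x) for x in ground_truth}
    found = {normalize(x) for x in kb}
    return len(truth & found) / len(truth)


if __name__ == "__main__":
    # Hypothetical NER output, including an OCR error ('l' for 'i') and formatting variants.
    mentions = [
        "Acme Corporation", "Acme Corporation", "Acme Corporatlon", "ACME Corporation",
        "Globex SARL", "Globex S.A.R.L.", "Initech Ltd",
    ]
    kb = build_kb(mentions)
    ground_truth = ["Acme Corporation", "Globex SARL", "Initech Ltd", "Umbrella Inc"]
    print(kb)
    print(recall(kb, ground_truth))
```

Single linkage lets chains of small spelling variants collapse into one entity, which suits OCR and typo noise; the same property can over-merge distinct entities with similar names, so a real system would likely need extra evidence (addresses, registration numbers) or a tuned threshold.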
Related papers
- Localizing Factual Inconsistencies in Attributable Text Generation
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies.
(arXiv, 2024-10-09)
- Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
Existing methods typically analyze target text in isolation or solely with non-member contexts.
We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
(arXiv, 2024-09-05)
- Federated Face Forgery Detection Learning with Personalized Representation
Deep generator technology can produce high-quality fake videos that are indistinguishable from real ones, posing a serious social threat.
Traditional forgery detection methods train directly on centralized data.
The paper proposes a novel federated face forgery detection learning approach with personalized representation.
(arXiv, 2024-06-17)
- SparseCL: Sparse Contrastive Learning for Contradiction Retrieval
Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query.
Existing methods such as similarity search and cross-encoder models exhibit significant limitations.
We introduce SparseCL that leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences.
(arXiv, 2024-06-15)
- Improving Smart Contract Security with Contrastive Learning-based Vulnerability Detection
We propose Contrastive Learning Enhanced Automated Recognition Approach for Smart Contract Vulnerabilities, named Clear.
In particular, Clear employs a contrastive learning (CL) model to capture the fine-grained correlation information among contracts.
We show that (1) Clear achieves optimal performance over all baseline methods and (2) it achieves a 9.73%-39.99% higher F1-score than existing deep learning methods.
(arXiv, 2024-04-27)
- A knowledge representation approach for construction contract knowledge modeling
The emergence of large language models (LLMs) presents an unprecedented opportunity to automate construction contract management.
LLMs may produce convincing yet inaccurate and misleading content due to a lack of domain expertise.
This paper introduces the Nested Contract Knowledge Graph (NCKG), a knowledge representation approach that captures the complexity of contract knowledge using a nested structure.
(arXiv, 2023-09-21)
- On the problem of entity matching and its application in automated settlement of receivables
We consider a setup where a base algorithm is used for preliminary ranking of matches.
We apply several novel methods to increase the matching quality of the base algorithm.
(arXiv, 2022-05-21)
- Knowledge-Rich Self-Supervised Entity Linking
Without using any labeled information, Knowledge-RIch Self-Supervision produces KRISSBERT, a universal entity linker for four million UMLS entities.
Our approach subsumes zero-shot and few-shot methods, and can easily incorporate entity descriptions and gold mention labels if available.
(arXiv, 2021-12-15)
- Classification of Contract-Amendment Relationships
We propose an approach based on machine learning (ML) and Natural Language Processing (NLP) to detect the amendment relationship between two documents.
The algorithm takes two PDF documents preprocessed by OCR (Optical Character Recognition) and NER (Named Entity Recognition) as input, and then it builds the features of each document pair.
(arXiv, 2021-06-08)
- Learning to Check Contract Inconsistencies
In many scenarios, a contract is written by filling the blanks in a precompiled form.
Due to carelessness, two blanks that should be filled with the same (or different) content may be incorrectly filled with different (or the same) content.
In this work, we formulate a novel Contract Inconsistency Checking (CIC) problem and design an end-to-end framework called Pair-wise Blank Resolution (PBR).
Our PBR model contains a novel BlankCoder to address the challenge of modeling meaningless blanks.
(arXiv, 2020-12-15)