Classification of Contract-Amendment Relationships
- URL: http://arxiv.org/abs/2106.14619v1
- Date: Tue, 8 Jun 2021 07:57:10 GMT
- Title: Classification of Contract-Amendment Relationships
- Authors: Fuqi Song
- Abstract summary: We propose an approach based on machine learning (ML) and Natural Language Processing (NLP) to detect the amendment relationship between two documents.
The algorithm takes two PDF documents preprocessed by OCR (Optical Character Recognition) and NER (Named Entity Recognition) as input, and then it builds the features of each document pair.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In Contract Life-cycle Management (CLM), managing and tracking
master agreements and their associated amendments is essential for staying
informed of the various due dates and obligations. An automatic solution can
ease the daily work of legal practitioners and improve their efficiency. In
this paper, we propose an approach based on machine learning (ML) and Natural
Language Processing (NLP) to detect the amendment relationship between two
documents. The algorithm takes two PDF documents preprocessed by OCR (Optical
Character Recognition) and NER (Named Entity Recognition) as input, and then it
builds the features of each document pair and classifies the relationship. We
experimented with different configurations on a dataset consisting of 1124
pairs of contract-amendment documents in English and French. The best
configuration achieved an F1-score of 91%, outperforming a heuristic-based
baseline by 23%.
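The pipeline described in the abstract (build features for a document pair, then classify the relationship) can be illustrated with a minimal sketch. The abstract does not name the features or the classifier, so everything here is an assumption made for illustration: the `Doc` structure, the four features in `pair_features`, the use of logistic regression, and the toy documents are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    """A document after OCR and NER (hypothetical structure, not from the paper)."""
    text: str                                  # text produced by OCR
    parties: set = field(default_factory=set)  # party names produced by NER
    year: int = 0                              # a date entity produced by NER


def jaccard(a, b):
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(first, second):
    """Build an illustrative feature vector for an ordered document pair."""
    words_a = set(first.text.lower().split())
    words_b = set(second.text.lower().split())
    return [
        jaccard(first.parties, second.parties),     # shared parties
        jaccard(words_a, words_b),                  # lexical overlap
        1.0 if second.year >= first.year else 0.0,  # second doc is not older
        1.0 if "amendment" in words_b else 0.0,     # cue word in second doc
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, epochs=2000, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # last weight is the bias term
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            xb = xi + [1.0]      # append constant input for the bias
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xb))) - yi
            for j, xj in enumerate(xb):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Estimated probability that the pair is a contract-amendment pair."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x + [1.0])))


# Toy documents, invented purely for this example.
master = Doc("master services agreement between acme and globex",
             {"acme", "globex"}, 2019)
amend = Doc("first amendment to the master services agreement between acme and globex",
            {"acme", "globex"}, 2020)
lease = Doc("lease agreement between initech and umbrella",
            {"initech", "umbrella"}, 2018)
nda = Doc("non disclosure agreement between acme and initech",
          {"acme", "initech"}, 2021)

# (first document, second document, label); 1 means "second amends first".
pairs = [(master, amend, 1), (master, lease, 0), (lease, amend, 0), (master, nda, 0)]
X = [pair_features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
w = train(X, y)
scores = [predict(w, x) for x in X]
```

On these four toy pairs, `scores[0]` (master agreement with its amendment) should come out above 0.5 and the other three below it. A real system would need far richer features and a held-out evaluation set to report a metric such as F1.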
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery [6.037276428689637]
This paper introduces DISCOvery Graph (DISCOG), a hybrid approach that combines the strengths of two worlds: a graph-based method for accurate document relevance prediction.
Our approach drastically reduces document review costs by 99.9% compared to manual processes and by 95% compared to LLM-based classification methods.
arXiv Detail & Related papers (2024-05-29T15:08:55Z)
- In-context Pretraining: Language Modeling Beyond Document Boundaries [137.53145699439898]
In-Context Pretraining is a new approach where language models are pretrained on a sequence of related documents.
We introduce approximate algorithms for finding related documents with efficient nearest neighbor search.
We see notable improvements in tasks that require more complex contextual reasoning.
arXiv Detail & Related papers (2023-10-16T17:57:12Z)
- A Hierarchical Neural Framework for Classification and its Explanation in Large Unstructured Legal Documents [0.5812284760539713]
We define this problem as "scarce annotated legal documents".
We propose a deep-learning-based classification framework which we call MESc.
We also propose an explanation extraction algorithm named ORSE.
arXiv Detail & Related papers (2023-09-19T12:18:28Z)
- mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning [54.523172171533645]
Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora.
We propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (mCL-NER).
Our experiments on the XTREME benchmark, spanning 40 languages, demonstrate the superiority of mCL-NER over prior data-driven and model-based approaches.
arXiv Detail & Related papers (2023-08-17T16:02:29Z)
- Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
arXiv Detail & Related papers (2023-05-22T17:58:04Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA, applied in the pre-training phase, to obtain a deeper understanding of the entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
- Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles [5.40541521227338]
We model the problem of finding the relationship between two documents as a pairwise document classification task.
To find semantic relations between documents, we apply a series of techniques, such as GloVe, Paragraph Vectors, BERT, and XLNet.
We perform our experiments on a newly proposed dataset of 32,168 Wikipedia article pairs and Wikidata properties that define the semantic document relations.
arXiv Detail & Related papers (2020-03-22T12:52:56Z)
- Massively Multilingual Document Alignment with Cross-lingual Sentence-Mover's Distance [8.395430195053061]
Document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other.
We develop an unsupervised scoring function that leverages cross-lingual sentence embeddings to compute the semantic distance between documents in different languages.
These semantic distances are then used to guide a document alignment algorithm to properly pair cross-lingual web documents across a variety of low, mid, and high-resource language pairs.
arXiv Detail & Related papers (2020-01-31T05:14:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.