The Law of Large Documents: Understanding the Structure of Legal
Contracts Using Visual Cues
- URL: http://arxiv.org/abs/2107.08128v1
- Date: Fri, 16 Jul 2021 21:21:50 GMT
- Title: The Law of Large Documents: Understanding the Structure of Legal
Contracts Using Visual Cues
- Authors: Allison Hegel, Marina Shah, Genevieve Peaslee, Brendan Roof, Emad
Elwany
- Abstract summary: We measure the impact of incorporating visual cues, obtained via computer vision methods, on the accuracy of document understanding tasks.
Our method of segmenting documents based on structural metadata outperforms existing methods on four long-document understanding tasks.
- Score: 0.7425558351422133
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large, pre-trained transformer models like BERT have achieved
state-of-the-art results on document understanding tasks, but most
implementations can only consider 512 tokens at a time. For many real-world
applications, documents can be much longer, and the segmentation strategies
typically used on longer documents miss out on document structure and
contextual information, hurting their results on downstream tasks. In our work
on legal agreements, we find that visual cues such as layout, style, and
placement of text in a document are strong features that are crucial to
achieving an acceptable level of accuracy on long documents. We measure the
impact of incorporating such visual cues, obtained via computer vision methods,
on the accuracy of document understanding tasks including document
segmentation, entity extraction, and attribute classification. Our method of
segmenting documents based on structural metadata outperforms existing methods
on four long-document understanding tasks as measured on the Contract
Understanding Atticus Dataset.
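To make the segmentation idea concrete, here is a minimal sketch, assuming a hypothetical `Line` record whose `font_size`, `bold`, and `indent` fields stand in for layout metadata emitted by a computer-vision model; it illustrates structure-aware splitting in general, not the authors' implementation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Line:
    text: str
    font_size: float  # hypothetical output of a layout/OCR model
    bold: bool
    indent: float     # left offset in points

def is_heading(line: Line, body_font_size: float = 10.0) -> bool:
    """Heuristic: section headings tend to be larger, bold, or outdented
    relative to body text. Thresholds here are illustrative assumptions."""
    return (line.font_size > 1.2 * body_font_size
            or line.bold
            or line.indent < 10.0)

def segment_by_visual_cues(lines: List[Line]) -> List[List[Line]]:
    """Start a new segment at each visually distinct heading, so segments
    follow clause boundaries instead of arbitrary 512-token windows."""
    segments: List[List[Line]] = []
    current: List[Line] = []
    for line in lines:
        if is_heading(line) and current:
            segments.append(current)
            current = []
        current.append(line)
    if current:
        segments.append(current)
    return segments
```

Each resulting segment can then be tokenized and passed to a standard 512-token encoder such as BERT, with truncation handled per segment rather than across clause boundaries.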
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models [63.466265039007816]
We present DocGenome, a structured document benchmark constructed by annotating 500K scientific documents from 153 disciplines in the arXiv open-access community.
We conduct extensive experiments to demonstrate the advantages of DocGenome and objectively evaluate the performance of large models on our benchmark.
arXiv Detail & Related papers (2024-06-17T15:13:52Z)
- DocumentNet: Bridging the Data Gap in Document Pre-Training [78.01647768018485]
We propose a method to collect massive-scale and weakly labeled data from the web to benefit the training of VDER models.
The collected dataset, named DocumentNet, does not depend on specific document types or entity sets.
Experiments on a set of broadly adopted VDER tasks show significant improvements when DocumentNet is incorporated into the pre-training.
arXiv Detail & Related papers (2023-06-15T08:21:15Z)
- DLUE: Benchmarking Document Language Understanding [32.550855843975484]
There is no well-established consensus on how to comprehensively evaluate document understanding abilities.
This paper summarizes four representative abilities, i.e., document classification, document structural analysis, document information extraction, and document transcription.
Under the new evaluation framework, we propose Document Language Understanding Evaluation (DLUE), a new task suite.
arXiv Detail & Related papers (2023-05-16T15:16:24Z)
- HADES: Homologous Automated Document Exploration and Summarization [3.3509104620016092]
HADES is designed to streamline the work of professionals dealing with large volumes of documents.
The tool employs a multi-step pipeline that begins with processing PDF documents using topic modeling, summarization, and analysis of the most important words for each topic.
arXiv Detail & Related papers (2023-02-25T15:16:10Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer [16.03084865625318]
Business intelligence processes often require the extraction of useful semantic content from documents.
We present a transformer-based model for end-to-end segmentation of complex layouts in document images.
Our model achieved comparable or better segmentation performance than the existing state-of-the-art approaches.
arXiv Detail & Related papers (2022-01-27T10:50:22Z)
- Timestamping Documents and Beliefs [1.4467794332678539]
Document dating is a challenging problem which requires inference over the temporal structure of the document.
In this paper we propose NeuralDater, a Graph Convolutional Network (GCN) based document dating approach.
We also propose AD3: Attentive Deep Document Dater, an attention-based document dating system.
arXiv Detail & Related papers (2021-06-09T02:12:18Z)
- Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z)
- Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning [5.109216329453963]
We introduce Document Topic Modelling and Document Shuffle Prediction as novel pre-training tasks.
We utilize the Longformer network architecture as the backbone to encode the multi-modal information from multi-page documents in an end-to-end fashion (a minimal Longformer encoding sketch follows this list).
arXiv Detail & Related papers (2020-09-30T05:39:04Z)
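Several of the papers above avoid 512-token segmentation by using a long-context encoder instead. As a point of comparison, the sketch below encodes a document with the Hugging Face `allenai/longformer-base-4096` checkpoint; the placeholder `long_text` and the first-token pooling are illustrative choices, not the setup of any listed paper:

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

long_text = "This Agreement is made and entered into by ..."  # placeholder document text
inputs = tokenizer(long_text, return_tensors="pt",
                   truncation=True, max_length=4096)

# Longformer uses sparse local attention; granting the first token global
# attention lets it aggregate information from the entire document.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)

doc_embedding = outputs.last_hidden_state[:, 0]  # first-token pooling (illustrative)
```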
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.