Context-Aware Classification of Legal Document Pages
- URL: http://arxiv.org/abs/2304.02787v2
- Date: Tue, 25 Apr 2023 14:59:24 GMT
- Title: Context-Aware Classification of Legal Document Pages
- Authors: Pavlos Fragkogiannis, Martina Forster, Grace E. Lee, Dell Zhang
- Abstract summary: We present a simple but effective approach that overcomes the constraint on input length.
Specifically, we enhance the input with extra tokens carrying sequential information about previous pages.
Our experiments conducted on two legal datasets in English and Portuguese respectively show that the proposed approach can significantly improve the performance of document page classification.
- Score: 7.306025535482021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For many business applications that require the processing, indexing, and
retrieval of professional documents such as legal briefs (in PDF format etc.),
it is often essential to classify the pages of any given document into their
corresponding types beforehand. Most existing studies in the field of document
image classification either focus on single-page documents or treat multiple
pages in a document independently. Although in recent years a few techniques
have been proposed to exploit the context information from neighboring pages to
enhance document page classification, they typically cannot be utilized with
large pre-trained language models due to the constraint on input length. In
this paper, we present a simple but effective approach that overcomes the above
limitation. Specifically, we enhance the input with extra tokens carrying
sequential information about previous pages (introducing recurrence), which
enables the use of pre-trained Transformer models like BERT for context-aware
page classification. Our experiments conducted on two legal datasets in English
and Portuguese respectively show that the proposed approach can significantly
improve the performance of document page classification compared to the
non-recurrent setup as well as the other context-aware baselines.
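The abstract's core mechanism can be illustrated with a toy sketch: each page's token sequence is prefixed with a few extra tokens summarizing the previous page, so a standard fixed-length BERT-style encoder still sees cross-page context. The token ids, special tokens, and the summarization rule below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the recurrent input construction: augment each
# page's input with context tokens carried over from the previous page.
# Token ids and the page summarizer are placeholder assumptions.

CLS, SEP = 101, 102  # illustrative BERT-style special-token ids

def summarize_page(tokens, n_ctx=4):
    """Stand-in for a learned summary of a page: here we simply keep the
    first n_ctx tokens as placeholder 'context' tokens."""
    return tokens[:n_ctx]

def build_inputs(pages, max_len=16):
    """Build a BERT-style input per page, prefixed with context tokens
    from the previous page (this carry-over is the recurrence)."""
    inputs, prev_summary = [], []
    for page_tokens in pages:
        seq = [CLS] + prev_summary + [SEP] + page_tokens
        seq = seq[: max_len - 1] + [SEP]  # truncate, then close with SEP
        inputs.append(seq)
        prev_summary = summarize_page(page_tokens)  # passed to next page
    return inputs

pages = [[10, 11, 12, 13, 14], [20, 21, 22], [30, 31]]
for seq in build_inputs(pages):
    print(seq)
```

In a real setup the summary tokens would come from the model's own representation of the previous page rather than a fixed prefix, which is what lets context propagate page by page.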
Related papers
- Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism [12.289101189321181]
Document Visual Question Answering (Document VQA) has garnered significant interest from both the document understanding and natural language processing communities.
The state-of-the-art single-page Document VQA methods show impressive performance, yet in multi-page scenarios, these methods struggle.
We propose a novel method and efficient training strategy for multi-page Document VQA tasks.
arXiv Detail & Related papers (2024-04-29T18:07:47Z) - GRAM: Global Reasoning for Multi-Page VQA [14.980413646626234]
We present GRAM, a method that seamlessly extends pre-trained single-page models to the multi-page setting.
To do so, we leverage a single-page encoder for local page-level understanding, and enhance it with document-level designated layers and learnable tokens.
For additional computational savings during decoding, we introduce an optional compression stage.
arXiv Detail & Related papers (2024-01-07T08:03:06Z) - A Multi-Modal Multilingual Benchmark for Document Image Classification [21.7518357653137]
We introduce two newly curated multilingual datasets WIKI-DOC and MULTIEUR-DOCLEX.
We study popular visually-rich document understanding (Document AI) models in a previously untested setting: document image classification.
Experimental results show limitations of multilingual Document AI models on cross-lingual transfer across typologically distant languages.
arXiv Detail & Related papers (2023-10-25T04:35:06Z) - In-context Pretraining: Language Modeling Beyond Document Boundaries [137.53145699439898]
In-Context Pretraining is a new approach where language models are pretrained on a sequence of related documents.
We introduce approximate algorithms for finding related documents with efficient nearest neighbor search.
We see notable improvements in tasks that require more complex contextual reasoning.
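The summary above describes ordering pretraining data so that related documents appear in the same sequence. A minimal sketch of that idea, assuming toy document embeddings and exact cosine similarity (the paper itself uses efficient approximate nearest neighbor search at scale):

```python
# Hypothetical sketch: chain documents greedily by embedding similarity so
# that pretraining sequences concatenate related documents rather than
# random ones. Embeddings and the greedy walk are illustrative assumptions.

import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def greedy_related_order(embs):
    """Start at document 0 and repeatedly append the most similar
    unvisited document, producing an ordering of related neighbors."""
    order, visited = [0], {0}
    while len(order) < len(embs):
        cur = order[-1]
        best = max((i for i in range(len(embs)) if i not in visited),
                   key=lambda i: cosine(embs[cur], embs[i]))
        order.append(best)
        visited.add(best)
    return order

docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(greedy_related_order(docs))
```

With these toy vectors the walk keeps the two near-duplicate pairs adjacent, which is the property the pretraining sequences rely on.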
arXiv Detail & Related papers (2023-10-16T17:57:12Z) - PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents, which have rich structure.
We propose PDFTriage that enables models to retrieve the context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z) - Beyond Document Page Classification: Design, Datasets, and Challenges [32.94494070330065]
This paper highlights the need to bring document classification benchmarking closer to real-world applications.
We identify the lack of public multi-page document classification datasets, formalize different classification tasks arising in application scenarios, and motivate the value of targeting efficient multi-page document representations.
arXiv Detail & Related papers (2023-08-24T16:16:47Z) - Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z) - GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z) - Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z) - Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning [5.109216329453963]
We introduce Document Topic Modelling and Document Shuffle Prediction as novel pre-training tasks.
We utilize the Longformer network architecture as the backbone to encode the multi-modal information from multi-page documents in an end-to-end fashion.
arXiv Detail & Related papers (2020-09-30T05:39:04Z) - SPECTER: Document-level Representation Learning using Citation-informed
Transformers [51.048515757909215]
SPECTER generates document-level embedding of scientific documents based on pretraining a Transformer language model.
We introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation.
arXiv Detail & Related papers (2020-04-15T16:05:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences.