Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models
- URL: http://arxiv.org/abs/2412.13859v1
- Date: Wed, 18 Dec 2024 13:53:16 GMT
- Title: Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models
- Authors: Anna Scius-Bertrand, Michael Jungo, Lars Vögtlin, Jean-Marc Spat, Andreas Fischer
- Abstract summary: Classifying scanned documents is a challenging problem that involves image, layout, and text analysis for document understanding.
For certain benchmark datasets, notably RVL-CDIP, the state of the art is closing in on near-perfect performance.
- Score: 0.2517406173566782
- License:
- Abstract: Classifying scanned documents is a challenging problem that involves image, layout, and text analysis for document understanding. Nevertheless, for certain benchmark datasets, notably RVL-CDIP, the state of the art is closing in on near-perfect performance when considering hundreds of thousands of training samples. With the advent of large language models (LLMs), which are excellent few-shot learners, the question arises to what extent the document classification problem can be addressed with only a few training samples, or even none at all. In this paper, we investigate this question in the context of zero-shot prompting and few-shot model fine-tuning, with the aim of reducing the need for human-annotated training samples as much as possible.
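The zero-shot setting described in the abstract can be illustrated with a minimal sketch: build a classification prompt from a document's OCR text and map the LLM's free-form answer back to a known label. The label subset, prompt wording, and parsing logic below are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical subset of RVL-CDIP categories, for illustration only.
RVL_CDIP_LABELS = ["letter", "form", "email", "invoice"]

def build_zero_shot_prompt(ocr_text: str, labels=RVL_CDIP_LABELS) -> str:
    """Assemble a zero-shot classification prompt from OCR'd document text."""
    label_list = ", ".join(labels)
    return (
        "Classify the following scanned document into exactly one of these "
        f"categories: {label_list}.\n"
        "Answer with the category name only.\n\n"
        f"Document text:\n{ocr_text}\n\nCategory:"
    )

def parse_label(response: str, labels=RVL_CDIP_LABELS):
    """Map a free-form LLM response back to a known label, or None."""
    answer = response.strip().lower()
    for label in labels:
        if label in answer:
            return label
    return None  # unparseable response; caller may retry or fall back
```

The prompt string would be sent to whichever LLM is being evaluated; only the label mapping is deterministic here.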
Related papers
- Recurrent Few-Shot model for Document Verification [1.9686770963118383]
General-purpose image- and video-based verification systems for ID and travel documents have yet to achieve good enough performance to be considered a solved problem.
We propose a recurrent-based model able to detect forged documents in a few-shot scenario.
Preliminary results on the SIDTD and Findit datasets show good performance of this model for this task.
arXiv Detail & Related papers (2024-10-03T13:05:27Z) - FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction [66.98008357232428]
We propose FineMatch, a new aspect-based fine-grained text and image matching benchmark.
FineMatch focuses on text and image mismatch detection and correction.
We show that models trained on FineMatch demonstrate enhanced proficiency in detecting fine-grained text and image mismatches.
arXiv Detail & Related papers (2024-04-23T03:42:14Z) - Noise-Aware Training of Layout-Aware Language Models [7.387030600322538]
Training a custom extractor that identifies named entities from a document requires a large number of instances of the target document type annotated in both textual and visual modalities.
We propose a Noise-Aware Training method (NAT) in this paper.
We show that NAT-trained models are not only robust in performance; they also outperform a transfer-learning baseline by up to 6% in terms of macro-F1 score.
arXiv Detail & Related papers (2024-03-30T23:06:34Z) - Teaching Smaller Language Models To Generalise To Unseen Compositional Questions [6.9076450524134145]
We propose a combination of multitask pretraining on up to 93 tasks designed to instill diverse reasoning abilities.
We show that performance can be significantly improved by adding retrieval-augmented training datasets.
arXiv Detail & Related papers (2023-08-02T05:00:12Z) - Evaluating Data Attribution for Text-to-Image Models [62.844382063780365]
We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style.
Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction.
By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
arXiv Detail & Related papers (2023-06-15T17:59:51Z) - Improving Handwritten OCR with Training Samples Generated by Glyph Conditional Denoising Diffusion Probabilistic Model [10.239782333441031]
We propose a denoising diffusion probabilistic model (DDPM) to generate training samples.
This model creates mappings between printed characters and handwritten images.
However, the synthetic images are not always consistent with the glyph-conditional images.
We therefore propose a progressive data filtering strategy that adds only samples with a high confidence of correctness to the training set.
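The progressive filtering idea can be sketched as follows: synthetic samples are accepted into the training set in rounds, with the confidence threshold relaxed at each round so that high-confidence samples enter first. The threshold schedule and confidence scores below are illustrative assumptions, not the paper's actual values.

```python
def filter_synthetic(samples, thresholds=(0.95, 0.9, 0.8)):
    """Progressively accept synthetic samples by recognizer confidence.

    samples: list of (sample_id, confidence) pairs.
    Returns sample ids in acceptance order; samples below the loosest
    threshold are never added.
    """
    accepted, remaining = [], list(samples)
    for t in thresholds:  # thresholds relax round by round
        kept = [(i, c) for i, c in remaining if c >= t]
        accepted.extend(i for i, _ in kept)
        remaining = [(i, c) for i, c in remaining if c < t]
    return accepted

# "a" enters in the strictest round, "b" in a later round, "c" never.
samples = [("a", 0.97), ("b", 0.85), ("c", 0.5)]
```

In practice the recognizer would be retrained between rounds, so the confidences themselves would be refreshed; this sketch only shows the thresholding step.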
arXiv Detail & Related papers (2023-05-31T04:18:30Z) - Questions Are All You Need to Train a Dense Passage Retriever [123.13872383489172]
ART is a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question.
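The reconstruction-based ranking in step (2) can be illustrated with a toy stand-in: score each retrieved document by the (log-)probability of regenerating the question from it, here using a smoothed unigram bag-of-words model in place of the actual language model. The tokenization and smoothing constant are illustrative assumptions.

```python
import math

def reconstruction_log_prob(question: str, document: str, alpha: float = 0.1) -> float:
    """Toy log P(question | document) under a smoothed unigram model."""
    doc_tokens = document.lower().split()
    vocab = set(doc_tokens) | set(question.lower().split())
    total = len(doc_tokens) + alpha * len(vocab)
    score = 0.0
    for tok in question.lower().split():
        count = doc_tokens.count(tok)
        score += math.log((count + alpha) / total)  # add-alpha smoothing
    return score

def rank_documents(question: str, documents):
    """Rank retrieved documents by how well they 'reconstruct' the question."""
    return sorted(documents,
                  key=lambda d: reconstruction_log_prob(question, d),
                  reverse=True)
```

The point of the scheme is that this ranking signal needs no labeled question-document pairs; only questions are required.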
arXiv Detail & Related papers (2022-06-21T18:16:31Z) - Revisiting Deep Local Descriptor for Improved Few-Shot Classification [56.74552164206737]
We show how one can improve the quality of embeddings by leveraging Dense Classification and Attentive Pooling.
We suggest pooling feature maps by applying attentive pooling instead of the widely used global average pooling (GAP) to prepare embeddings for few-shot classification.
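The contrast between GAP and attentive pooling can be sketched in a few lines: GAP averages all spatial positions uniformly, while attentive pooling weights positions by softmax similarity to a query vector. The single-query formulation below is an illustrative stand-in for the paper's attentive-pooling module, not its exact design.

```python
import numpy as np

def gap(feature_map: np.ndarray) -> np.ndarray:
    """Global average pooling: (H*W, C) feature map -> (C,) embedding."""
    return feature_map.mean(axis=0)

def attentive_pool(feature_map: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Weight spatial positions by softmax similarity to a query vector."""
    logits = feature_map @ query                 # (H*W,) position scores
    weights = np.exp(logits - logits.max())      # numerically stable softmax
    weights /= weights.sum()
    return weights @ feature_map                 # (C,) attention-weighted embedding
```

With an all-zero query the attention weights are uniform and the two poolings coincide; a learned query lets the model emphasize discriminative regions instead of averaging them away.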
arXiv Detail & Related papers (2021-03-30T00:48:28Z) - An Unsupervised Sampling Approach for Image-Sentence Matching Using Document-Level Structural Information [64.66785523187845]
We focus on the problem of unsupervised image-sentence matching.
Existing research has explored utilizing document-level structural information to sample positive and negative instances for model training.
We propose a new sampling strategy to select additional intra-document image-sentence pairs as positive or negative samples.
arXiv Detail & Related papers (2021-03-21T05:43:29Z) - One of these (Few) Things is Not Like the Others [0.0]
We propose a model which can both classify new images based on a small number of examples and recognize images which do not belong to any previously seen group.
We evaluate performance over a spectrum of model architectures, including setups small enough to be run on low powered devices.
arXiv Detail & Related papers (2020-05-22T21:49:35Z) - Any-Shot Object Detection [81.88153407655334]
'Any-shot detection' is the setting in which entirely unseen and few-shot categories co-occur during inference.
We propose a unified any-shot detection model, that can concurrently learn to detect both zero-shot and few-shot object classes.
Our framework can also be used for zero-shot detection or few-shot detection alone.
arXiv Detail & Related papers (2020-03-16T03:43:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.