Multimodal Side-Tuning for Document Classification
- URL: http://arxiv.org/abs/2301.07502v1
- Date: Mon, 16 Jan 2023 11:08:03 GMT
- Title: Multimodal Side-Tuning for Document Classification
- Authors: Stefano Pio Zingaro and Giuseppe Lisanti and Maurizio Gabbrielli
- Abstract summary: Side-tuning is a methodology for network adaptation recently introduced to solve some of the problems related to previous approaches.
We show that side-tuning can also be employed successfully when different data sources are considered.
- Score: 3.0229888038442914
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we propose to exploit the side-tuning framework for multimodal
document classification. Side-tuning is a methodology for network adaptation
recently introduced to solve some of the problems related to previous
approaches. This technique makes it possible to overcome the model rigidity
and catastrophic forgetting that affect transfer learning by fine-tuning. The
proposed solution uses off-the-shelf deep learning architectures leveraging the
side-tuning framework to combine a base model with a tandem of two side
networks. We show that side-tuning can also be employed successfully when
different data sources are considered, e.g. text and images in document
classification. The experimental results show that this approach improves
document classification accuracy over the state of the art.
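To make the idea of combining a base model with two side networks concrete, here is a minimal sketch in plain Python. It assumes the three outputs are same-length feature vectors fused by a convex combination. The function name `side_tune_blend`, the default weights, and the toy feature values are all hypothetical and are not taken from the paper, whose exact fusion rule is not stated in the abstract.

```python
from typing import List, Sequence


def side_tune_blend(
    base: Sequence[float],
    side_image: Sequence[float],
    side_text: Sequence[float],
    weights: Sequence[float] = (0.5, 0.25, 0.25),  # assumed values, not from the paper
) -> List[float]:
    """Fuse one base and two side feature vectors with a convex combination."""
    if not (len(base) == len(side_image) == len(side_text)):
        raise ValueError("all feature vectors must have the same length")
    if len(weights) != 3 or any(w < 0 for w in weights):
        raise ValueError("expected three non-negative weights")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w_base, w_img, w_txt = weights
    return [
        w_base * b + w_img * i + w_txt * t
        for b, i, t in zip(base, side_image, side_text)
    ]


# Toy vectors standing in for network outputs (illustrative values only).
base_features = [1.0, 0.0, 2.0]        # output of the frozen base model
image_side_features = [0.0, 4.0, 2.0]  # output of a trainable image side network
text_side_features = [2.0, 0.0, 2.0]   # output of a trainable text side network

fused = side_tune_blend(base_features, image_side_features, text_side_features)
print(fused)
```

In a real system the fused vector would feed a classification layer, and only the side networks (and possibly the weights) would be trained while the base model stays fixed. That is the property that lets side-tuning avoid overwriting what the base model already knows.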
Related papers
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
- DocXplain: A Novel Model-Agnostic Explainability Method for Document Image Classification [5.247930659596986]
This paper introduces DocXplain, a novel model-agnostic explainability method specifically designed for generating high interpretability feature attribution maps.
We extensively evaluate our proposed approach in the context of document image classification, utilizing 4 different evaluation metrics.
To the best of the authors' knowledge, this work presents the first model-agnostic attribution-based explainability method specifically tailored for document images.
arXiv Detail & Related papers (2024-07-04T10:59:15Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- LayeredDoc: Domain Adaptive Document Restoration with a Layer Separation Approach [9.643486775455841]
This paper introduces a text-graphic layer separation approach that enhances domain adaptability in document image restoration systems.
We propose LayeredDoc, which utilizes two layers of information: the first targets coarse-grained graphic components, while the second refines machine-printed textual content.
We evaluate our approach both qualitatively and quantitatively using a new real-world dataset, LayeredDocDB, developed for this study.
arXiv Detail & Related papers (2024-06-12T19:41:01Z)
- Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation [6.7311791228366]
This paper introduces LyCORIS, an open-source library that offers a wide selection of fine-tuning methodologies for Stable Diffusion.
We also present a framework for the systematic assessment of varied fine-tuning techniques.
Our work provides essential insights into the nuanced effects of fine-tuning parameters, bridging the gap between state-of-the-art research and practical application.
arXiv Detail & Related papers (2023-09-26T11:36:26Z)
- Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC).
SFSC generates a series of compatible sub-models with different capacities through one training process.
SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z)
- RectiNet-v2: A stacked network architecture for document image dewarping [16.249023269158734]
We propose an end-to-end CNN architecture that produces distortion-free document images from the warped documents it takes as input.
We train this model on synthetically simulated warped document images to compensate for the lack of sufficient natural data.
We evaluate our method on the DocUNet dataset, a benchmark in this domain, and obtain results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2021-02-01T19:26:17Z)
- Unsupervised Neural Domain Adaptation for Document Image Binarization [13.848843012433187]
This paper proposes a method that combines neural networks and Domain Adaptation (DA) in order to carry out unsupervised document binarization.
Results show that our proposal successfully deals with the binarization of new document domains without the need for labeled data.
arXiv Detail & Related papers (2020-12-02T13:42:38Z)
- Self-supervised Deep Reconstruction of Mixed Strip-shredded Text Documents [63.41717168981103]
This work extends our previous deep learning method for single-page reconstruction to a more realistic/complex scenario.
In our approach, the compatibility evaluation is modeled as a two-class (valid or invalid) pattern recognition problem.
The proposed method outperforms competing methods in complex scenarios, achieving accuracy above 90%.
arXiv Detail & Related papers (2020-07-01T21:48:05Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
- Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning [62.34197797857823]
A central problem in automatic reconstruction of shredded documents is the pairwise compatibility evaluation of the shreds.
This work proposes a scalable deep learning approach for measuring pairwise compatibility in which the number of inferences scales linearly.
Our method has accuracy comparable to the state-of-the-art with a speed-up of about 22 times for a test instance with 505 shreds.
arXiv Detail & Related papers (2020-03-23T03:22:06Z)