Related papers: Multimodal Side-Tuning for Document Classification

URL: http://arxiv.org/abs/2301.07502v1
Date: Mon, 16 Jan 2023 11:08:03 GMT
Title: Multimodal Side-Tuning for Document Classification
Authors: Stefano Pio Zingaro and Giuseppe Lisanti and Maurizio Gabbrielli
Abstract summary: Side-tuning is a recently introduced methodology for network adaptation that addresses some of the problems of previous approaches. We show that side-tuning can also be successfully employed when different data sources are considered.
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In this paper, we propose to exploit the side-tuning framework for multimodal document classification. Side-tuning is a recently introduced methodology for network adaptation that addresses some of the problems of previous approaches. With this technique, it is possible to overcome the model rigidity and catastrophic forgetting that affect transfer learning by fine-tuning. The proposed solution uses off-the-shelf deep learning architectures and leverages the side-tuning framework to combine a base model with a tandem of two side networks. We show that side-tuning can also be successfully employed when different data sources are considered, e.g., text and images in document classification. The experimental results show that this approach pushes document classification accuracy beyond the state of the art.
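
The abstract describes the architecture only at a high level. As an illustration of the general side-tuning idea, the Python sketch below blends the output of a frozen base model with the outputs of trainable side networks (here, one image side network and one text side network). It is not the authors' implementation: the weighted-sum fusion rule, the fixed blending weights, the function name `side_tuned_output`, and the toy networks are all assumptions chosen for illustration.

```python
from typing import Any, Callable, List, Sequence

Vector = List[float]
Network = Callable[[Any], Vector]  # maps a document to a score/feature vector


def side_tuned_output(base: Network, sides: Sequence[Network],
                      weights: Sequence[float], document: Any) -> Vector:
    """Blend a frozen base network with side networks (illustrative sketch).

    Assumed fusion rule (hypothetical, not taken from the paper):
        output = weights[0] * base(doc) + sum_i weights[i + 1] * sides[i](doc)
    with non-negative weights that sum to 1. In training, only the side
    networks would be updated; the base network stays frozen.
    """
    if len(weights) != len(sides) + 1:
        raise ValueError("need one weight for the base plus one per side network")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    outputs = [base(document)] + [side(document) for side in sides]
    dim = len(outputs[0])
    if any(len(out) != dim for out in outputs):
        raise ValueError("all networks must return vectors of the same length")
    # Element-wise convex combination of the base and side outputs.
    return [sum(w * out[j] for w, out in zip(weights, outputs)) for j in range(dim)]


# Hypothetical toy stand-ins for real networks, each returning 2-class scores.
def frozen_image_base(doc: Any) -> Vector:
    return [1.0, 0.0]


def image_side(doc: Any) -> Vector:
    return [1.0, 1.0]


def text_side(doc: Any) -> Vector:
    return [0.0, 1.0]


doc = {"image": "scan.png", "text": "Invoice no. 42"}
scores = side_tuned_output(frozen_image_base, [image_side, text_side],
                           [0.5, 0.2, 0.3], doc)
print(scores)  # [0.7, 0.5]
```

In a real setting, the stand-in functions would be neural networks (for example, an image CNN and a text encoder), only the side networks would receive gradient updates, and the blending weights could be fixed, learned, or scheduled during training; the abstract does not state which choices the paper makes.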

Related papers

Task-Specific Adaptation with Restricted Model Access
"Gray-box" fine-tuning approaches, where the model's architecture and weights remain hidden, allow only gradient propagation. We introduce a novel yet simple and effective framework that adapts to new tasks using two lightweight learnable modules at the model's input and output. We evaluate our approaches across several backbones on benchmarks such as text-image alignment, text-video alignment, and sketch-image alignment.
arXiv (2025-02-02T13:29:44Z)
Towards Compatible Fine-tuning for Vision-Language Model Updates
Class-conditioned Context Optimization (ContCoOp) integrates learnable prompts with class embeddings using an attention layer before inputting them into the text encoder. Our experiments over 15 datasets show that our ContCoOp achieves the highest compatibility over the baseline methods, and exhibits robust out-of-distribution generalization.
arXiv (2024-12-30T12:06:27Z)
High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study
We develop a few-shot segmentation (FSS) framework based on foundation models. To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence. Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv (2024-09-10T08:04:11Z)
DocXplain: A Novel Model-Agnostic Explainability Method for Document Image Classification
This paper introduces DocXplain, a novel model-agnostic explainability method specifically designed for generating high interpretability feature attribution maps. We extensively evaluate our proposed approach in the context of document image classification, utilizing 4 different evaluation metrics. To the best of the authors' knowledge, this work presents the first model-agnostic attribution-based explainability method specifically tailored for document images.
arXiv (2024-07-04T10:59:15Z)
Reinforcing Pre-trained Models Using Counterfactual Images
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images. We identify model weaknesses by testing the model using the counterfactual image dataset. We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv (2024-06-19T08:07:14Z)
LayeredDoc: Domain Adaptive Document Restoration with a Layer Separation Approach
This paper introduces a text-graphic layer separation approach that enhances domain adaptability in document image restoration systems. We propose LayeredDoc, which utilizes two layers of information: the first targets coarse-grained graphic components, while the second refines machine-printed textual content. We evaluate our approach both qualitatively and quantitatively using a new real-world dataset, LayeredDocDB, developed for this study.
arXiv (2024-06-12T19:41:01Z)
Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation
This paper introduces LyCORIS, an open-source library that offers a wide selection of fine-tuning methodologies for Stable Diffusion. We also present a framework for the systematic assessment of varied fine-tuning techniques. Our work provides essential insights into the nuanced effects of fine-tuning parameters, bridging the gap between state-of-the-art research and practical application.
arXiv (2023-09-26T11:36:26Z)
Switchable Representation Learning Framework with Self-compatibility
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC). SFSC generates a series of compatible sub-models with different capacities through a single training process. SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv (2022-06-16T16:46:32Z)
RectiNet-v2: A stacked network architecture for document image dewarping
We propose an end-to-end CNN architecture that produces distortion-free document images from the warped documents it takes as input. We train this model on synthetically simulated warped document images to compensate for the lack of sufficient natural data. We evaluate our method on the DocUNet dataset, a benchmark in this domain, and obtain results comparable to state-of-the-art methods.
arXiv (2021-02-01T19:26:17Z)
Unsupervised Neural Domain Adaptation for Document Image Binarization
This paper proposes a method that combines neural networks and Domain Adaptation (DA) in order to carry out unsupervised document binarization. Results show that our proposal successfully deals with the binarization of new document domains without the need for labeled data.
arXiv (2020-12-02T13:42:38Z)
Self-supervised Deep Reconstruction of Mixed Strip-shredded Text Documents
This work extends our previous deep learning method for single-page reconstruction to a more realistic and complex scenario. In our approach, compatibility evaluation is modeled as a two-class (valid or invalid) pattern recognition problem. The proposed method outperforms competing methods in complex scenarios, achieving accuracy above 90%.
arXiv (2020-07-01T21:48:05Z)
Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation. We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv (2020-04-30T03:23:45Z)
Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning
A central problem in automatic reconstruction of shredded documents is the pairwise compatibility evaluation of the shreds. This work proposes a scalable deep learning approach for measuring pairwise compatibility in which the number of inferences scales linearly with the number of shreds. Our method has accuracy comparable to the state of the art with a speed-up of about 22 times for a test instance with 505 shreds.
arXiv (2020-03-23T03:22:06Z)

This list was automatically generated from the titles and abstracts of the papers on this site.