RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization
- URL: http://arxiv.org/abs/2404.09530v2
- Date: Fri, 19 Apr 2024 06:44:18 GMT
- Title: RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization
- Authors: Avinash Anand, Raj Jaiswal, Mohit Gupta, Siddhesh S Bangar, Pijush Bhuyan, Naman Lal, Rajeev Singh, Ritika Jha, Rajiv Ratn Shah, Shin'ichi Satoh
- Abstract summary: RanLayNet is a synthetic document dataset enriched with automatically assigned labels.
We show that a deep layout identification model trained on our dataset exhibits enhanced performance compared to a model trained solely on actual documents.
- Score: 36.973388673687815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large ground-truth datasets and recent advances in deep learning techniques have been useful for layout detection. However, because of the restricted layout diversity of these datasets, training on them requires a sizable number of annotated instances, which is both expensive and time-consuming. As a result, differences between the source and target domains may significantly impact how well these models function. To solve this problem, domain adaptation approaches have been developed that use a small quantity of labeled data to adjust the model to the target domain. In this research, we introduce a synthetic document dataset called RanLayNet, enriched with automatically assigned labels denoting spatial positions, ranges, and types of layout elements. The primary aim of this endeavor is to develop a versatile dataset capable of training models with robustness and adaptability to diverse document formats. Through empirical experimentation, we demonstrate that a deep layout identification model trained on our dataset exhibits enhanced performance compared to a model trained solely on actual documents. Moreover, we conduct a comparative analysis by fine-tuning inference models using both the PubLayNet and IIIT-AR-13K datasets on the DocLayNet dataset. Our findings emphasize that models enriched with our dataset are well suited to such tasks, achieving 0.398 and 0.588 mAP95 scores for the TABLE class in the scientific document domain.
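The reported mAP95 figures refer to COCO-style mAP averaged over IoU thresholds 0.50:0.95, restricted to the TABLE class. Below is a minimal sketch of how such a per-class score could be computed with pycocotools, assuming COCO-format ground-truth and detection JSON files; the file names and the "table" category name are illustrative assumptions, not artifacts from the paper.

```python
# Sketch: per-class mAP@[0.50:0.95] with pycocotools.
# Assumes COCO-style ground-truth annotations and a detections file in
# COCO results format; paths and the "table" category name are hypothetical.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval


def table_map(gt_json: str, det_json: str, category: str = "table") -> float:
    coco_gt = COCO(gt_json)                 # ground-truth layout annotations
    coco_dt = coco_gt.loadRes(det_json)     # model detections (COCO results format)
    cat_ids = coco_gt.getCatIds(catNms=[category])

    evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
    evaluator.params.catIds = cat_ids       # evaluate only the TABLE class
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()
    return evaluator.stats[0]               # AP averaged over IoU 0.50:0.95


# Example usage (hypothetical file names):
# print(table_map("doclaynet_val.json", "detections_finetuned.json"))
```

`evaluator.stats[0]` is the standard COCO AP over IoU 0.50:0.95; evaluated per class as above, it corresponds to the kind of mAP95 score quoted in the abstract.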
Related papers
- Cross-Domain Content Generation with Domain-Specific Small Language Models [3.2772349789781616]
This study explores methods to enable a small language model to produce coherent and relevant outputs for two different domains.
We find that utilizing custom tokenizers tailored to each dataset significantly enhances generation quality.
Our findings demonstrate that knowledge expansion with frozen layers is an effective method for small language models to generate domain-specific content.
arXiv Detail & Related papers (2024-09-19T21:45:13Z) - infoVerse: A Universal Framework for Dataset Characterization with
Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z) - Stacking Ensemble Learning in Deep Domain Adaptation for Ophthalmic
Image Classification [61.656149405657246]
Domain adaptation is effective in image classification tasks where obtaining sufficient label data is challenging.
We propose a novel method, named SELDA, for stacking ensemble learning by extending three domain adaptation methods.
The experimental results using Age-Related Eye Disease Study (AREDS) benchmark ophthalmic dataset demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2022-09-27T14:19:00Z) - Domain Alignment Meets Fully Test-Time Adaptation [24.546705919244936]
A foundational requirement of a deployed ML model is to generalize to data drawn from a testing distribution that is different from training.
In this paper, we focus on a challenging variant of this problem, where access to the original source data is restricted.
We propose a new approach, CATTAn, that bridges UDA and FTTA, by relaxing the need to access entire source data.
arXiv Detail & Related papers (2022-07-09T03:17:19Z) - CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z) - Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D
Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z) - Learning to Cluster under Domain Shift [20.00056591000625]
In this work we address the problem of transferring knowledge from a source to a target domain when both source and target data have no annotations.
Inspired by recent works on deep clustering, our approach leverages information from data gathered from multiple source domains.
We show that our method is able to automatically discover relevant semantic information even in the presence of only a few target samples.
arXiv Detail & Related papers (2020-08-11T12:03:01Z)