Cross-Domain Document Object Detection: Benchmark Suite and Method
- URL: http://arxiv.org/abs/2003.13197v1
- Date: Mon, 30 Mar 2020 03:04:51 GMT
- Title: Cross-Domain Document Object Detection: Benchmark Suite and Method
- Authors: Kai Li, Curtis Wigington, Chris Tensmeyer, Handong Zhao, Nikolaos Barmpalios, Vlad I. Morariu, Varun Manjunatha, Tong Sun, Yun Fu
- Abstract summary: Document object detection (DOD) is fundamental for downstream tasks like intelligent document editing and understanding.
We investigate cross-domain DOD, where the goal is to learn a detector for the target domain using labeled data from the source domain and only unlabeled data from the target domain.
For each dataset, we provide the page images, bounding box annotations, PDF files, and the rendering layers extracted from the PDF files.
- Score: 71.4339949510586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decomposing images of document pages into high-level semantic regions (e.g.,
figures, tables, paragraphs), document object detection (DOD) is fundamental
for downstream tasks like intelligent document editing and understanding. DOD
remains a challenging problem as document objects vary significantly in layout,
size, aspect ratio, texture, etc. An additional challenge arises in practice
because large labeled training datasets are only available for domains that
differ from the target domain. We investigate cross-domain DOD, where the goal
is to learn a detector for the target domain using labeled data from the source
domain and only unlabeled data from the target domain. Documents from the two
domains may vary significantly in layout, language, and genre. We establish a
benchmark suite consisting of different types of PDF document datasets that can
be utilized for cross-domain DOD model training and evaluation. For each
dataset, we provide the page images, bounding box annotations, PDF files, and
the rendering layers extracted from the PDF files. Moreover, we propose a novel
cross-domain DOD model which builds upon the standard detection model and
addresses domain shifts by incorporating three novel alignment modules: Feature
Pyramid Alignment (FPA) module, Region Alignment (RA) module and Rendering
Layer alignment (RLA) module. Extensive experiments on the benchmark suite
substantiate the efficacy of the three proposed modules and show that the
proposed method significantly outperforms the baseline methods. The project
page is at https://github.com/kailigo/cddod.
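The abstract names three alignment modules but does not describe their internals. As a point of reference only, the pure-Python sketch below shows the generic domain-adversarial idea that many cross-domain detectors build on: a domain classifier learns to tell source features from target features, while a gradient reversal layer (Ganin and Lempitsky, 2015) flips the sign of its gradient so that the feature extractor learns to confuse it. The function names, the linear domain classifier, and the toy 2-D features are assumptions made for illustration; this is not the authors' FPA, RA, or RLA implementation.

```python
import math
from typing import List, Tuple

# Hypothetical sketch of generic domain-adversarial feature alignment with a
# gradient reversal layer. This is NOT the paper's FPA, RA, or RLA module.

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def domain_bce_and_feature_grads(
    features: List[Vector], domain_labels: List[int], w: Vector, b: float
) -> Tuple[float, List[Vector]]:
    """Mean binary cross-entropy of a linear domain classifier
    D(f) = sigmoid(w . f + b), plus dLoss/df for each feature vector.
    Label 1 marks a source-domain sample, label 0 a target-domain sample."""
    n = len(features)
    total = 0.0
    grads: List[Vector] = []
    for f, y in zip(features, domain_labels):
        p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
        p_safe = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += -(y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe))
        # d(BCE)/d(logit) = p - y; logit = w . f + b, so d(logit)/df = w.
        grads.append([(p - y) * wi / n for wi in w])
    return total / n, grads


def gradient_reversal(grads: List[Vector], lam: float) -> List[Vector]:
    """Backward pass of a gradient reversal layer: identity in the forward
    pass, gradients multiplied by -lam on the way back to the features."""
    return [[-lam * g for g in row] for row in grads]


if __name__ == "__main__":
    # Toy 2-D "features" for two source pages and two target pages (made up).
    feats = [[1.0, 0.5], [0.8, 0.7], [-0.6, 0.2], [-0.9, -0.1]]
    labels = [1, 1, 0, 0]
    w, b = [1.0, 0.5], 0.0

    loss, grads = domain_bce_and_feature_grads(feats, labels, w, b)
    reversed_grads = gradient_reversal(grads, lam=0.1)

    # A descent step on the features using the reversed gradient increases
    # the domain classifier's loss, i.e. the domains become harder to separate.
    lr = 1.0
    stepped = [[fi - lr * gi for fi, gi in zip(f, g)]
               for f, g in zip(feats, reversed_grads)]
    new_loss, _ = domain_bce_and_feature_grads(stepped, labels, w, b)
    print(f"domain loss before: {loss:.4f}, after: {new_loss:.4f}")
```

In an actual detector the domain classifier is trained at the same time to minimise this loss, so the classifier and the feature extractor play a minimax game. At which stages such alignment is applied (for example pyramid levels, region features, or rendering layers) and how it is weighted are design choices the abstract does not detail.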
Related papers
- Bidirectional Generative Framework for Cross-domain Aspect-based Sentiment Analysis [68.742820522137]
Cross-domain aspect-based sentiment analysis (ABSA) aims to perform various fine-grained sentiment analysis tasks on a target domain by transferring knowledge from a source domain.
We propose a unified bidirectional generative framework to tackle various cross-domain ABSA tasks.
Our framework trains a generative model in both text-to-label and label-to-text directions.
arXiv Detail & Related papers (2023-05-16T15:02:23Z)
- WUDA: Unsupervised Domain Adaptation Based on Weak Source Domain Labels [5.718326013810649]
Unsupervised domain adaptation (UDA) for semantic segmentation addresses the cross-domain problem with fine source domain labels.
This paper defines a new task: unsupervised domain adaptation based on weak source domain labels.
arXiv Detail & Related papers (2022-10-05T08:28:57Z)
- Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval [55.122020263319634]
Video moment retrieval (VMR) aims to localize the target moment from an untrimmed video according to a given language query.
In this paper, we focus on a novel task: cross-domain VMR, where fully-annotated datasets are available in one domain but the domain of interest only contains unannotated datasets.
We propose a novel Multi-Modal Cross-Domain Alignment network to transfer the annotation knowledge from the source domain to the target domain.
arXiv Detail & Related papers (2022-09-23T12:58:20Z)
- Cross-Domain Document Layout Analysis Using Document Style Guide [15.799572801059716]
Document layout analysis (DLA) aims to decompose document images into high-level semantic areas.
Many researchers have tackled this challenge by synthesizing data to build large training sets.
In this paper, we propose an unsupervised cross-domain DLA framework based on document style guidance.
arXiv Detail & Related papers (2022-01-24T00:49:19Z)
- Domain Adaptation for Real-World Single View 3D Reconstruction [1.611271868398988]
Unsupervised domain adaptation can be used to transfer knowledge from the labeled synthetic source domain to the unlabeled real target domain.
We propose a novel architecture which takes advantage of the fact that, in this setting, target domain data is unsupervised with regard to the 3D model but supervised for class labels.
Experiments are performed with ShapeNet as the source domain and domains within the Object Domain Suite (ODDS) dataset as the target.
arXiv Detail & Related papers (2021-08-24T22:02:27Z)
- Meta-FDMixup: Cross-Domain Few-Shot Learning Guided by Labeled Target Data [95.47859525676246]
A recent study finds that existing few-shot learning methods, trained on the source domain, fail to generalize to the novel target domain when a domain gap is observed.
In this paper, we observe that the labeled target data in Cross-Domain Few-Shot Learning has not been leveraged in any way to help the learning process.
arXiv Detail & Related papers (2021-07-26T06:15:45Z)
- Cross-domain Contrastive Learning for Unsupervised Domain Adaptation [108.63914324182984]
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a fully-labeled source domain to a different unlabeled target domain.
We build upon contrastive self-supervised learning to align features so as to reduce the domain discrepancy between training and testing sets.
arXiv Detail & Related papers (2021-06-10T06:32:30Z)
- Inferring Latent Domains for Unsupervised Deep Domain Adaptation [54.963823285456925]
Unsupervised Domain Adaptation (UDA) refers to the problem of learning a model in a target domain where labeled data are not available.
This paper introduces a novel deep architecture which addresses the problem of UDA by automatically discovering latent domains in visual datasets.
We evaluate our approach on publicly available benchmarks, showing that it outperforms state-of-the-art domain adaptation methods.
arXiv Detail & Related papers (2021-03-25T14:33:33Z)