Graphical Object Detection in Document Images
- URL: http://arxiv.org/abs/2008.10843v1
- Date: Tue, 25 Aug 2020 06:35:57 GMT
- Title: Graphical Object Detection in Document Images
- Authors: Ranajit Saha and Ajoy Mondal and C. V. Jawahar
- Abstract summary: We present a novel end-to-end trainable deep learning based framework to localize graphical objects in document images, called Graphical Object Detection (GOD).
Our framework is data-driven and does not require any meta-data to locate graphical objects in the document images.
Our model yields promising results as compared to state-of-the-art techniques.
- Score: 30.48863304419383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graphical elements, particularly tables and figures, contain a visual summary
of the most valuable information in a document. Therefore,
localization of such graphical objects in document images is the initial
step toward understanding the content of those objects or the documents themselves. In
this paper, we present a novel end-to-end trainable deep learning based
framework to localize graphical objects in document images, called
Graphical Object Detection (GOD). Our framework is data-driven and does not
require any heuristics or meta-data to locate graphical objects in document
images. GOD exploits transfer learning and domain adaptation
to handle the scarcity of labeled training images for the graphical object detection
task in document images. Performance analysis carried out on various
public benchmark data sets: ICDAR-2013, ICDAR-POD2017, and UNLV shows that our
model yields promising results compared to state-of-the-art techniques.
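The performance analysis mentioned above is conventionally done by matching predicted bounding boxes against ground truth at an intersection-over-union (IoU) threshold. The sketch below is an illustrative, minimal version of such an evaluation, not the paper's actual code; the greedy one-to-one matching rule and the 0.6 default threshold are assumptions for the example.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(predictions, ground_truth, threshold=0.6):
    """Greedily match each predicted box to at most one unmatched
    ground-truth box with IoU >= threshold, then report precision/recall."""
    matched = set()
    true_positives = 0
    for pred in predictions:
        best, best_iou = None, threshold
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            score = iou(pred, gt)
            if score >= best_iou:
                best, best_iou = i, score
        if best is not None:
            matched.add(best)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

ICDAR-POD2017, for instance, reports results at more than one IoU threshold (a stricter threshold penalizes loose localizations), which is why `threshold` is a parameter here.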
Related papers
- GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation [14.511401955827875]
Object detection in documents is a key step to automate the structural elements identification process.
We present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image.
arXiv Detail & Related papers (2024-02-17T23:08:32Z)
- CiteTracker: Correlating Image and Text for Visual Tracking [114.48653709286629]
We propose the CiteTracker to enhance target modeling and inference in visual tracking by connecting images and text.
Specifically, we develop a text generation module to convert the target image patch into a descriptive text.
We then associate the target description and the search image using an attention-based correlation module to generate the correlated features for target state reference.
arXiv Detail & Related papers (2023-08-22T09:53:12Z)
- Advancing Visual Grounding with Scene Knowledge: Benchmark and Method [74.72663425217522]
Visual grounding (VG) aims to establish fine-grained alignment between vision and language.
Most existing VG datasets are constructed using simple description texts.
We propose a novel benchmark of Scene Knowledge-guided Visual Grounding.
arXiv Detail & Related papers (2023-07-21T13:06:02Z)
- Line Graphics Digitization: A Step Towards Full Automation [29.017383766914406]
We present the Line Graphics (LG) dataset, which includes pixel-wise annotations of 5 coarse and 10 fine-grained categories.
Our dataset covers 520 images of mathematical graphics collected from 450 documents from different disciplines.
Our proposed dataset can support two different computer vision tasks, i.e., semantic segmentation and object detection.
arXiv Detail & Related papers (2023-07-05T07:08:58Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learns from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match the labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning the scene graph.
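The pseudo-label step described above can be caricatured with simple string matching: keep only the detected regions whose labels appear among the caption's concepts. This is an illustrative sketch under that assumption; the paper's actual pipeline uses a real language parser and concept matching, and all names below are hypothetical.

```python
def parse_concepts(caption):
    """Naive 'parser': lowercase the caption's alphabetic tokens.
    A real system would use a linguistic parser to extract concepts."""
    return {tok.strip(".,").lower() for tok in caption.split() if tok.strip(".,").isalpha()}

def pseudo_labels(detections, caption):
    """detections: list of (label, box) pairs from an off-the-shelf detector.
    Keep only regions whose label matches a concept from the caption,
    yielding weak 'pseudo' labels for scene-graph training."""
    concepts = parse_concepts(caption)
    return [(label, box) for label, box in detections if label.lower() in concepts]
```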
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
- CanvasVAE: Learning to Generate Vector Graphic Documents [1.8478165393315746]
We learn a generative model of vector graphic documents using a dataset of design templates from an online service.
In experiments, we show that our model, named CanvasVAE, constitutes a strong baseline for generative modeling of vector graphic documents.
arXiv Detail & Related papers (2021-08-03T02:14:25Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks [2.5352713493505785]
We introduce a fully convolutional network for the document layout analysis task.
Our method Doc-UFCN relies on a U-shaped model trained from scratch for detecting objects from historical documents.
We show that Doc-UFCN outperforms state-of-the-art methods on various datasets.
arXiv Detail & Related papers (2020-12-28T09:48:33Z)
- Learning Object Detection from Captions via Textual Scene Attributes [70.90708863394902]
We argue that captions contain much richer information about the image, including attributes of objects and their relations.
We present a method that uses the attributes in this "textual scene graph" to train object detectors.
We empirically demonstrate that the resulting model achieves state-of-the-art results on several challenging object detection datasets.
arXiv Detail & Related papers (2020-09-30T10:59:20Z)
- Graph Edit Distance Reward: Learning to Edit Scene Graph [69.39048809061714]
We propose a new method to edit a scene graph according to user instructions, a task that has not previously been explored.
Specifically, to learn to edit scene graphs according to the semantics given by text, we propose a Graph Edit Distance Reward.
In the context of text-editing image retrieval, we validate the effectiveness of our method in CSS and CRIR dataset.
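To give a concrete feel for what such a reward measures, the toy function below computes an edit distance between two scene graphs represented as sets of (subject, predicate, object) triplets: the number of insertions plus deletions needed to turn one into the other. This is a deliberately coarse stand-in for full graph edit distance (which also handles substitutions and node correspondence) and is not the paper's reward formulation.

```python
def edit_distance(graph_a, graph_b):
    """Edit distance between scene graphs given as sets of
    (subject, predicate, object) triplets: count of triplets to
    delete from graph_a plus triplets to insert from graph_b."""
    return len(graph_a - graph_b) + len(graph_b - graph_a)
```

A reward would then be some decreasing function of this distance between the edited graph and the target graph, so that edits moving the graph toward the text's semantics score higher.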
arXiv Detail & Related papers (2020-08-15T04:52:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.