OLALA: Object-Level Active Learning for Efficient Document Layout
Annotation
- URL: http://arxiv.org/abs/2010.01762v3
- Date: Mon, 29 Mar 2021 19:32:25 GMT
- Title: OLALA: Object-Level Active Learning for Efficient Document Layout
Annotation
- Authors: Zejiang Shen, Jian Zhao, Melissa Dell, Yaoliang Yu, Weining Li
- Abstract summary: We propose an Object-Level Active Learning framework for efficient document layout annotation.
In this framework, only regions with the most ambiguous object predictions within an image are selected for annotators to label.
For unselected predictions, a semi-automatic correction algorithm is proposed to identify certain errors based on prior knowledge of layout structures.
- Score: 24.453873808984415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document images often have intricate layout structures, with numerous content
regions (e.g. texts, figures, tables) densely arranged on each page. This makes
the manual annotation of layout datasets expensive and inefficient. These
characteristics also challenge existing active learning methods, as image-level
scoring and selection suffer from the overexposure of common objects. Inspired
by recent progress in semi-supervised learning and self-training, we propose
an Object-Level Active Learning framework for efficient document layout
annotation, OLALA. In this framework, only regions with the most ambiguous
object predictions within an image are selected for annotators to label,
optimizing the use of the annotation budget. For unselected predictions, a
semi-automatic correction algorithm is proposed to identify certain errors
based on prior knowledge of layout structures and rectify them with minor
supervision. Additionally, we carefully design a perturbation-based object
scoring function for document images. It governs the object selection process
via evaluating prediction ambiguities, and considers both the positions and
categories of predicted layout objects. Extensive experiments show that OLALA
can significantly boost model performance and improve annotation efficiency,
given the same labeling budget. Code for this paper can be accessed via
https://github.com/lolipopshock/detectron2_al.
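To make the object-level selection concrete, here is a minimal, hypothetical sketch of how a perturbation-based ambiguity score and a budget-limited selection step might look. The combination of positional drift (IoU between the original and perturbed prediction) and category disagreement is a simplified stand-in for the paper's actual scoring function; the names `object_ambiguity` and `select_for_annotation` are illustrative, not from the released code.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def object_ambiguity(original, perturbed):
    """Score one predicted object against its prediction on a perturbed
    input: low box overlap or a changed class label means high ambiguity."""
    box_a, cls_a = original
    box_b, cls_b = perturbed
    position_term = 1.0 - iou(box_a, box_b)      # localization instability
    category_term = 0.0 if cls_a == cls_b else 1.0  # label disagreement
    return 0.5 * (position_term + category_term)

def select_for_annotation(prediction_pairs, budget):
    """Pick the `budget` most ambiguous objects for human labeling;
    the remaining predictions would go to the correction step instead."""
    ranked = sorted(prediction_pairs,
                    key=lambda p: object_ambiguity(*p), reverse=True)
    return ranked[:budget]
```

A stable object (identical box and label under perturbation) scores 0.0 and is left to the semi-automatic correction step, while an object whose box shifts or whose class flips rises to the top of the annotation queue.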
Related papers
- GraphKD: Exploring Knowledge Distillation Towards Document Object
Detection with Structured Graph Creation [14.511401955827875]
Object detection in documents is a key step in automating the structural element identification process.
We present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image.
arXiv Detail & Related papers (2024-02-17T23:08:32Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- Learning Object-Language Alignments for Open-Vocabulary Object Detection [83.09560814244524]
We propose a novel open-vocabulary object detection framework directly learning from image-text pair data.
It enables us to train an open-vocabulary object detector on image-text pairs in a much simpler and more effective way.
arXiv Detail & Related papers (2022-11-27T14:47:31Z)
- ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich
Document Understanding [52.3895498789521]
We propose ERNIE-Layout, a novel document pre-training solution with layout knowledge enhancement.
We first rearrange input sequences in the serialization stage, then present a correlative pre-training task, reading order prediction, and learn the proper reading order of documents.
Experimental results show ERNIE-Layout achieves superior performance on various downstream tasks, setting new state-of-the-art results on key information extraction and document question answering.
arXiv Detail & Related papers (2022-10-12T12:59:24Z)
- The Weak Supervision Landscape [5.186945902380689]
We propose a framework for categorising weak supervision settings.
We identify the key elements that characterise weak supervision and devise a series of dimensions that categorise most of the existing approaches.
We show how common settings in the literature fit within the framework and discuss its possible uses in practice.
arXiv Detail & Related papers (2022-03-30T13:19:43Z)
- Assisted Text Annotation Using Active Learning to Achieve High Quality
with Little Effort [9.379650501033465]
We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations.
We combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories.
Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even complex and subtle frames.
arXiv Detail & Related papers (2021-12-15T13:14:58Z)
- OPAD: An Optimized Policy-based Active Learning Framework for Document
Content Analysis [6.159771892460152]
We propose OPAD, a novel framework using a reinforcement learning policy for active learning in content detection tasks for documents.
The framework learns the acquisition function to decide the samples to be selected while optimizing performance metrics.
We show superior performance of the proposed OPAD framework on various active learning tasks related to document understanding.
arXiv Detail & Related papers (2021-10-01T07:40:56Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image
Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z)
- Weakly-Supervised Salient Object Detection via Scribble Annotations [54.40518383782725]
We propose a weakly-supervised salient object detection model to learn saliency from scribble labels.
We present a new metric, termed saliency structure measure, to measure the structure alignment of the predicted saliency maps.
Our method not only outperforms existing weakly-supervised/unsupervised methods, but also is on par with several fully-supervised state-of-the-art models.
arXiv Detail & Related papers (2020-03-17T12:59:50Z)
- Towards Using Count-level Weak Supervision for Crowd Counting [55.58468947486247]
This paper studies the problem of weakly-supervised crowd counting, which learns a model from only a small amount of location-level annotations (fully supervised) and a large amount of count-level annotations (weakly supervised).
We devise a simple-yet-effective training strategy, namely Multiple Auxiliary Tasks Training (MATT), to construct regularizers that restrict the freedom of the generated density maps.
arXiv Detail & Related papers (2020-02-29T02:58:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.