Automatic Recognition of Learning Resource Category in a Digital Library
- URL: http://arxiv.org/abs/2401.12220v1
- Date: Tue, 28 Nov 2023 07:48:18 GMT
- Title: Automatic Recognition of Learning Resource Category in a Digital Library
- Authors: Soumya Banerjee, Debarshi Kumar Sanyal, Samiran Chattopadhyay, Plaban
Kumar Bhowmick, Partha Pratim Das
- Abstract summary: We introduce the Heterogeneous Learning Resources (HLR) dataset designed for document image classification.
The approach involves decomposing individual learning resources into constituent document images (sheets).
These images are then processed through an OCR tool to extract a textual representation.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Digital libraries often face the challenge of processing a large volume of
diverse document types. The manual collection and tagging of metadata can be a
time-consuming and error-prone task. To address this, we aim to develop an
automatic metadata extractor for digital libraries. In this work, we introduce
the Heterogeneous Learning Resources (HLR) dataset designed for document image
classification. The approach involves decomposing individual learning resources
into constituent document images (sheets). These images are then processed
through an OCR tool to extract a textual representation. State-of-the-art
classifiers are employed to classify both the document image and its textual
content. Subsequently, the labels of the constituent document images are
utilized to predict the label of the overall document.
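The abstract states that the labels of the constituent sheets are used to predict the document-level label, but does not specify the aggregation rule. A minimal sketch of one plausible rule, simple majority voting over per-sheet predictions (an assumption, not the paper's confirmed method):

```python
from collections import Counter

def aggregate_sheet_labels(sheet_labels):
    """Predict a document-level label from the labels of its constituent
    sheets via majority vote (ties broken by first occurrence)."""
    counts = Counter(sheet_labels)
    return counts.most_common(1)[0][0]

# Example: a 4-sheet learning resource whose sheets were classified individually
print(aggregate_sheet_labels(["slide", "slide", "assignment", "slide"]))  # slide
```

Other aggregation schemes (e.g., averaging per-class probabilities across sheets) would fit the same description; majority voting is simply the most direct reading of "labels of the constituent document images are utilized".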
Related papers
- Unifying Multimodal Retrieval via Document Screenshot Embedding [92.03571344075607]
Document Screenshot Embedding (DSE) is a novel retrieval paradigm that regards document screenshots as a unified input format.
We first craft the Wiki-SS dataset, a corpus of 1.3M Wikipedia web-page screenshots, to answer questions from the Natural Questions dataset.
In such a text-intensive document retrieval setting, DSE shows competitive effectiveness compared to other text retrieval methods relying on parsing.
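At retrieval time, a paradigm like DSE reduces to ranking document embeddings against a query embedding. A minimal sketch of that ranking step using cosine similarity over pre-computed vectors (the encoder itself is out of scope; the vectors here are random stand-ins, not DSE's actual embeddings):

```python
import numpy as np

def retrieve(query_vec, doc_vecs, k=3):
    """Rank document embeddings against a query embedding by cosine
    similarity and return the indices of the top-k documents."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity per document
    return np.argsort(-scores)[:k]      # best-scoring documents first

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 8))               # 100 mock screenshot embeddings
query = docs[42] + 0.01 * rng.normal(size=8)   # query nearly identical to doc 42
print(retrieve(query, docs)[0])                # 42
```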
arXiv Detail & Related papers (2024-06-17T06:27:35Z)
- CTP-Net: Character Texture Perception Network for Document Image Forgery Localization [28.48117743313255]
We propose a Character Texture Perception Network (CTP-Net) to localize the forged regions in document images.
Since the semantically meaningful characters in a document image are highly vulnerable to tampering, capturing the forgery traces is the key to localizing the forged regions.
The proposed CTP-Net is able to localize multi-scale forged areas in document images and outperforms state-of-the-art forgery localization methods.
arXiv Detail & Related papers (2023-08-04T06:37:28Z)
- DocMAE: Document Image Rectification via Self-supervised Representation Learning [144.44748607192147]
We present DocMAE, a novel self-supervised framework for document image rectification.
We first mask random patches of the background-excluded document images and then reconstruct the missing pixels.
With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents.
arXiv Detail & Related papers (2023-04-20T14:27:15Z)
- Unifying Vision, Text, and Layout for Universal Document Processing [105.36490575974028]
We propose a Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation.
Our method sets the state-of-the-art on 9 Document AI tasks, e.g., document understanding and QA, across diverse data domains like finance reports, academic papers, and websites.
arXiv Detail & Related papers (2022-12-05T22:14:49Z)
- I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification [123.90912800376039]
Online textual documents, e.g., Wikipedia, contain rich visual descriptions about object classes.
We propose I2DFormer, a novel transformer-based ZSL framework that jointly learns to encode images and documents.
Our method leads to highly interpretable results where document words can be grounded in the image regions.
arXiv Detail & Related papers (2022-09-21T12:18:31Z)
- Augraphy: A Data Augmentation Library for Document Images [59.457999432618614]
Augraphy is a Python library for constructing data augmentation pipelines.
It provides strategies to produce augmented versions of clean document images that appear to have been altered by standard office operations.
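To illustrate the kind of degradation such a pipeline simulates (this sketch deliberately does NOT use Augraphy's own API; it is a hand-rolled stand-in for one "office operation", photocopier-style noise):

```python
import numpy as np

def office_noise(image, rng=None):
    """Illustrative degradation: add photocopier-style speckle noise and a
    slight uniform brightness shift to a grayscale page image whose pixel
    values lie in [0, 255]."""
    rng = rng or np.random.default_rng()
    noisy = image.astype(np.float64)
    noisy += rng.normal(0, 10, size=image.shape)  # toner/sensor speckle
    noisy += rng.uniform(-15, 15)                 # uneven exposure
    return np.clip(noisy, 0, 255).astype(np.uint8)

page = np.full((32, 32), 255, dtype=np.uint8)  # blank white page
degraded = office_noise(page)
print(degraded.shape, degraded.dtype)
```

Augraphy composes many such effects (ink bleed, folds, scanner artifacts) into configurable pipelines; consult its documentation for the actual API.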
arXiv Detail & Related papers (2022-08-30T22:36:19Z)
- Information Extraction from Scanned Invoice Images using Text Analysis and Layout Features [0.0]
OCRMiner is designed to process documents the way a human reader does, i.e., to employ different layout and text attributes in a coordinated decision.
The system is able to recover the invoice data in 90% of cases for the English set and in 88% for the Czech set.
arXiv Detail & Related papers (2022-08-08T09:46:33Z)
- Open Set Classification of Untranscribed Handwritten Documents [56.0167902098419]
Huge amounts of digital page images of important manuscripts are preserved in archives worldwide.
The class or "typology" of a document is perhaps the most important tag to be included in the metadata.
The technical problem is one of automatic classification of documents, each consisting of a set of untranscribed handwritten text images.
arXiv Detail & Related papers (2022-06-20T20:43:50Z)
- Graphical Object Detection in Document Images [30.48863304419383]
We present a novel end-to-end trainable deep-learning-based framework, called Graphical Object Detection (GOD), to localize graphical objects in document images.
Our framework is data-driven and does not require any meta-data to locate graphical objects in the document images.
Our model yields promising results as compared to state-of-the-art techniques.
arXiv Detail & Related papers (2020-08-25T06:35:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.