CompTLL-UNet: Compressed Domain Text-Line Localization in Challenging
Handwritten Documents using Deep Feature Learning from JPEG Coefficients
- URL: http://arxiv.org/abs/2308.06142v1
- Date: Fri, 11 Aug 2023 14:02:52 GMT
- Authors: Bulla Rajesh, Sk Mahafuz Zaman, Mohammed Javed, and P. Nagabhushan
- Abstract summary: We propose an idea that employs deep feature learning directly from the JPEG compressed coefficients without full decompression to accomplish text-line localization in the JPEG compressed domain.
A modified U-Net architecture known as Compressed Text-Line localization Network (CompTLL-UNet) is designed to accomplish it.
The model is trained and tested with JPEG compressed versions of benchmark datasets including ICDAR 2017 (cBAD) and ICDAR 2019 (cBAD).
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic localization of text-lines in handwritten documents is still an
open and challenging research problem. Writing issues such as uneven spacing
between lines, oscillating and touching text, and skew become much more
challenging when complex handwritten document images are to be segmented
directly in their compressed representation. The conventional way of processing
compressed documents is through decompression; in this paper, we instead
propose an idea that employs deep feature learning directly from the JPEG
compressed coefficients, without full decompression, to accomplish text-line
localization in the JPEG compressed domain. A modified U-Net architecture known
as Compressed Text-Line Localization Network (CompTLL-UNet) is designed to
accomplish it. The model is trained and tested with JPEG compressed versions of
benchmark datasets including ICDAR2017 (cBAD) and ICDAR2019 (cBAD), reporting
state-of-the-art performance with reduced storage and computational costs in
the JPEG compressed domain.
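To make "JPEG compressed coefficients" concrete, the following is a minimal sketch of the block-wise representation a compressed-domain model could consume: quantized 8x8 DCT coefficients, which a JPEG decoder exposes after entropy decoding and before the inverse DCT. It is not the authors' pipeline. The function names (`dct_matrix`, `quantized_dct_blocks`, `coeff_map`) are hypothetical, and the sketch recomputes the coefficients from a grayscale array with NumPy using the example luminance quantization table from the JPEG specification, rather than reading them from an actual JPEG file.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix; for n=8 it matches the JPEG forward DCT scaling."""
    k = np.arange(n)[:, None]  # frequency index
    x = np.arange(n)[None, :]  # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

# Example luminance quantization table from Annex K of the JPEG specification.
# Real files may carry different tables; this choice is an assumption of the sketch.
Q_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def quantized_dct_blocks(gray):
    """Return quantized DCT coefficients with shape (H/8, W/8, 8, 8).

    `gray` is a 2-D uint8 array whose height and width are multiples of 8.
    """
    h, w = gray.shape
    if h % 8 or w % 8:
        raise ValueError("height and width must be multiples of 8")
    d = dct_matrix(8)
    # Split the image into 8x8 blocks and apply the JPEG level shift of 128.
    blocks = gray.astype(np.float64).reshape(h // 8, 8, w // 8, 8)
    blocks = blocks.transpose(0, 2, 1, 3) - 128.0
    coeffs = d @ blocks @ d.T  # 2-D DCT of every block via broadcasting
    return np.rint(coeffs / Q_LUMA).astype(np.int32)

def coeff_map(blocks):
    """Tile the blocks back into an H x W coefficient map, e.g. as a network input."""
    nby, nbx, _, _ = blocks.shape
    return blocks.transpose(0, 2, 1, 3).reshape(nby * 8, nbx * 8)
```

For example, `coeff_map(quantized_dct_blocks(img))` gives a coefficient map with the same height and width as `img`. In smooth regions most entries are zero, which is one reason working on this representation can reduce storage and computation.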
Related papers
- The JPEG Pleno Learning-based Point Cloud Coding Standard: Serving Man and Machine [49.16996486119006]
Deep learning has emerged as a powerful tool in point cloud coding.
JPEG has recently finalized the JPEG Pleno Learning-based Point Cloud Coding standard.
This paper provides a complete technical description of the JPEG PCC standard.
arXiv Detail & Related papers (2024-09-12T15:20:23Z)
- Compressed-Language Models for Understanding Compressed File Formats: a JPEG Exploration [82.88166538896331]
We focus on the JPEG format as a representative CFF, given its commonality and its representativeness of key concepts in compression.
We test if CLMs understand the JPEG format by probing their capabilities to perform along three axes: recognition of inherent file properties, handling of files with anomalies, and generation of new files.
Results suggest that CLMs can understand the semantics of compressed data when directly operating on the byte streams of files produced by CFFs.
arXiv Detail & Related papers (2024-05-27T13:09:23Z)
- Learned Lossless Compression for JPEG via Frequency-Domain Prediction [50.20577108662153]
We propose a novel framework for learned lossless compression of JPEG images.
To enable learning in the frequency domain, DCT coefficients are partitioned into groups to utilize implicit local redundancy.
An autoencoder-like architecture is designed based on the weight-shared blocks to realize entropy modeling of grouped DCT coefficients.
arXiv Detail & Related papers (2023-03-05T13:15:28Z)
- T2CI-GAN: Text to Compressed Image generation using Generative Adversarial Network [9.657133242509671]
In practice, most of the visual data are processed and transmitted in the compressed representation form.
The proposed work attempts to generate the visual data directly in the compressed representation form using Deep Convolutional GANs (DCGANs).
The first model is directly trained with JPEG compressed DCT images (compressed domain) to generate the compressed images from text descriptions.
The second model is trained with RGB images (pixel domain) to generate JPEG compressed DCT representation from text descriptions.
arXiv Detail & Related papers (2022-10-01T09:26:25Z)
- Document Image Binarization in JPEG Compressed Domain using Dual Discriminator Generative Adversarial Networks [0.0]
The proposed model has been thoroughly tested with different versions of DIBCO dataset having challenges like holes, erased or smudged ink, dust, and misplaced fibres.
The model proved to be highly robust, efficient both in terms of time and space complexities, and also resulted in state-of-the-art performance in JPEG compressed domain.
arXiv Detail & Related papers (2022-09-13T12:07:32Z)
- OCR for TIFF Compressed Document Images Directly in Compressed Domain Using Text segmentation and Hidden Markov Model [0.0]
We propose a novel idea of developing an OCR for CCITT (The International Telegraph and Telephone Consultative Committee) compressed machine printed TIFF document images directly in the compressed domain.
After segmenting text regions into lines and words, an HMM is applied for recognition using the three coding modes of CCITT: horizontal, vertical, and pass.
arXiv Detail & Related papers (2022-09-13T06:34:26Z)
- Learning-based Compression for Material and Texture Recognition [23.668803886355683]
This paper is concerned with learning-based compression schemes whose compressed-domain representations can be utilized to perform visual processing and computer vision tasks directly in the compressed domain.
We adopt the learning-based JPEG-AI framework for performing material and texture recognition using the compressed-domain latent representation at varying bit-rates.
It is also shown that the compressed-domain classification can yield a competitive performance in terms of Top-1 and Top-5 accuracy while using a smaller reduced-complexity classification model.
arXiv Detail & Related papers (2021-04-16T23:16:26Z)
- Text Compression-aided Transformer Encoding [77.16960983003271]
We propose explicit and implicit text compression approaches to enhance the Transformer encoding.
Backbone information, meaning the gist of the input text, is not specifically focused on.
Our evaluation on benchmark datasets shows that the proposed explicit and implicit text compression approaches improve results in comparison to strong baselines.
arXiv Detail & Related papers (2021-02-11T11:28:39Z)
- Learning to Improve Image Compression without Changing the Standard Decoder [100.32492297717056]
We propose learning to improve the encoding performance with the standard decoder.
Specifically, a frequency-domain pre-editing method is proposed to optimize the distribution of DCT coefficients.
We do not modify the JPEG decoder and therefore our approach is applicable when viewing images with the widely used standard JPEG decoder.
arXiv Detail & Related papers (2020-09-27T19:24:42Z)
- Quantization Guided JPEG Artifact Correction [69.04777875711646]
We develop a novel architecture for artifact correction using the JPEG file's quantization matrix.
This allows our single model to achieve state-of-the-art performance over models trained for specific quality settings.
arXiv Detail & Related papers (2020-04-17T00:10:08Z)