Related papers: TFIC: End-to-End Text-Focused Image Compression for Coding for Machines

TFIC: End-to-End Text-Focused Image Compression for Coding for Machines

URL: http://arxiv.org/abs/2503.19495v1
Date: Tue, 25 Mar 2025 09:36:13 GMT
Title: TFIC: End-to-End Text-Focused Image Compression for Coding for Machines
Authors: Stefano Della Fiore, Alessandro Gnutti, Marco Dalai, Pierangelo Migliorati, Riccardo Leonardi,
Abstract summary: We present an image compression system designed to retain text-specific features for subsequent Optical Character Recognition (OCR)<n>Our encoding process requires half the time needed by the OCR module, making it especially suitable for devices with limited computational capacity.
Score: 50.86328069558113
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Traditional image compression methods aim to faithfully reconstruct images for human perception. In contrast, Coding for Machines focuses on compressing images to preserve information relevant to a specific machine task. In this paper, we present an image compression system designed to retain text-specific features for subsequent Optical Character Recognition (OCR). Our encoding process requires half the time needed by the OCR module, making it especially suitable for devices with limited computational capacity. In scenarios where on-device OCR is computationally prohibitive, images are compressed and later processed to recover the text content. Experimental results demonstrate that our method achieves significant improvements in text extraction accuracy at low bitrates, even improving over the accuracy of OCR performed on uncompressed images, thus acting as a local pre-processing step.

Related papers

Efficient Masked Image Compression with Position-Indexed Self-Attention [6.64044416324419]
We propose an image compression method based on a position-indexed self-attention mechanism. Compared to existing semantic-structured compression methods, our approach can significantly reduce computational costs.
arXiv Detail & Related papers (2025-04-17T13:12:39Z)
Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets. We introduce a novel method named Decoder Pre-training with only text for STR (DPTR) DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
arXiv Detail & Related papers (2024-08-11T06:36:42Z)
Optimization of Image Processing Algorithms for Character Recognition in Cultural Typewritten Documents [0.8158530638728501]
This paper evaluates the impact of image processing methods and parameter tuning in Optical Character Recognition (OCR) The approach uses a multi-objective problem formulation to minimize Levenshtein edit distance and maximize the number of words correctly identified with a non-dominated sorting genetic algorithm (NSGA-II) Our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results.
arXiv Detail & Related papers (2023-11-27T11:44:46Z)
Perceptual Image Compression with Cooperative Cross-Modal Side Information [53.356714177243745]
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff. Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
arXiv Detail & Related papers (2023-11-23T08:31:11Z)
OCR for TIFF Compressed Document Images Directly in Compressed Domain Using Text segmentation and Hidden Markov Model [0.0]
We propose a novel idea of developing an OCR for CCITT (The International Telegraph and Telephone Consultative Committee) compressed machine printed TIFF document images directly in the compressed domain. After segmenting text regions into lines and words, HMM is applied for recognition using three coding modes of CCITT- horizontal, vertical and the pass mode.
arXiv Detail & Related papers (2022-09-13T06:34:26Z)
Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types. We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding. We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z)
Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images. We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)
Discernible Image Compression [124.08063151879173]
This paper aims to produce compressed images by pursuing both appearance and perceptual consistency. Based on the encoder-decoder framework, we propose using a pre-trained CNN to extract features of the original and compressed images. Experiments on benchmarks demonstrate that images compressed by using the proposed method can also be well recognized by subsequent visual recognition and detection models.
arXiv Detail & Related papers (2020-02-17T07:35:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.