TFIC: End-to-End Text-Focused Image Compression for Coding for Machines
- URL: http://arxiv.org/abs/2503.19495v1
- Date: Tue, 25 Mar 2025 09:36:13 GMT
- Title: TFIC: End-to-End Text-Focused Image Compression for Coding for Machines
- Authors: Stefano Della Fiore, Alessandro Gnutti, Marco Dalai, Pierangelo Migliorati, Riccardo Leonardi
- Abstract summary: We present an image compression system designed to retain text-specific features for subsequent Optical Character Recognition (OCR). Our encoding process requires half the time needed by the OCR module, making it especially suitable for devices with limited computational capacity.
- Score: 50.86328069558113
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional image compression methods aim to faithfully reconstruct images for human perception. In contrast, Coding for Machines focuses on compressing images to preserve information relevant to a specific machine task. In this paper, we present an image compression system designed to retain text-specific features for subsequent Optical Character Recognition (OCR). Our encoding process requires half the time needed by the OCR module, making it especially suitable for devices with limited computational capacity. In scenarios where on-device OCR is computationally prohibitive, images are compressed and later processed to recover the text content. Experimental results demonstrate that our method achieves significant improvements in text extraction accuracy at low bitrates, even improving over the accuracy of OCR performed on uncompressed images, thus acting as a local pre-processing step.
Related papers
- Optical Context Compression Is Just (Bad) Autoencoding [32.622769616423035]
DeepSeek-OCR demonstrates that rendered text can be reconstructed with high fidelity from a small number of vision tokens.
We test two assumptions implicit in the optical-compression narrative: that vision-based compression provides unique advantages for text reconstruction from compressed representations, and that DeepSeek-OCR's reconstruction results are evidence that vision-based compression will be useful for language modeling.
arXiv Detail & Related papers (2025-12-03T10:27:27Z)
- Words as Geometric Features: Estimating Homography using Optical Character Recognition as Compressed Image Representation [6.385732495789276]
Document alignment plays a crucial role in numerous real-world applications, such as automated form processing, anomaly detection, and workflow automation.
Traditional methods for document alignment rely on image-based features like keypoints, edges, and textures to estimate geometric transformations, such as homographies.
This paper introduces a novel approach that leverages Optical Character Recognition (OCR) outputs as features for homography estimation.
arXiv Detail & Related papers (2025-05-25T01:20:32Z)
- Efficient Masked Image Compression with Position-Indexed Self-Attention [6.64044416324419]
We propose an image compression method based on a position-indexed self-attention mechanism.
Compared to existing semantic-structured compression methods, our approach can significantly reduce computational costs.
arXiv Detail & Related papers (2025-04-17T13:12:39Z)
- Extremely low-bitrate Image Compression Semantically Disentangled by LMMs from a Human Perception Perspective [12.321609213934389]
Inspired by the human progressive perception mechanism, we propose a Semantically Disentangled Image Compression framework.
We leverage LMMs to extract essential semantic components, including overall descriptions, detailed object descriptions, and semantic segmentation masks.
We propose a training-free Object Restoration model with Attention Guidance (ORAG), built on a pre-trained ControlNet, to restore object details conditioned on object-level text descriptions and semantic masks.
arXiv Detail & Related papers (2025-03-01T08:27:11Z)
- Hierarchical Semantic Compression for Consistent Image Semantic Restoration [62.97519327310638]
We propose a novel hierarchical semantic compression (HSC) framework that purely operates within intrinsic semantic spaces from generative models.
Experimental results demonstrate that the proposed HSC framework achieves the state-of-the-art performance on subjective quality and consistency for human vision.
arXiv Detail & Related papers (2025-02-24T03:20:44Z)
- Stable Diffusion is a Natural Cross-Modal Decoder for Layered AI-generated Image Compression [7.643300240138419]
We introduce a scalable cross-modal compression framework that incorporates multiple human-comprehensible modalities.
Our framework encodes images into a layered bitstream consisting of a semantic layer that delivers high-level semantic information.
Our method proficiently restores both semantic and visual details, competing against baseline approaches at extremely low bitrates.
arXiv Detail & Related papers (2024-12-17T15:01:35Z)
- Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets.
We introduce a novel method named Decoder Pre-training with only text for STR (DPTR).
DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
arXiv Detail & Related papers (2024-08-11T06:36:42Z)
- Optimization of Image Processing Algorithms for Character Recognition in Cultural Typewritten Documents [0.8158530638728501]
This paper evaluates the impact of image processing methods and parameter tuning in Optical Character Recognition (OCR).
The approach uses a multi-objective problem formulation to minimize Levenshtein edit distance and maximize the number of words correctly identified, solved with a non-dominated sorting genetic algorithm (NSGA-II).
Our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results.
arXiv Detail & Related papers (2023-11-27T11:44:46Z)
- Perceptual Image Compression with Cooperative Cross-Modal Side Information [53.356714177243745]
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
arXiv Detail & Related papers (2023-11-23T08:31:11Z)
- OCR for TIFF Compressed Document Images Directly in Compressed Domain Using Text segmentation and Hidden Markov Model [0.0]
We propose a novel idea of developing an OCR for CCITT (The International Telegraph and Telephone Consultative Committee) compressed machine printed TIFF document images directly in the compressed domain.
After segmenting text regions into lines and words, an HMM is applied for recognition using the three coding modes of CCITT: horizontal, vertical, and pass mode.
arXiv Detail & Related papers (2022-09-13T06:34:26Z)
- Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z)
- Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)
- Discernible Image Compression [124.08063151879173]
This paper aims to produce compressed images by pursuing both appearance and perceptual consistency.
Based on the encoder-decoder framework, we propose using a pre-trained CNN to extract features of the original and compressed images.
Experiments on benchmarks demonstrate that images compressed by using the proposed method can also be well recognized by subsequent visual recognition and detection models.
arXiv Detail & Related papers (2020-02-17T07:35:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.