DocIQ: A Benchmark Dataset and Feature Fusion Network for Document Image Quality Assessment
- URL: http://arxiv.org/abs/2509.17012v1
- Date: Sun, 21 Sep 2025 10:01:43 GMT
- Title: DocIQ: A Benchmark Dataset and Feature Fusion Network for Document Image Quality Assessment
- Authors: Zhichao Ma, Fan Huang, Lu Zhao, Fengjun Guo, Guangtao Zhai, Xiongkuo Min
- Abstract summary: We introduce a subjective DIQA dataset, DIQA-5000. The DIQA-5000 dataset comprises 5,000 document images, generated by applying multiple document enhancement techniques to 500 real-world images with diverse distortions. Each enhanced image was rated by 15 subjects across three rating dimensions: overall quality, sharpness, and color fidelity. We propose a specialized no-reference DIQA model that exploits document layout features to maintain quality perception at reduced resolutions, lowering computational cost.
- Score: 78.21680156380705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document image quality assessment (DIQA) is an important component for various applications, including optical character recognition (OCR), document restoration, and the evaluation of document image processing systems. In this paper, we introduce a subjective DIQA dataset, DIQA-5000. The DIQA-5000 dataset comprises 5,000 document images, generated by applying multiple document enhancement techniques to 500 real-world images with diverse distortions. Each enhanced image was rated by 15 subjects across three rating dimensions: overall quality, sharpness, and color fidelity. Furthermore, we propose a specialized no-reference DIQA model that exploits document layout features to maintain quality perception at reduced resolutions, lowering computational cost. Recognizing that image quality is influenced by both low-level and high-level visual features, we designed a feature fusion module to extract and integrate multi-level features from document images. To generate multi-dimensional scores, our model employs independent quality heads for each dimension to predict score distributions, allowing it to learn distinct aspects of document image quality. Experimental results demonstrate that our method outperforms current state-of-the-art general-purpose IQA models on both DIQA-5000 and an additional document image dataset focused on OCR accuracy.
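The scoring scheme the abstract describes, with independent per-dimension heads that predict score distributions rather than single scalars, can be sketched as follows. This is a minimal pure-Python illustration only: the bin layout, logit values, and head names below are hypothetical, and the paper's actual network architecture is not reproduced here.

```python
import math

def softmax(logits):
    """Convert a head's raw outputs into a probability distribution over score bins."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_score(logits, bins):
    """Collapse a predicted score distribution into a scalar quality score."""
    return sum(p * b for p, b in zip(softmax(logits), bins))

# Hypothetical setup: 5 score bins (1 = worst, 5 = best) and one independent
# head per rating dimension, mirroring the three dimensions in DIQA-5000.
bins = [1, 2, 3, 4, 5]
head_logits = {
    "overall":        [0.1, 0.2, 1.5, 2.0, 0.3],
    "sharpness":      [2.0, 1.0, 0.5, 0.2, 0.1],
    "color_fidelity": [0.3, 0.5, 0.9, 1.2, 1.0],
}
scores = {dim: expected_score(l, bins) for dim, l in head_logits.items()}
```

Predicting a distribution per dimension lets each head model the disagreement among the 15 raters, rather than forcing a single consensus value.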
Related papers
- MDIQA: Unified Image Quality Assessment for Multi-dimensional Evaluation and Restoration [76.94293572477379]
We propose a multi-dimensional image quality assessment (MDIQA) framework. We model image quality across various perceptual dimensions, including five technical and four aesthetic dimensions. Once the MDIQA model is trained, it can be deployed for flexible training of image restoration (IR) models.
arXiv Detail & Related papers (2025-08-23T03:17:14Z)
- DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment [6.922942482129033]
We adapt DeQA-Score, a state-of-the-art MLLM-based image quality scorer, for document quality assessment. We propose DeQA-Doc, a framework that leverages the visual language capabilities of MLLMs and a soft-label strategy to regress continuous document quality scores.
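The soft-label strategy mentioned here typically spreads a continuous mean-opinion score over discrete levels instead of using a hard one-hot target. A minimal sketch of one common construction (a Gaussian around the true score; the bin count and `sigma` below are hypothetical, not taken from the paper):

```python
import math

def soft_label(mos, bins, sigma=0.5):
    """Turn a continuous mean-opinion score into a soft target distribution
    over discrete score bins, using a Gaussian centered on the true score."""
    weights = [math.exp(-((b - mos) ** 2) / (2 * sigma ** 2)) for b in bins]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: a score of 3.4 on a 1-5 scale puts most mass on
# bins 3 and 4 instead of a hard label at a single bin.
label = soft_label(3.4, bins=[1, 2, 3, 4, 5])
```

Training against such distributions (e.g. with a KL or cross-entropy loss) preserves the ordering information that a hard label discards.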
arXiv Detail & Related papers (2025-07-17T05:23:53Z)
- Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z)
- UHD-IQA Benchmark Database: Pushing the Boundaries of Blind Photo Quality Assessment [4.563959812257119]
We introduce a novel Image Quality Assessment dataset comprising 6073 UHD-1 (4K) images, annotated at a fixed width of 3840 pixels.
Our dataset focuses on highly aesthetic photos of high technical quality, filling a gap in the literature.
The dataset is annotated with perceptual quality ratings obtained through a crowdsourcing study.
arXiv Detail & Related papers (2024-06-25T11:30:31Z)
- Descriptive Image Quality Assessment in the Wild [25.503311093471076]
VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression.
We introduce Depicted image Quality Assessment in the Wild (DepictQA-Wild).
Our method includes a multi-functional IQA task paradigm that encompasses both assessment and comparison tasks, brief and detailed responses, full-reference and non-reference scenarios.
arXiv Detail & Related papers (2024-05-29T07:49:15Z)
- AIGCIQA2023: A Large-scale Image Quality Assessment Database for AI Generated Images: from the Perspectives of Quality, Authenticity and Correspondence [42.85549933048976]
We first generate over 2000 images based on 6 state-of-the-art text-to-image generation models using 100 prompts.
Based on these images, a subjective experiment is conducted to assess the human visual preferences for each image from three perspectives.
We conduct a benchmark experiment to evaluate the performance of several state-of-the-art IQA metrics on our constructed database.
arXiv Detail & Related papers (2023-07-01T03:30:31Z)
- HQ-50K: A Large-scale, High-quality Dataset for Image Restoration [105.22191357934398]
HQ-50K contains 50,000 high-quality images with rich texture details and semantic diversity.
We analyze existing image restoration datasets from five different perspectives.
HQ-50K considers all of these five aspects during the data curation process and meets all requirements.
arXiv Detail & Related papers (2023-06-08T17:44:21Z)
- Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective [93.56647950778357]
Blind image quality assessment (BIQA) predicts the human perception of image quality without any reference information.
We develop a general and automated multitask learning scheme for BIQA to exploit auxiliary knowledge from other tasks.
arXiv Detail & Related papers (2023-03-27T07:58:09Z)
- MSTRIQ: No Reference Image Quality Assessment Based on Swin Transformer with Multi-Stage Fusion [8.338999282303755]
We propose a novel algorithm based on the Swin Transformer.
It aggregates information from both local and global features to better predict the quality.
It ranks 2nd in the no-reference track of NTIRE 2022 Perceptual Image Quality Assessment Challenge.
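The local/global aggregation idea can be illustrated with a minimal sketch. This is a pure-Python placeholder showing the fusion pattern only; the actual model's Swin Transformer stages and fusion layers are not reproduced here, and the feature values are hypothetical.

```python
def global_pool(patch_features):
    """Mean-pool patch-level (local) features into one global descriptor."""
    n = len(patch_features)
    dim = len(patch_features[0])
    return [sum(f[d] for f in patch_features) / n for d in range(dim)]

def fuse(patch_features):
    """Concatenate each local feature with the global descriptor, so a
    downstream predictor sees both fine detail and overall context."""
    g = global_pool(patch_features)
    return [f + g for f in patch_features]  # list concat: [local | global]

# Hypothetical input: 3 patches, each with a 2-dimensional feature.
fused = fuse([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Concatenation is only one possible fusion choice; attention-weighted or multi-stage variants follow the same local-plus-global pattern.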
arXiv Detail & Related papers (2022-05-20T11:34:35Z)
- Object-QA: Towards High Reliable Object Quality Assessment [71.71188284059203]
In object recognition applications, object images usually appear with different quality levels.
We propose an effective approach named Object-QA to produce highly reliable quality scores for object images.
arXiv Detail & Related papers (2020-05-27T01:46:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.