ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images
- URL: http://arxiv.org/abs/2407.21363v2
- Date: Fri, 21 Feb 2025 06:56:59 GMT
- Title: ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images
- Authors: Xilei Zhu, Liu Yang, Huiyu Duan, Xiongkuo Min, Guangtao Zhai, Patrick Le Callet
- Abstract summary: Egocentric spatial images and videos are emerging as a compelling form of stereoscopic XR content. The corresponding image quality assessment (IQA) research for egocentric spatial images is still lacking. In this paper, we establish the Egocentric Spatial Images Quality Assessment Database (ESIQAD), the first IQA database dedicated to egocentric spatial images.
- Score: 70.68629648595677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the development of eXtended Reality (XR), photo capturing and display technologies based on head-mounted displays (HMDs) have advanced significantly and gained considerable attention. Egocentric spatial images and videos are emerging as a compelling form of stereoscopic XR content. Assessing the Quality of Experience (QoE) of XR content is important to ensure a high-quality viewing experience. Unlike traditional 2D images, egocentric spatial images present challenges for perceptual quality assessment due to their special capture and processing methods and their stereoscopic characteristics. However, the corresponding image quality assessment (IQA) research for egocentric spatial images is still lacking. In this paper, we establish the Egocentric Spatial Images Quality Assessment Database (ESIQAD), which is, to the best of our knowledge, the first IQA database dedicated to egocentric spatial images. Our ESIQAD includes 500 egocentric spatial images and the corresponding mean opinion scores (MOSs) under three display modes: 2D display, 3D-window display, and 3D-immersive display. Based on our ESIQAD, we propose a novel Mamba2-based multi-stage feature fusion model, termed ESIQAnet, which predicts the perceptual quality of egocentric spatial images under the three display modes. Specifically, we first extract features from multiple visual state space duality (VSSD) blocks, then apply cross attention to fuse binocular view information and transposed attention to further refine the features. The multi-stage features are finally concatenated and fed into a quality regression network to predict the quality score. Extensive experimental results demonstrate that ESIQAnet outperforms 22 state-of-the-art IQA models on the ESIQAD under all three display modes. The database and code are available at https://github.com/IntMeGroup/ESIQA.
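The abstract describes the ESIQAnet pipeline only at a high level. The sketch below illustrates that flow in PyTorch (per-stage feature extraction, cross attention over the two binocular views, channel-wise "transposed" attention refinement, concatenation of multi-stage features, and a quality regression head). It is a minimal sketch under stated assumptions, not the released implementation: the actual model uses Mamba2/VSSD blocks, which are replaced here by placeholder convolutional stages, and all module names, dimensions, and the head design are illustrative; see the linked repository for the authors' code.

```python
# Hedged sketch of a multi-stage binocular feature-fusion IQA model.
# NOTE: ESIQAnet uses Mamba2/VSSD blocks; here each stage is a placeholder
# conv block, and all names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class TransposedAttention(nn.Module):
    """Channel-wise ("transposed") self-attention: attends across channels
    rather than spatial positions (Restormer-style)."""
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (B, N, C) token features
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # transpose so attention is computed over the channel dimension
        q, k, v = q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2)       # back to (B, N, C)
        return self.proj(out)


class ESIQANetSketch(nn.Module):
    def __init__(self, dim: int = 64, stages: int = 3):
        super().__init__()
        # Placeholder per-stage feature extractors (stand-ins for VSSD blocks).
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3 if i == 0 else dim, dim, 3, stride=2, padding=1),
                          nn.GELU())
            for i in range(stages)
        ])
        # Cross attention to fuse left/right (binocular) view features per stage.
        self.cross_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, num_heads=4, batch_first=True) for _ in range(stages)]
        )
        self.refine = nn.ModuleList([TransposedAttention(dim) for _ in range(stages)])
        # Quality regression head over the concatenated multi-stage features.
        self.head = nn.Sequential(nn.Linear(dim * stages, 128), nn.GELU(), nn.Linear(128, 1))

    def forward(self, left, right):            # left/right: (B, 3, H, W)
        fused_stages = []
        xl, xr = left, right
        for stage, attn, refine in zip(self.stages, self.cross_attn, self.refine):
            xl, xr = stage(xl), stage(xr)
            # flatten spatial dims to token sequences: (B, H*W, C)
            tl = xl.flatten(2).transpose(1, 2)
            tr = xr.flatten(2).transpose(1, 2)
            fused, _ = attn(tl, tr, tr)        # left queries attend to the right view
            fused = refine(fused)              # channel-wise refinement
            fused_stages.append(fused.mean(dim=1))  # global average pooling
        feats = torch.cat(fused_stages, dim=-1)
        return self.head(feats).squeeze(-1)    # predicted quality score (MOS)


if __name__ == "__main__":
    model = ESIQANetSketch()
    # small inputs keep the demo's attention maps light; real images would be larger
    l, r = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
    print(model(l, r).shape)                   # torch.Size([2])
```

Which view queries which, and whether the binocular fusion is symmetric, are design choices the abstract does not specify; the sketch arbitrarily lets the left view query the right.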
Related papers
- Max360IQ: Blind Omnidirectional Image Quality Assessment with Multi-axis Attention [30.688264840230755]
We propose a novel and effective blind omnidirectional image quality assessment model with multi-axis attention (Max360IQ).
Max360IQ can proficiently measure not only the quality of uniformly distorted omnidirectional images but also the quality of non-uniformly distorted omnidirectional images.
arXiv Detail & Related papers (2025-02-26T11:01:03Z)
- ESVQA: Perceptual Quality Assessment of Egocentric Spatial Videos [71.62145804686062]
We introduce the first Egocentric Spatial Video Quality Assessment Database (ESVQAD), which comprises 600 egocentric spatial videos and their mean opinion scores (MOSs).
We propose a novel multi-dimensional binocular feature fusion model, termed ESVQAnet, which integrates binocular spatial, motion, and semantic features to predict the perceptual quality.
Experimental results demonstrate the ESVQAnet outperforms 16 state-of-the-art VQA models on the embodied perceptual quality assessment task.
arXiv Detail & Related papers (2024-12-29T10:13:30Z)
- AI-generated Image Quality Assessment in Visual Communication [72.11144790293086]
AIGI-VC is a quality assessment database for AI-generated images in visual communication.
The dataset consists of 2,500 images spanning 14 advertisement topics and 8 emotion types.
It provides coarse-grained human preference annotations and fine-grained preference descriptions, benchmarking the abilities of IQA methods in preference prediction, interpretation, and reasoning.
arXiv Detail & Related papers (2024-12-20T08:47:07Z)
- Visual Verity in AI-Generated Imagery: Computational Metrics and Human-Centric Analysis [0.0]
We introduce and validate a questionnaire called Visual Verity, which measures photorealism, image quality, and text-image alignment.
We also analyzed statistical properties, finding that camera-generated images scored lower in hue, saturation, and brightness.
Our findings highlight the need for refining computational metrics to better capture human visual perception.
arXiv Detail & Related papers (2024-08-22T23:29:07Z)
- Perceptual Depth Quality Assessment of Stereoscopic Omnidirectional Images [10.382801621282228]
We develop an objective quality assessment model named depth quality index (DQI) for efficient no-reference (NR) depth quality assessment of stereoscopic omnidirectional images.
Motivated by the perceptual characteristics of the human visual system (HVS), the proposed DQI is built upon multi-color-channel, adaptive viewport selection, and interocular discrepancy features.
arXiv Detail & Related papers (2024-08-19T16:28:05Z)
- UHD-IQA Benchmark Database: Pushing the Boundaries of Blind Photo Quality Assessment [4.563959812257119]
We introduce a novel Image Quality Assessment dataset comprising 6073 UHD-1 (4K) images, annotated at a fixed width of 3840 pixels.
Our dataset focuses on highly aesthetic photos of high technical quality, filling a gap in the literature.
The dataset is annotated with perceptual quality ratings obtained through a crowdsourcing study.
arXiv Detail & Related papers (2024-06-25T11:30:31Z)
- AIGCOIQA2024: Perceptual Quality Assessment of AI Generated Omnidirectional Images [70.42666704072964]
We establish a large-scale AI generated omnidirectional image IQA database named AIGCOIQA2024.
A subjective IQA experiment is conducted to assess human visual preferences from three perspectives.
We conduct a benchmark experiment to evaluate the performance of state-of-the-art IQA models on our database.
arXiv Detail & Related papers (2024-04-01T10:08:23Z)
- EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries [68.75400888770793]
We formalize a pipeline that better entangles 3D multiview geometry with 2D object retrieval from egocentric videos.
Specifically, our approach achieves an overall success rate of up to 87.12%, which sets a new state-of-the-art result in the VQ3D task.
arXiv Detail & Related papers (2022-12-14T01:28:12Z)
- Perceptual Quality Assessment of Omnidirectional Images [81.76416696753947]
We first establish an omnidirectional IQA (OIQA) database, which includes 16 source images and 320 distorted images degraded by 4 commonly encountered distortion types.
Then a subjective quality evaluation study is conducted on the OIQA database in the VR environment.
The original and distorted omnidirectional images, subjective quality ratings, and the head and eye movement data together constitute the OIQA database.
arXiv Detail & Related papers (2022-07-06T13:40:38Z)
- Confusing Image Quality Assessment: Towards Better Augmented Reality Experience [96.29124666702566]
We consider AR technology as the superimposition of virtual scenes and real scenes, and introduce visual confusion as its basic theory.
A ConFusing Image Quality Assessment (CFIQA) database is established, which includes 600 reference images and 300 distorted images generated by mixing reference images in pairs.
An objective metric termed CFIQA is also proposed to better evaluate the confusing image quality.
arXiv Detail & Related papers (2022-04-11T07:03:06Z)
- SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera [97.0162841635425]
We present a solution to egocentric 3D body pose estimation from monocular images captured by downward-looking fish-eye cameras installed on the rim of a head-mounted VR device.
This unusual viewpoint leads to images with unique visual appearance, with severe self-occlusions and perspective distortions.
We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions.
arXiv Detail & Related papers (2020-11-02T16:18:06Z)