Appreciate the View: A Task-Aware Evaluation Framework for Novel View Synthesis
- URL: http://arxiv.org/abs/2511.12675v1
- Date: Sun, 16 Nov 2025 16:28:08 GMT
- Title: Appreciate the View: A Task-Aware Evaluation Framework for Novel View Synthesis
- Authors: Saar Stern, Ido Sobol, Or Litany
- Abstract summary: Novel View Synthesis (NVS) aims to generate realistic images of a given content from unseen viewpoints. Existing evaluation metrics struggle to assess whether a generated image is both realistic and faithful to the source view. We introduce two complementary evaluation metrics: a reference-based score, $D_{\text{PRISM}}$, and a reference-free score, $\text{MMD}_{\text{PRISM}}$.
- Score: 15.922599086027098
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The goal of Novel View Synthesis (NVS) is to generate realistic images of a given content from unseen viewpoints. But how can we trust that a generated image truly reflects the intended transformation? Evaluating its reliability remains a major challenge. While recent generative models, particularly diffusion-based approaches, have significantly improved NVS quality, existing evaluation metrics struggle to assess whether a generated image is both realistic and faithful to the source view and intended viewpoint transformation. Standard metrics, such as pixel-wise similarity and distribution-based measures, often mis-rank incorrect results as they fail to capture the nuanced relationship between the source image, viewpoint change, and generated output. We propose a task-aware evaluation framework that leverages features from a strong NVS foundation model, Zero123, combined with a lightweight tuning step to enhance discrimination. Using these features, we introduce two complementary evaluation metrics: a reference-based score, $D_{\text{PRISM}}$, and a reference-free score, $\text{MMD}_{\text{PRISM}}$. Both reliably identify incorrect generations and rank models in agreement with human preference studies, addressing a fundamental gap in NVS evaluation. Our framework provides a principled and practical approach to assessing synthesis quality, paving the way for more reliable progress in novel view synthesis. To further support this goal, we apply our reference-free metric to six NVS methods across three benchmarks: Toys4K, Google Scanned Objects (GSO), and OmniObject3D, where $\text{MMD}_{\text{PRISM}}$ produces a clear and stable ranking, with lower scores consistently indicating stronger models.
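The abstract does not spell out how $\text{MMD}_{\text{PRISM}}$ is computed, but the name suggests a Maximum Mean Discrepancy between feature distributions of generated and reference images. As a minimal sketch, assuming a standard biased MMD estimate with an RBF kernel over precomputed feature vectors (the feature extractor, kernel choice, and bandwidth are all assumptions, not details from the paper):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel values between rows of X and rows of Y.
    d2 = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    """Biased squared MMD between two sample sets (rows = feature vectors)."""
    kxx = rbf_kernel(X, X, gamma).mean()
    kyy = rbf_kernel(Y, Y, gamma).mean()
    kxy = rbf_kernel(X, Y, gamma).mean()
    return kxx + kyy - 2.0 * kxy

# Stand-in random features; in practice these would come from an NVS
# foundation model such as Zero123, as the abstract describes.
rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(200, 64))
gen_feats = rng.normal(0.5, 1.0, size=(200, 64))

print(mmd2(real_feats, gen_feats))  # lower = distributions closer
```

Consistent with the abstract's reading of the metric, identical feature sets yield a score of zero, and larger distribution shifts yield larger scores, so lower values indicate stronger models.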
Related papers
- GenArena: How Can We Achieve Human-Aligned Evaluation for Visual Generation Tasks? [29.804627410258732]
We introduce a unified evaluation framework that leverages a pairwise comparison paradigm to ensure stable and human-aligned evaluation. Our method boosts evaluation accuracy by over 20% and achieves a Spearman correlation of 0.86 with the authoritative LMArena leaderboard.
arXiv Detail & Related papers (2026-02-05T18:52:48Z) - REVEALER: Reinforcement-Guided Visual Reasoning for Element-Level Text-Image Alignment Evaluation [10.151027538362259]
REVEALER is a unified framework for element-level alignment evaluation based on reinforcement-guided visual reasoning. Our method enables Multimodal Large Language Models (MLLMs) to explicitly localize semantic elements and derive interpretable alignment judgments.
arXiv Detail & Related papers (2025-12-29T03:24:09Z) - Non-Aligned Reference Image Quality Assessment for Novel View Synthesis [8.68364429451164]
We introduce a Non-Aligned Reference IQA (NAR-IQA) framework tailored for Novel View Synthesis (NVS) images. Our model is built on a contrastive learning framework that incorporates LoRA-enhanced DINOv2 embeddings. We conduct a novel user study to gather data on human preferences when viewing non-aligned references in NVS.
arXiv Detail & Related papers (2025-11-11T12:08:12Z) - MUGSQA: Novel Multi-Uncertainty-Based Gaussian Splatting Quality Assessment Method, Dataset, and Benchmarks [50.53294970211443]
Gaussian Splatting (GS) has emerged as a promising technique for 3D object reconstruction, delivering high-quality rendering results with significantly improved reconstruction speed. Assessing the perceptual quality of 3D objects reconstructed with different GS-based methods remains an open challenge. We propose a unified multi-distance subjective quality assessment method that closely mimics human viewing behavior for objects reconstructed with GS-based methods in actual applications. We construct two benchmarks: one to evaluate the robustness of various GS-based reconstruction methods under multiple uncertainties, and the other to evaluate the performance of existing quality assessment metrics.
arXiv Detail & Related papers (2025-11-10T08:21:11Z) - OmniQuality-R: Advancing Reward Models Through All-Encompassing Quality Assessment [55.59322229889159]
We propose OmniQuality-R, a unified reward modeling framework that transforms multi-task quality reasoning into continuous and interpretable reward signals. We use a reasoning-enhanced reward modeling dataset to form a reliable chain-of-thought dataset for supervised fine-tuning. We evaluate OmniQuality-R on three key IQA tasks: aesthetic quality assessment, technical quality evaluation, and text-image alignment.
arXiv Detail & Related papers (2025-10-12T13:46:28Z) - EdiVal-Agent: An Object-Centric Framework for Automated, Fine-Grained Evaluation of Multi-Turn Editing [170.71134330650796]
EdiVal-Agent is an object-centric evaluation framework for instruction-based image editing. It is designed to assess not only standard single-turn but also multi-turn instruction-based editing with precision. We build EdiVal-Bench, a benchmark covering 9 instruction types and 13 state-of-the-art editing models spanning in-context, flow-matching, and diffusion paradigms.
arXiv Detail & Related papers (2025-09-16T17:45:39Z) - Test-Time Consistency in Vision Language Models [26.475993408532304]
Vision-Language Models (VLMs) have achieved impressive performance across a wide range of multimodal tasks. Recent benchmarks, such as MM-R3, highlight that even state-of-the-art VLMs can produce divergent predictions across semantically equivalent inputs. We propose a simple and effective test-time consistency framework that enhances semantic consistency without supervised re-training.
arXiv Detail & Related papers (2025-06-27T17:09:44Z) - Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models [0.026861992804651083]
This paper introduces a robust framework for identifying the most trustworthy SR sample from a diffusion-generated set. We propose a novel Trustworthiness Score (TWS), a hybrid metric that quantifies SR reliability based on semantic similarity. By aligning outputs with human expectations and semantic correctness, this work sets a new benchmark for trustworthiness in generative SR.
arXiv Detail & Related papers (2025-06-25T21:00:44Z) - Hierarchical Scoring with 3D Gaussian Splatting for Instance Image-Goal Navigation [27.040017548286812]
Instance Image-Goal Navigation (IIN) requires autonomous agents to identify and navigate to a target object or location depicted in a reference image captured from any viewpoint. We introduce a novel IIN framework with a hierarchical scoring paradigm that estimates optimal viewpoints for target matching.
arXiv Detail & Related papers (2025-06-09T00:58:14Z) - PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices. Our framework capitalizes on the fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z) - Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z) - Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z) - TISE: A Toolbox for Text-to-Image Synthesis Evaluation [9.092600296992925]
We conduct a study on state-of-the-art methods for single- and multi-object text-to-image synthesis.
We propose a common framework for evaluating these methods.
arXiv Detail & Related papers (2021-12-02T16:39:35Z) - RobustBench: a standardized adversarial robustness benchmark [84.50044645539305]
A key challenge in benchmarking robustness is that its evaluation is often error-prone, leading to robustness overestimation.
We evaluate adversarial robustness with AutoAttack, an ensemble of white- and black-box attacks.
We analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.
arXiv Detail & Related papers (2020-10-19T17:06:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.