LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM
- URL: http://arxiv.org/abs/2404.18203v2
- Date: Tue, 6 Aug 2024 03:37:31 GMT
- Title: LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM
- Authors: Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Wei Sun, Chaofeng Chen, Xiongkuo Min, Xiaohong Liu, Weisi Lin, Guangtao Zhai
- Abstract summary: This study aims to investigate the feasibility of imparting Point Cloud Quality Assessment (PCQA) knowledge to large multi-modality models (LMMs).
We transform quality labels into textual descriptions during the fine-tuning phase, enabling LMMs to derive quality rating logits from 2D projections of point clouds.
Our experimental results affirm the effectiveness of our approach, showcasing a novel integration of LMMs into PCQA.
- Score: 83.98966702271576
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given LMMs' exceptional performance and robustness in low-level vision and quality assessment tasks, this study aims to investigate the feasibility of imparting PCQA knowledge to LMMs through text supervision. To achieve this, we transform quality labels into textual descriptions during the fine-tuning phase, enabling LMMs to derive quality rating logits from 2D projections of point clouds. To compensate for the loss of perception in the 3D domain, structural features are extracted as well. These quality logits and structural features are then combined and regressed into quality scores. Our experimental results affirm the effectiveness of our approach, showcasing a novel integration of LMMs into PCQA that enhances model understanding and assessment accuracy. We hope our contributions can inspire subsequent investigations into the fusion of LMMs with PCQA, fostering advancements in 3D visual quality analysis and beyond. The code is available at https://github.com/zzc-1998/LMM-PCQA.
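The abstract outlines a two-branch scoring pipeline: quality rating logits from the fine-tuned LMM on one side, structural features on the other, fused and regressed into a score. As a rough illustration, here is a minimal sketch of that scoring stage; the level names, feature dimensions, and regressor architecture below are illustrative assumptions, not the released implementation (see the GitHub link above for that).
```python
import torch
import torch.nn as nn

# Hypothetical sketch of the scoring stage described in the abstract: the
# fine-tuned LMM emits logits over text-defined quality levels for each 2D
# projection of the point cloud; these are fused with structural features
# and regressed into a quality score. All names/sizes are assumptions.

QUALITY_LEVELS = ["bad", "poor", "fair", "good", "excellent"]  # assumed 5-level scale
LEVEL_VALUES = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])         # scalar anchor per level

def logits_to_quality(level_logits: torch.Tensor) -> torch.Tensor:
    """Softmax-weighted average of level anchors, one score per projection."""
    probs = level_logits.softmax(dim=-1)   # (n_projections, 5)
    return probs @ LEVEL_VALUES            # (n_projections,)

class QualityRegressor(nn.Module):
    """Fuses per-projection LMM quality scores with structural features."""
    def __init__(self, n_projections: int = 6, struct_dim: int = 4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(n_projections + struct_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, level_logits, struct_feats):
        lmm_scores = logits_to_quality(level_logits)          # (n_projections,)
        fused = torch.cat([lmm_scores, struct_feats], dim=-1)
        return self.head(fused)                               # predicted quality score

# Example: 6 projections with 5 level logits each, plus 4 structural features.
logits = torch.randn(6, 5)
struct = torch.randn(4)
print(QualityRegressor()(logits, struct))
```
The softmax-weighted conversion from level logits to a continuous score mirrors how text-defined rating levels are commonly mapped to scalars in LMM-based quality assessment (cf. Q-Align in the related papers below).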
Related papers
- Enhancing Machine Learning Performance through Intelligent Data Quality Assessment: An Unsupervised Data-centric Framework [0.0]
Poor data quality limits the effectiveness of Machine Learning (ML).
We propose an intelligent data-centric evaluation framework that can identify high-quality data and improve the performance of an ML system.
arXiv Detail & Related papers (2025-02-18T18:01:36Z)
- MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency [63.23935582919081]
Chain-of-Thought (CoT) has significantly enhanced the reasoning capabilities of Large Language Models (LLMs).
We introduce MME-CoT, a specialized benchmark evaluating the CoT reasoning performance of LMMs.
We conduct an in-depth analysis of state-of-the-art LMMs, uncovering several key insights.
arXiv Detail & Related papers (2025-02-13T18:59:46Z)
- Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z)
- 2AFC Prompting of Large Multimodal Models for Image Quality Assessment [38.86162365208038]
Two-alternative forced choice (2AFC) prompting is widely regarded as the most reliable way of collecting human opinions of visual quality.
The global quality score of each image estimated by a particular LMM can be efficiently aggregated using maximum a posteriori estimation.
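For intuition, here is a minimal sketch of one way pairwise 2AFC outcomes can be aggregated into per-image global scores, using the classic Bradley-Terry maximum-likelihood update as a stand-in for the maximum a posteriori estimation this paper describes; the win counts are toy data.
```python
import numpy as np

# Hypothetical sketch: recover global quality scores from pairwise 2AFC
# tallies with the standard Bradley-Terry iterative MLE (a stand-in for the
# paper's MAP aggregation). The win matrix below is made up.

def bradley_terry(wins: np.ndarray, n_iter: int = 100) -> np.ndarray:
    """wins[i, j] = number of times image i was preferred over image j."""
    n = wins.shape[0]
    scores = np.ones(n)
    for _ in range(n_iter):
        total = wins.sum(axis=1)  # total wins of each image
        denom = np.array([
            sum((wins[i, j] + wins[j, i]) / (scores[i] + scores[j])
                for j in range(n) if j != i)
            for i in range(n)
        ])
        scores = total / denom
        scores /= scores.sum()  # fix the arbitrary scale
    return scores

# Toy 2AFC tallies over three images.
wins = np.array([[0, 8, 9],
                 [2, 0, 6],
                 [1, 4, 0]], float)
print(bradley_terry(wins))  # higher = better perceived quality
```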
arXiv Detail & Related papers (2024-02-02T06:05:18Z)
- Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels [95.44077384918725]
We propose to teach large multi-modality models (LMMs) with text-defined rating levels instead of scores.
The proposed Q-Align achieves state-of-the-art performance on image quality assessment (IQA), image aesthetic assessment (IAA) and video quality assessment (VQA) tasks.
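A minimal sketch of the level-to-score idea, assuming five rating-level tokens: restrict the LMM's next-token logits to those tokens and take a softmax-weighted average of scalar anchors. The token ids and anchor values below are hypothetical, not Q-Align's exact configuration.
```python
import torch

# Minimal sketch of the text-defined-levels idea: read the LMM's next-token
# logits at the answer position, keep only the entries for the rating-level
# tokens, and convert them into a continuous score. Token ids and anchors
# are illustrative assumptions.

LEVEL_TOKEN_IDS = torch.tensor([101, 202, 303, 404, 505])  # hypothetical ids for
# "bad", "poor", "fair", "good", "excellent" in the tokenizer's vocabulary
LEVEL_ANCHORS = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])

def text_levels_to_score(vocab_logits: torch.Tensor) -> torch.Tensor:
    level_logits = vocab_logits[LEVEL_TOKEN_IDS]  # restrict to level tokens
    probs = level_logits.softmax(dim=-1)          # close-set probabilities
    return probs @ LEVEL_ANCHORS                  # expected rating in [1, 5]

vocab_logits = torch.randn(32000)                 # e.g., a LLaMA-sized vocabulary
print(text_levels_to_score(vocab_logits))
```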
arXiv Detail & Related papers (2023-12-28T16:10:25Z)
- On the Robustness of Large Multimodal Models Against Image Adversarial Attacks [81.2935966933355]
We study the impact of visual adversarial attacks on Large Multimodal Models (LMMs).
We find that, in general, LMMs are not robust to visual adversarial inputs.
We propose a new approach to real-world image classification which we term query decomposition.
arXiv Detail & Related papers (2023-12-06T04:59:56Z)
- Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision [85.6008224440157]
Multi-modality Large Language Models (MLLMs) have catalyzed a shift in computer vision from specialized models to general-purpose foundation models.
We present Q-Bench, a holistic benchmark crafted to evaluate potential abilities of MLLMs on three realms: low-level visual perception, low-level visual description, and overall visual quality assessment.
arXiv Detail & Related papers (2023-09-25T14:43:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.