Shape and Texture Recognition in Large Vision-Language Models
- URL: http://arxiv.org/abs/2503.23062v1
- Date: Sat, 29 Mar 2025 12:43:29 GMT
- Title: Shape and Texture Recognition in Large Vision-Language Models
- Authors: Sagi Eppel, Mor Bismut, Alona Faktor
- Abstract summary: This dataset is used to evaluate how effectively leading Large Vision-Language Models (LVLMs) understand shapes, textures, and materials in 2D and 3D scenes. For shape recognition, we test models' ability to match identical shapes that differ in orientation, texture, color, or environment. For texture and material recognition, we evaluate models' ability to identify identical textures and materials across different objects and environments.
- Score: 0.5266869303483376
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Shape and texture recognition is fundamental to visual perception. The ability to identify shapes regardless of orientation, texture, or context, and to recognize textures independently of their associated objects, is essential for general visual understanding of the world. We introduce the Large Shape & Textures dataset (LAS&T), a giant collection of diverse shapes and textures automatically extracted from real-world images. This dataset is used to evaluate how effectively leading Large Vision-Language Models (LVLMs) understand shapes, textures, and materials in both 2D and 3D scenes. For shape recognition, we test models' ability to match identical shapes that differ in orientation, texture, color, or environment. Our results show that LVLMs' shape identification capabilities remain significantly below human performance. Single alterations (orientation, texture) cause minor decreases in matching accuracy, while multiple changes precipitate dramatic drops. LVLMs appear to rely predominantly on high-level and semantic features and struggle with abstract shapes lacking clear class associations. For texture and material recognition, we evaluate models' ability to identify identical textures and materials across different objects and environments. Interestingly, leading LVLMs approach human-level performance in recognizing materials in 3D scenes, yet substantially underperform humans when identifying simpler 2D textures. The LAS&T dataset and benchmark, the largest and most diverse resource for shape and texture evaluation, is freely available with generation and testing scripts.
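To make the evaluation protocol concrete, here is a minimal sketch of how a LAS&T-style shape-matching trial could be scored against an LVLM. The `MatchingTrial` layout, the prompt wording, and the `query_lvlm` helper are hypothetical placeholders standing in for the released testing scripts, which are not detailed in the abstract.

```python
# Minimal sketch of a LAS&T-style shape-matching evaluation. The trial layout
# and the `query_lvlm` helper are hypothetical placeholders, not the released
# testing scripts.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MatchingTrial:
    reference: str          # path to the reference shape image
    candidates: List[str]   # candidate images; exactly one shows the same shape
    answer_index: int       # index of the matching candidate

PROMPT = ("Which candidate image shows the exact same shape as the reference, "
          "ignoring orientation, texture, color, and background? "
          "Answer with the candidate number.")

def shape_matching_accuracy(trials: List[MatchingTrial],
                            query_lvlm: Callable[[str, List[str], str], int]) -> float:
    """Ask the model to pick the matching candidate for each trial and report accuracy."""
    correct = sum(int(query_lvlm(t.reference, t.candidates, PROMPT) == t.answer_index)
                  for t in trials)
    return correct / len(trials)
```

The same loop applies to the texture and material tracks by swapping the prompt and the image sets; accuracy against the known answer index is the reported metric.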
Related papers
- Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures [87.80984588545589]
Real-time free-view human rendering from sparse-view RGB inputs is a challenging task due to sensor scarcity and the tight time budget. Recent methods leverage 2D CNNs operating in texture space to learn rendering primitives. We present Double Unprojected Textures, which at its core disentangles coarse geometric deformation estimation from appearance synthesis.
arXiv Detail & Related papers (2024-12-17T18:57:38Z)
- Do large language vision models understand 3D shapes? [0.6993026261767287]
Large vision-language models (LVLMs) are the leading AI approach for achieving a general visual understanding of the world. This work tests whether LVLMs truly understand 3D shapes by evaluating the models' ability to identify and match objects with the exact same 3D shape.
arXiv Detail & Related papers (2024-12-14T17:35:27Z)
- Textured Mesh Saliency: Bridging Geometry and Texture for Human Perception in 3D Graphics [50.23625950905638]
We present a new dataset for textured mesh saliency, created through an innovative eye-tracking experiment in a six degrees of freedom (6-DOF) VR environment.
Our proposed model predicts saliency maps for textured mesh surfaces by treating each triangular face as an individual unit and assigning a saliency density value to reflect the importance of each local surface region.
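As an illustration of the per-face saliency representation described above, the sketch below turns eye-tracking fixation counts into a saliency density over triangular faces by normalizing for triangle area; the aggregation and normalization choices are assumptions, not the paper's exact definition.

```python
import numpy as np

def face_saliency_density(vertices: np.ndarray,          # (V, 3) vertex positions
                          faces: np.ndarray,              # (F, 3) vertex indices per triangle
                          fixations_per_face: np.ndarray  # (F,) eye-tracking fixation counts
                          ) -> np.ndarray:
    """Per-face saliency density: fixation counts normalized by triangle area,
    rescaled so the densities sum to 1 over the whole mesh."""
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    areas = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1)
    density = fixations_per_face / np.maximum(areas, 1e-12)
    return density / density.sum()
```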
arXiv Detail & Related papers (2024-12-11T08:27:33Z)
- TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion [64.49276500129092]
TextureDreamer is an image-guided texture synthesis method.
It can transfer relightable textures from a small number of input images to target 3D shapes across arbitrary categories.
arXiv Detail & Related papers (2024-01-17T18:55:49Z)
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives [70.32817882783608]
We present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives.
Unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images.
We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points.
arXiv Detail & Related papers (2023-07-11T17:58:31Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
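The dense correspondence objective mentioned above can be written as a standard InfoNCE loss over pixel features sampled at corresponding locations in two renderings of the same shape; the following is a generic sketch of that idea, and MvDeCor's exact sampling and loss details may differ.

```python
import torch
import torch.nn.functional as F

def dense_correspondence_infonce(feat_a: torch.Tensor,   # (N, D) features at sampled pixels in view A
                                 feat_b: torch.Tensor,   # (N, D) features at corresponding pixels in view B
                                 temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE over corresponding pixels of two renderings of the same shape:
    row i of feat_a should match row i of feat_b and no other row."""
    feat_a = F.normalize(feat_a, dim=1)
    feat_b = F.normalize(feat_b, dim=1)
    logits = feat_a @ feat_b.t() / temperature            # (N, N) similarity matrix
    targets = torch.arange(feat_a.shape[0], device=feat_a.device)
    return F.cross_entropy(logits, targets)
```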
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- Texturify: Generating Textures on 3D Shape Surfaces [34.726179801982646]
We propose Texturify, which learns to generate textures directly on the surface of a given 3D shape. Our method does not require any 3D color supervision to learn to texture 3D objects.
arXiv Detail & Related papers (2022-04-05T18:00:04Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
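For intuition, the sketch below shows one way such a recurrent encoder could be structured in PyTorch, emitting per-object shape, pose, and texture latents one step at a time; the backbone, latent sizes, and simple feature readout are illustrative assumptions rather than PriSMONet's actual architecture.

```python
import torch
import torch.nn as nn

class RecurrentObjectEncoder(nn.Module):
    """Illustrative recurrent encoder: each step reads the image features and
    emits one object's shape, pose, and texture latents."""
    def __init__(self, feat_dim=256, hidden_dim=256, z_shape=64, z_pose=7, z_texture=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, feat_dim, 4, stride=4), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rnn = nn.GRUCell(feat_dim, hidden_dim)
        self.heads = nn.ModuleDict({
            "shape": nn.Linear(hidden_dim, z_shape),
            "pose": nn.Linear(hidden_dim, z_pose),
            "texture": nn.Linear(hidden_dim, z_texture),
        })

    def forward(self, image: torch.Tensor, num_objects: int):
        feats = self.backbone(image)                      # (B, feat_dim) global image features
        h = feats.new_zeros(feats.shape[0], self.rnn.hidden_size)
        objects = []
        for _ in range(num_objects):                      # one object per recurrent step
            h = self.rnn(feats, h)
            objects.append({name: head(h) for name, head in self.heads.items()})
        return objects
```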
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.