Related papers: Visual Diffusion Models are Geometric Solvers

URL: http://arxiv.org/abs/2510.21697v1
Date: Fri, 24 Oct 2025 17:57:31 GMT
Title: Visual Diffusion Models are Geometric Solvers
Authors: Nir Goren, Shai Yehezkel, Omer Dahary, Andrey Voynov, Or Patashnik, Daniel Cohen-Or
Abstract summary: We show that visual diffusion models can serve as effective geometric solvers by working in pixel space. We first demonstrate this on the Inscribed Square Problem, a long-standing problem in geometry. We extend the approach to two other well-known hard geometric problems: the Steiner Tree Problem and the Simple Polygon Problem.
Score: 54.31602846693932
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In this paper we show that visual diffusion models can serve as effective geometric solvers: they can directly reason about geometric problems by working in pixel space. We first demonstrate this on the Inscribed Square Problem, a long-standing problem in geometry that asks whether every Jordan curve contains four points forming a square. We then extend the approach to two other well-known hard geometric problems: the Steiner Tree Problem and the Simple Polygon Problem. Our method treats each problem instance as an image and trains a standard visual diffusion model that transforms Gaussian noise into an image representing a valid approximate solution that closely matches the exact one. The model learns to transform noisy geometric structures into correct configurations, effectively recasting geometric reasoning as image generation. Unlike prior work that necessitates specialized architectures and domain-specific adaptations when applying diffusion to parametric geometric representations, we employ a standard visual diffusion model that operates on the visual representation of the problem. This simplicity highlights a surprising bridge between generative modeling and geometric problem solving. Beyond the specific problems studied here, our results point toward a broader paradigm: operating in image space provides a general and practical framework for approximating notoriously hard problems, and opens the door to tackling a far wider class of challenging geometric tasks.

Related papers

GeoFM: Enhancing Geometric Reasoning of MLLMs via Synthetic Data Generation through Formal Language [11.134307550723037]
Multi-modal Large Language Models (MLLMs) have gained significant attention in both academia and industry. These models face challenges in mathematical geometric reasoning due to the scarcity of high-quality geometric data. We propose GeoFM, a novel method for synthesizing geometric data.
arXiv Detail & Related papers (2025-10-31T12:56:32Z)
GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions [9.55713776359176]
We propose GeoUni, the first unified geometry expert model capable of generating problem solutions and diagrams within a single framework. With only 1.5B parameters, GeoUni achieves performance comparable to larger models such as DeepSeek-R1 with 671B parameters in geometric reasoning tasks. GeoUni also excels in generating precise geometric diagrams, surpassing both text-to-image models and unified models, including the GPT-4o image generation.
arXiv Detail & Related papers (2025-04-14T11:56:55Z)
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training [45.42400674977197]
GeoX is a multi-modal large model focusing on geometric understanding and reasoning tasks. We introduce unimodal pre-training to develop a diagram encoder and symbol decoder, enhancing the understanding of geometric images and corpora. We propose a Generator-And-Sampler Transformer (GS-Former) to generate discriminative queries and eliminate uninformative representations from unevenly distributed geometric signals.
arXiv Detail & Related papers (2024-12-16T15:20:03Z)
Geo-LLaVA: A Large Multi-Modal Model for Solving Geometry Math Problems with Meta In-Context Learning [4.4615747404424395]
Geometry mathematics problems pose significant challenges for large language models (LLMs). We collect a geometry question-answer dataset by sourcing geometric data from Chinese high school education websites, referred to as GeoMath. We propose a Large Multi-modal Model (LMM) framework named Geo-LLaVA, which incorporates retrieval augmentation with supervised fine-tuning (SFT) in the training stage, called meta-training, and employs in-context learning (ICL) during inference to improve performance.
arXiv Detail & Related papers (2024-12-12T07:34:09Z)
Disentangled Representation Learning with the Gromov-Monge Gap [65.73194652234848]
Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. We introduce a novel approach to disentangled representation learning based on quadratic optimal transport. We demonstrate the effectiveness of our approach for quantifying disentanglement across four standard benchmarks.
arXiv Detail & Related papers (2024-07-10T16:51:32Z)
Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images [56.86175251327466]
We introduce a novel approach to learn geometries such as depth and surface normal from images while incorporating geometric context. Our approach extracts geometric context that encodes the geometric variations present in the input image and correlates depth estimation with geometric constraints. Our method unifies depth and surface normal estimations within a cohesive framework, which enables the generation of high-quality 3D geometry from images.
arXiv Detail & Related papers (2024-02-08T17:57:59Z)
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model [121.07873620883322]
Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities. G-LLaVA demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4-V on the MathVista benchmark with only 7B parameters.
arXiv Detail & Related papers (2023-12-18T17:36:20Z)
Geometry of Score Based Generative Models [2.4078030278859113]
We look at Score-based generative models (also called diffusion generative models) from a geometric perspective. We prove that both the forward and backward process of adding noise and generating from noise are Wasserstein gradient flow in the space of probability measures.
arXiv Detail & Related papers (2023-02-09T02:39:11Z)
UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression [127.68780714438103]
Two main geometry problems, calculation and proving, are usually treated as two specific tasks. We construct a large-scale Unified Geometry problem benchmark, UniGeo, which contains 4,998 calculation problems and 9,543 proving problems. We also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously.
arXiv Detail & Related papers (2022-12-06T04:37:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.