Text-Enhanced Panoptic Symbol Spotting in CAD Drawings
- URL: http://arxiv.org/abs/2510.11091v1
- Date: Mon, 13 Oct 2025 07:41:15 GMT
- Title: Text-Enhanced Panoptic Symbol Spotting in CAD Drawings
- Authors: Xianlin Liu, Yan Gong, Bohao Li, Jiajing Huang, Bowen Du, Junchen Ye, Liyan Xu
- Abstract summary: Panoptic symbol spotting plays a vital role in enabling downstream applications such as CAD automation and design retrieval. Existing methods primarily focus on geometric primitives within the CAD drawings. We propose a panoptic symbol spotting framework that incorporates textual annotations.
- Score: 14.367938077469008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the widespread adoption of Computer-Aided Design (CAD) drawings in engineering, architecture, and industrial design, the ability to accurately interpret and analyze these drawings has become increasingly critical. Among various subtasks, panoptic symbol spotting plays a vital role in enabling downstream applications such as CAD automation and design retrieval. Existing methods primarily focus on geometric primitives within the CAD drawings to address this task, but they face two major problems: they usually overlook the rich textual annotations present in CAD drawings, and they lack explicit modeling of relationships among primitives, resulting in an incomplete understanding of the holistic drawings. To fill this gap, we propose a panoptic symbol spotting framework that incorporates textual annotations. The framework constructs unified representations by jointly modeling geometric and textual primitives. Then, using visual features extracted by a pretrained CNN as the initial representations, a Transformer-based backbone is employed, enhanced with a type-aware attention mechanism to explicitly model the different types of spatial dependencies between various primitives. Extensive experiments on a real-world dataset demonstrate that the proposed method outperforms existing approaches on symbol spotting tasks involving textual annotations, and exhibits superior robustness when applied to complex CAD drawings.
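The abstract's "type-aware attention" can be pictured as standard scaled dot-product attention with an additive learned bias selected by the (type, type) pair of each query/key primitive (e.g., geometric vs. textual). The sketch below is a minimal single-head illustration of that idea, not the paper's implementation; all names (`type_aware_attention`, `type_bias`) and the two-type setup are assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def type_aware_attention(x, types, w_q, w_k, w_v, type_bias):
    """Single-head attention with an additive bias per (type_i, type_j) pair.

    x:         (n, d) primitive features (e.g., initial CNN visual features)
    types:     (n,) integer type ids (e.g., 0 = geometric, 1 = textual)
    w_q/k/v:   (d, d) projection matrices
    type_bias: (T, T) learned scalar bias for each ordered type pair
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(x.shape[1])
    # Bias the attention logits by the type pair of each (query, key) primitive.
    scores = scores + type_bias[types[:, None], types[None, :]]
    return softmax(scores, axis=-1) @ v

# Toy usage: four primitives, two geometric and two textual.
rng = np.random.default_rng(0)
n, d = 4, 8
x = rng.normal(size=(n, d))
types = np.array([0, 0, 1, 1])
out = type_aware_attention(x, types,
                           rng.normal(size=(d, d)),
                           rng.normal(size=(d, d)),
                           rng.normal(size=(d, d)),
                           np.zeros((2, 2)))
```

With a zero `type_bias` this reduces to ordinary attention; a trained bias lets the model up- or down-weight, say, text-to-geometry dependencies uniformly across the drawing.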
Related papers
- Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning [56.24016465596292]
A visual metaphor constitutes a high-order form of human creativity, employing cross-domain semantic fusion to transform abstract concepts into impactful visual rhetoric. We introduce the task of Visual Metaphor Transfer (VMT), which challenges models to autonomously decouple the "creative essence" from a reference image and re-materialize that abstract logic onto a user-specified subject. Our method significantly outperforms SOTA baselines in metaphor consistency, analogy appropriateness, and visual creativity, paving the way for automated high-impact creative applications in advertising and media.
arXiv Detail & Related papers (2026-02-01T17:01:36Z) - Hierarchical Process Reward Models are Symbolic Vision Learners [56.94353087007494]
Symbolic computer vision represents diagrams through explicit logical rules and structured representations, enabling interpretable understanding in machine vision. This requires fundamentally different learning paradigms from pixel-based visual models. We propose a novel self-supervised auto-encoder that encodes diagrams into primitives and decodes them through our executable engine to reconstruct input diagrams.
arXiv Detail & Related papers (2025-12-02T18:46:40Z) - Seeing through Imagination: Learning Scene Geometry via Implicit Spatial World Modeling [68.14113731953971]
This paper introduces MILO, an Implicit spatIaL wOrld modeling paradigm that simulates human-like imagination. We show that our approach significantly enhances spatial reasoning capabilities across multiple baselines and benchmarks.
arXiv Detail & Related papers (2025-12-01T16:01:41Z) - Large Language Model Agent for Structural Drawing Generation Using ReAct Prompt Engineering and Retrieval Augmented Generation [3.326690511274941]
In civil engineering, structural drawings serve as the main communication tool between architects, engineers, and builders. Despite advances in software capabilities, the task of generating a structural drawing remains labor-intensive and time-consuming. Here we introduce a novel generative AI-based method for generating structural drawings employing a large language model (LLM) agent.
arXiv Detail & Related papers (2025-07-26T03:47:12Z) - Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings [45.116136045440584]
We study the task of panoptic symbol spotting in computer-aided design (CAD) drawings composed of vector graphical primitives. Existing methods typically rely on rasterization, graph construction, or point-based representation. We propose VecFormer, a novel method that addresses these challenges through a line-based representation of primitives.
arXiv Detail & Related papers (2025-05-29T12:33:11Z) - CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images [69.7768227804928]
CADCrafter is an image-to-parametric CAD model generation framework that trains solely on synthetic textureless CAD data. We introduce a geometry encoder to accurately capture diverse geometric features. Our approach can robustly handle real unconstrained CAD images, and even generalize to unseen general objects.
arXiv Detail & Related papers (2025-04-07T06:01:35Z) - CADSpotting: Robust Panoptic Symbol Spotting on Large-Scale CAD Drawings [56.05238657033198]
We introduce CADSpotting, an effective method for panoptic symbol spotting in large-scale architectural CAD drawings. We also propose a novel Sliding Window Aggregation (SWA) technique that combines weighted voting and Non-Maximum Suppression (NMS). Experiments on FloorPlanCAD and LS-CAD demonstrate that CADSpotting significantly outperforms existing methods.
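The "weighted voting" half of CADSpotting's Sliding Window Aggregation can be sketched as follows: a primitive visible in several overlapping windows may receive conflicting labels, and the final label is the one with the highest summed confidence. This is an illustrative reconstruction from the abstract's description, not the paper's code; the function name and data layout are assumptions.

```python
from collections import defaultdict

def aggregate_votes(window_preds):
    """Score-weighted voting over overlapping sliding-window predictions.

    window_preds: iterable of (primitive_id, label, score) triples collected
    from all windows. Returns {primitive_id: label}, picking for each
    primitive the label whose confidence scores sum highest.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for pid, label, score in window_preds:
        votes[pid][label] += score
    return {pid: max(tally, key=tally.get) for pid, tally in votes.items()}

# A primitive seen in three windows with conflicting labels:
preds = [(1, "door", 0.9), (1, "window", 0.4), (1, "window", 0.6), (2, "wall", 0.8)]
# Summed scores for primitive 1: door 0.9 vs. window 1.0 -> "window" wins.
```

In the full method this per-primitive vote would be followed by NMS over overlapping instance proposals; that step is omitted here for brevity.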
arXiv Detail & Related papers (2024-12-10T10:22:17Z) - Biphasic Face Photo-Sketch Synthesis via Semantic-Driven Generative Adversarial Network with Graph Representation Learning [40.544844623958426]
We propose a novel Semantic-Driven Generative Adversarial Network to address the above issues.
Considering that human faces have distinct spatial structures, we first inject class-wise semantic layouts into the generator.
We construct two types of representational graphs via semantic parsing maps upon input faces, dubbed the IntrA-class Semantic Graph (IASG) and the InteR-class Structure Graph (IRSG).
arXiv Detail & Related papers (2022-01-05T13:14:14Z) - Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments [81.38641691636847]
We rethink the problem of scene reconstruction from an embodied agent's perspective.
We reconstruct an interactive scene using RGB-D data stream.
This reconstructed scene replaces the object meshes in the dense panoptic map with part-based articulated CAD models.
arXiv Detail & Related papers (2021-03-30T05:56:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.