GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design
- URL: http://arxiv.org/abs/2409.17045v1
- Date: Wed, 25 Sep 2024 15:57:59 GMT
- Title: GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design
- Authors: Phillip Mueller, Sebastian Mueller, Lars Mikelsons
- Abstract summary: GeoBiked is curated to contain 4,355 bicycle images, annotated with structural and technical features.
We propose methods to automate data labeling by utilizing large-scale foundation models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We provide a dataset for enabling Deep Generative Models (DGMs) in engineering design and propose methods to automate data labeling by utilizing large-scale foundation models. GeoBiked is curated to contain 4,355 bicycle images, annotated with structural and technical features, and is used to investigate two automated labeling techniques: the utilization of consolidated latent features (Hyperfeatures) from image-generation models to detect geometric correspondences (e.g. the position of the wheel center) in structural images, and the generation of diverse text descriptions for structural images. GPT-4o, a vision-language model (VLM), is instructed to analyze images and produce diverse descriptions aligned with the system prompt. By representing technical images as Diffusion-Hyperfeatures, geometric correspondences can be drawn between them. The detection accuracy of geometric points in unseen samples is improved by presenting multiple annotated source images. GPT-4o has sufficient capabilities to generate accurate descriptions of technical images. Grounding the generation only on images leads to diverse descriptions but causes hallucinations, while grounding it on categorical labels restricts the diversity. Using both as input balances creativity and accuracy. Successfully using Hyperfeatures for geometric correspondence suggests that this approach can be used for general point-detection and annotation tasks in technical images. Labeling such images with text descriptions using VLMs is possible, but depends on the model's detection capabilities, careful prompt engineering, and the selection of input information. Applying foundation models in engineering design is largely unexplored. We aim to bridge this gap with a dataset for exploring training, finetuning, and conditioning of DGMs in this field, and by suggesting approaches to bootstrap foundation models to process technical images.
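As a rough illustration of the first labeling technique, the sketch below transfers an annotated keypoint (e.g. the wheel center) from one or more annotated source images to an unseen target image by matching feature vectors between precomputed hyperfeature maps. The cosine-similarity nearest-neighbor rule, the array shapes, and the function name are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of geometric-correspondence transfer with consolidated
# diffusion features ("Hyperfeatures"). Feature maps are assumed to be
# precomputed per image; all names and shapes here are hypothetical.

import numpy as np

def predict_point(source_feats, source_points, target_feat):
    """Transfer an annotated point (e.g. a wheel center) to a target image.

    source_feats : list of (H, W, C) hyperfeature maps of annotated sources
    source_points: list of (row, col) annotations, one per source image
    target_feat  : (H, W, C) hyperfeature map of the unseen target image
    Returns the (row, col) in the target with the highest average cosine
    similarity to the source feature vectors at the annotated location.
    """
    H, W, C = target_feat.shape
    target = target_feat.reshape(-1, C)
    target = target / (np.linalg.norm(target, axis=1, keepdims=True) + 1e-8)

    sim = np.zeros(H * W)
    for feat, (r, c) in zip(source_feats, source_points):
        query = feat[r, c]
        query = query / (np.linalg.norm(query) + 1e-8)
        sim += target @ query          # cosine similarity per target pixel
    sim /= len(source_feats)           # average over all annotated sources

    best = int(np.argmax(sim))
    return divmod(best, W)             # (row, col) of the predicted point

# Toy usage with random feature maps, just to show the call pattern.
rng = np.random.default_rng(0)
src = [rng.standard_normal((32, 32, 64)) for _ in range(3)]
pts = [(10, 12), (11, 12), (10, 13)]
tgt = rng.standard_normal((32, 32, 64))
print(predict_point(src, pts, tgt))
```

Averaging similarities over several annotated source images in this sketch mirrors the abstract's observation that presenting multiple annotated sources improves detection accuracy on unseen samples.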
Related papers
- GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z)
- GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation [15.931398242118073]
GPT-4 and GPT-4V are used to generate basic geometry problems with aligned text and images.
We have produced a dataset of 4.9K geometry problems and combined it with 19K open-source data to form our GeoGPT4V dataset.
Results demonstrate that the GeoGPT4V dataset significantly improves the geometry performance of various models on the MathVista and MathVision benchmarks.
arXiv Detail & Related papers (2024-06-17T13:04:27Z)
- GeoDecoder: Empowering Multimodal Map Understanding [3.164495478670176]
GeoDecoder is a dedicated multimodal model designed for processing geospatial information in maps.
Built on the BeitGPT architecture, GeoDecoder incorporates specialized expert modules for image and text processing.
arXiv Detail & Related papers (2024-01-26T02:39:40Z)
- GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation [91.01581867841894]
We propose the GeoDiffusion, a simple framework that can flexibly translate various geometric conditions into text prompts.
Our GeoDiffusion is able to encode not only the bounding boxes but also extra geometric conditions such as camera views in self-driving scenes.
arXiv Detail & Related papers (2023-06-07T17:17:58Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts model performance, with 10-34% relative improvement across various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- Geo-SIC: Learning Deformable Geometric Shapes in Deep Image Classifiers [8.781861951759948]
This paper presents Geo-SIC, the first deep learning model to learn deformable shapes in a deformation space for an improved performance of image classification.
We introduce a newly designed framework that simultaneously derives features from both image and latent shape spaces with large intra-class variations.
We develop a boosted classification network, equipped with an unsupervised learning of geometric shape representations.
arXiv Detail & Related papers (2022-10-25T01:55:17Z)
- Conditional Generation of Synthetic Geospatial Images from Pixel-level and Feature-level Inputs [0.0]
We present a conditional generative model, called VAE-Info-cGAN, for synthesizing semantically rich images simultaneously conditioned on a pixel-level condition (PLC) and a feature-level condition (FLC).
The proposed model can accurately generate various forms of macroscopic aggregates across different geographic locations while conditioned only on an atemporal representation of the road network.
arXiv Detail & Related papers (2021-09-11T06:58:19Z)
- VAE-Info-cGAN: Generating Synthetic Images by Combining Pixel-level and Feature-level Geospatial Conditional Inputs [0.0]
We present a conditional generative model for synthesizing semantically rich images simultaneously conditioned on a pixel-level condition (PLC) and a feature-level condition (FLC).
Experiments on a GPS dataset show that the proposed model can accurately generate various forms of macroscopic aggregates across different geographic locations.
arXiv Detail & Related papers (2020-12-08T03:46:19Z)
- Graph Signal Processing for Geometric Data and Beyond: Theory and Applications [55.81966207837108]
Graph Signal Processing (GSP) enables processing signals that reside on irregular domains.
GSP methodologies for geometric data are presented in a unified manner by bridging the connections between geometric data and graphs.
Recently developed Graph Neural Networks (GNNs) are also discussed, and the operation of these networks is interpreted from the perspective of GSP.
arXiv Detail & Related papers (2020-08-05T03:20:16Z)
- Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)