Leveraging Vision-Language Models for Manufacturing Feature Recognition in CAD Designs
- URL: http://arxiv.org/abs/2411.02810v1
- Date: Tue, 05 Nov 2024 04:57:55 GMT
- Title: Leveraging Vision-Language Models for Manufacturing Feature Recognition in CAD Designs
- Authors: Muhammad Tayyab Khan, Lequn Chen, Ye Han Ng, Wenhe Feng, Nicholas Yew Jin Tan, Seung Ki Moon
- Abstract summary: This study investigates vision-language models (VLMs) for automating the recognition of a wide range of manufacturing features in CAD designs.
Prompt engineering techniques, such as multi-view query images, few-shot learning, sequential reasoning, and chain-of-thought, are applied to enable recognition.
- Score: 0.0
- Abstract: Automatic feature recognition (AFR) is essential for transforming design knowledge into actionable manufacturing information. Traditional AFR methods, which rely on predefined geometric rules and large datasets, are often time-consuming and lack generalizability across various manufacturing features. To address these challenges, this study investigates vision-language models (VLMs) for automating the recognition of a wide range of manufacturing features in CAD designs without the need for extensive training datasets or predefined rules. Instead, prompt engineering techniques, such as multi-view query images, few-shot learning, sequential reasoning, and chain-of-thought, are applied to enable recognition. The approach is evaluated on a newly developed CAD dataset containing designs of varying complexity relevant to machining, additive manufacturing, sheet metal forming, molding, and casting. Five VLMs, including three closed-source models (GPT-4o, Claude-3.5-Sonnet, and Claude-3.0-Opus) and two open-source models (LLava and MiniCPM), are evaluated on this dataset with ground truth features labelled by experts. Key metrics include feature quantity accuracy, feature name matching accuracy, hallucination rate, and mean absolute error (MAE). Results show that Claude-3.5-Sonnet achieves the highest feature quantity accuracy (74%) and name-matching accuracy (75%) with the lowest MAE (3.2), while GPT-4o records the lowest hallucination rate (8%). In contrast, open-source models have higher hallucination rates (>30%) and lower accuracies (<40%). This study demonstrates the potential of VLMs to automate feature recognition in CAD designs within diverse manufacturing scenarios.
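The abstract does not include the authors' prompts or code, so the snippet below is only a minimal sketch of how multi-view query images, a few-shot example, and a chain-of-thought instruction could be combined into a single VLM request. The model name, prompt wording, and helper names (encode_image, recognize_features) are illustrative assumptions based on the OpenAI chat-completions API, not the paper's implementation.

```python
# Hedged sketch: multi-view + few-shot + chain-of-thought prompting of a VLM for
# manufacturing-feature recognition. Not the authors' code; the prompt text, helper
# names, and use of the OpenAI chat-completions API are assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def encode_image(path: str) -> str:
    """Read one rendered CAD view and return it base64-encoded."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Few-shot: one worked example, followed by a chain-of-thought instruction.
FEW_SHOT_EXAMPLE = (
    "Example: a bracket showing 4 through holes and 1 rectangular pocket should be "
    "reported as: holes: 4, pockets: 1."
)
COT_INSTRUCTION = (
    "Reason step by step: first describe each view, then list every manufacturing "
    "feature you can identify (holes, pockets, slots, fillets, bosses, ...), and "
    "finally report each feature name with its count."
)

def recognize_features(view_paths: list[str], model: str = "gpt-4o") -> str:
    """Send several rendered views of one CAD part in a single multi-view query."""
    content = [{"type": "text", "text": FEW_SHOT_EXAMPLE + "\n\n" + COT_INSTRUCTION}]
    for path in view_paths:  # multi-view query images
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{encode_image(path)}"},
        })
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}],
    )
    return response.choices[0].message.content

# Usage: recognize_features(["part_front.png", "part_top.png", "part_iso.png"])
```

Metrics such as feature quantity accuracy, feature-count MAE, and hallucination rate would then be obtained by parsing such responses and comparing them against the expert-labelled ground truth.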
Related papers
- How Certain are Uncertainty Estimates? Three Novel Earth Observation Datasets for Benchmarking Uncertainty Quantification in Machine Learning [15.306338199978269]
Uncertainty quantification (UQ) is essential for assessing the reliability of Earth observation (EO) products.
While various UQ methods exist for machine learning models, their performance on EO datasets remains largely unevaluated.
This article introduces three benchmark datasets specifically designed for UQ in EO machine learning models.
arXiv Detail & Related papers (2024-12-09T12:50:27Z) - MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
Multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks.
Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales.
We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z) - Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark [62.58869921806019]
We propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset.
We design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6.
Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-base baseline.
arXiv Detail & Related papers (2024-11-23T08:06:06Z) - Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction [0.0]
Florence-2 is an open-source vision-language model (VLM).
It is trained on a dataset of 400 drawings with ground truth annotations provided by domain experts.
It achieves a 29.95% increase in precision, a 37.75% increase in recall, a 52.40% improvement in F1-score, and a 43.15% reduction in hallucination rate.
arXiv Detail & Related papers (2024-11-06T07:11:15Z) - Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification [0.49110747024865004]
This research evaluates four different approaches to crop classification: traditional ML with handcrafted feature extraction methods (SIFT, ORB, and color histograms); a custom-designed CNN; established DL architectures such as AlexNet; and transfer learning on five models pre-trained on ImageNet.
Xception outperformed all of them in terms of generalization, achieving 98% accuracy on the test data, with a model size of 80.03 MB and a prediction time of 0.0633 seconds.
arXiv Detail & Related papers (2024-08-22T14:20:34Z) - RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness [102.06442250444618]
We introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm.
RLAIF-V maximally explores open-source MLLMs from two perspectives, including high-quality feedback data generation.
Experiments on six benchmarks in both automatic and human evaluation show that RLAIF-V substantially enhances the trustworthiness of models.
arXiv Detail & Related papers (2024-05-27T14:37:01Z) - Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding [55.32861154245772]
Calib3D is a pioneering effort to benchmark and scrutinize the reliability of 3D scene understanding models.
We comprehensively evaluate 28 state-of-the-art models across 10 diverse 3D datasets.
We introduce DeptS, a novel depth-aware scaling approach aimed at enhancing 3D model calibration.
arXiv Detail & Related papers (2024-03-25T17:59:59Z) - Large Language Models as Automated Aligners for benchmarking
Vision-Language Models [48.4367174400306]
Vision-Language Models (VLMs) have reached a new level of sophistication, showing notable competence in executing intricate cognition and reasoning tasks.
Existing evaluation benchmarks, primarily relying on rigid, hand-crafted datasets, face significant limitations in assessing the alignment of these increasingly anthropomorphic models with human intelligence.
In this work, we address these limitations via Auto-Bench, which explores LLMs as proficient curators, measuring the alignment between VLMs and human intelligence and values through automatic data curation and assessment.
arXiv Detail & Related papers (2023-11-24T16:12:05Z) - From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability.
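Based only on the description above (not the authors' released code), a ratio-style difficulty score of this kind might be sketched as the answer loss conditioned on the instruction divided by the answer loss alone; the model choice, tokenization details, and function names below are assumptions.

```python
# Hedged sketch of an IFD-style score: ratio of the answer loss given the instruction
# to the answer loss without it. Illustrative only; not the paper's implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def answer_loss(prefix: str, answer: str) -> float:
    """Mean cross-entropy over the answer tokens, optionally conditioned on a prefix."""
    answer_ids = tok(answer, return_tensors="pt").input_ids
    if prefix:
        prefix_ids = tok(prefix, return_tensors="pt").input_ids
        input_ids = torch.cat([prefix_ids, answer_ids], dim=1)
        labels = torch.cat([torch.full_like(prefix_ids, -100), answer_ids], dim=1)
    else:
        input_ids, labels = answer_ids, answer_ids
    with torch.no_grad():
        return model(input_ids, labels=labels).loss.item()

def ifd_score(instruction: str, answer: str) -> float:
    """Higher values suggest the instruction helps the model less (a harder sample)."""
    return answer_loss(instruction, answer) / answer_loss("", answer)
```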
arXiv Detail & Related papers (2023-08-23T09:45:29Z) - Generative Multi-Stream Architecture For American Sign Language Recognition [15.717424753251674]
Training on datasets with low feature richness for complex applications limits convergence to below human performance.
We propose a generative multi-stream architecture that eliminates the need for additional hardware, with the intent of improving feature convergence without risking impracticality.
Our methods have achieved 95.62% validation accuracy with a variance of 1.42% from training, outperforming past models by 0.45% in validation accuracy and 5.53% in variance.
arXiv Detail & Related papers (2020-03-09T21:04:51Z)