Fugu-MT Paper Translation (Abstract): SldprtNet: A Large-Scale Multimodal Dataset for CAD Generation in Language-Driven 3D Design

Paper overview: SldprtNet: A Large-Scale Multimodal Dataset for CAD Generation in Language-Driven 3D Design

arxiv url: http://arxiv.org/abs/2603.13098v1
Date: Fri, 13 Mar 2026 15:47:08 GMT
Status: Translation complete
Last updated in system: 2026-03-16 17:38:12.167867
Title: SldprtNet: A Large-Scale Multimodal Dataset for CAD Generation in Language-Driven 3D Design
Authors: Ruogu Li, Sikai Li, Yao Mu, Mingyu Ding
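Abstract summary: We introduce SldprtNet, a large-scale dataset comprising over 242,000 industrial parts. The dataset provides 3D models in both .step and .sldprt formats to support diverse training and testing. It features carefully selected real-world industrial parts and supporting tools for scalable dataset expansion.
Reference score (independently computed attention score): 26.634272863620975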
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We introduce SldprtNet, a large-scale dataset comprising over 242,000 industrial parts, designed for semantic-driven CAD modeling, geometric deep learning, and the training and fine-tuning of multimodal models for 3D design. The dataset provides 3D models in both .step and .sldprt formats to support diverse training and testing. To enable parametric modeling and facilitate dataset scalability, we developed supporting tools, an encoder and a decoder, which support 13 types of CAD commands and enable lossless transformation between 3D models and a structured text representation. Additionally, each sample is paired with a composite image created by merging seven rendered views from different viewpoints of the 3D model, effectively reducing input token length and accelerating inference. By combining this image with the parameterized text output from the encoder, we employ the lightweight multimodal language model Qwen2.5-VL-7B to generate a natural language description of each part's appearance and functionality. To ensure accuracy, we manually verified and aligned the generated descriptions, rendered images, and 3D models. These descriptions, along with the parameterized modeling scripts, rendered images, and 3D model files, are fully aligned to construct SldprtNet. To assess its effectiveness, we fine-tuned baseline models on a subset of the dataset, comparing image-plus-text inputs with text-only inputs. Results confirm the necessity and value of multimodal datasets for CAD generation. SldprtNet features carefully selected real-world industrial parts, supporting tools for scalable dataset expansion, diverse modalities, and ensured diversity in model complexity and geometric features, making it a comprehensive multimodal dataset built for semantic-driven CAD modeling and cross-modal learning.
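The abstract describes an encoder and a decoder that convert losslessly between 3D models and a structured text representation. The sketch below illustrates only the round-trip property on a toy command sequence. The `Command` record, the command names "Sketch" and "Extrude", their parameters, and the one-JSON-object-per-line text format are all assumptions made for illustration; the abstract does not list the 13 supported command types or specify the actual text format.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical command record. The paper's real command set and
# parameter schema are not given in the abstract.
@dataclass
class Command:
    name: str     # illustrative names only, e.g. "Sketch" or "Extrude"
    params: dict  # parameter name -> numeric or string value


def encode(commands):
    """Serialize a command sequence as one JSON object per line."""
    return "\n".join(json.dumps(asdict(c), sort_keys=True) for c in commands)


def decode(text):
    """Parse text produced by encode() back into Command objects."""
    return [Command(**json.loads(line)) for line in text.splitlines() if line]


# Toy part: a circle sketched on a plane, then extruded (assumed example).
part = [
    Command("Sketch", {"plane": "Front", "shape": "circle", "radius_mm": 5.0}),
    Command("Extrude", {"depth_mm": 12.5}),
]

text = encode(part)
# Round trip: decoding the encoded text recovers every command and parameter.
assert decode(text) == part
```

A line-oriented format like this keeps each command independently parseable, which is one plausible way to pair a modeling script with a rendered image and a description; the dataset's own representation may differ.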
