Fugu-MT Paper Translation (Abstract): Ishigaki-IDS-Bench: A Benchmark for Generating Information Delivery Specification from BIM Information Requirements

Paper summary: Ishigaki-IDS-Bench: A Benchmark for Generating Information Delivery Specification from BIM Information Requirements

arxiv url: http://arxiv.org/abs/2605.22079v1
Date: Thu, 21 May 2026 07:19:55 GMT
Status: Translation complete
Last updated in system: 2026-05-22 16:35:42.133442
Title: Ishigaki-IDS-Bench: A Benchmark for Generating Information Delivery Specification from BIM Information Requirements
Title (reference translation): Ishigaki IDS-Bench: A benchmark for generating Information Delivery Specifications from BIM information requirements
Authors: Ryo Kanazawa, Koyo Hidaka, Teppei Miyamoto, Takayuki Kato, Tomoki Ando, Chenguang Wang, Dayuan Jiang, Naofumi Fujita, Shuhei Saitoh, Atomu Kondo, Koki Arakawa, Daiho Nishioka
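Abstract summary: This paper presents Ishigaki-IDS-Bench, a benchmark for evaluating the ability to generate Information Delivery Specification (IDS) XML. The benchmark contains 166 examples authored and verified by BIM/IDS experts. Its evaluation combines IDSAuditTool-based Processability, Structure, and Content audits with content-agreement evaluation against gold IDS files.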
Reference score (independently computed attention measure): 1.6608087520579546
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are widely used to generate structured outputs such as JSON, SQL, and code, yet public resources remain limited for evaluating generation that must simultaneously satisfy industry-standard XML and domain vocabulary constraints. This paper presents Ishigaki-IDS-Bench, a benchmark for evaluating the ability to generate Information Delivery Specification (IDS) XML from Building Information Modeling (BIM) information requirements. The benchmark contains 166 BIM/IDS expert-authored and verified examples created by expanding 83 practical scenarios into Japanese and English, corresponding gold IDS files, and metadata for input format, language, turn setting, IFC version, and construction domain. Its evaluation combines IDSAuditTool-based Processability, Structure, and Content audits with content-agreement evaluation against gold IDS files. In zero-shot evaluation over 10 LLMs, the best model reaches 65.6% macro F1 for content agreement, while only 27.7% of outputs pass the Content audit. These results show that current LLMs can express part of the information requirements as IDS, but still struggle to stably generate XML that satisfies the IDS standard and IFC vocabulary constraints. Ishigaki-IDS-Bench supports comparative evaluation, failure analysis, and the development of constrained structured generation methods that conform to domain standards. We release the evaluation scripts and benchmark data under the CC BY 4.0 license on GitHub and Hugging Face.
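Abstract (reference translation): Large language models (LLMs) are widely used to generate structured outputs such as JSON, SQL, and code, but public resources are limited for evaluating generation that must satisfy industry-standard XML and domain vocabulary constraints at the same time. This paper proposes Ishigaki-IDS-Bench, a benchmark for evaluating the ability to generate Information Delivery Specification (IDS) XML from Building Information Modeling (BIM) information requirements. The benchmark contains 166 examples authored and verified by BIM/IDS experts, obtained by expanding 83 practical scenarios into Japanese and English, together with the corresponding gold IDS files and metadata for input format, language, turn setting, IFC version, and construction domain. Its evaluation combines IDSAuditTool-based Processability, Structure, and Content audits with content-agreement evaluation against the gold IDS files. In zero-shot evaluation of 10 LLMs, the best model reaches 65.6% macro F1 for content agreement, while only 27.7% of outputs pass the Content audit. These results show that current LLMs can express part of the information requirements as IDS, but still have difficulty stably generating XML that satisfies the IDS standard and IFC vocabulary constraints. Ishigaki-IDS-Bench supports comparative evaluation, failure analysis, and the development of constrained structured generation methods that conform to domain standards. The evaluation scripts and benchmark data are released under the CC BY 4.0 license on GitHub and Hugging Face.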
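The abstract reports content agreement as a macro F1 against gold IDS files but does not define how it is computed. As an illustration only, the sketch below assumes that each IDS file has already been reduced to a set of requirement items and that "macro" means averaging one F1 score per example. The `(facet, name)` tuples, the function names, and the sample data are hypothetical and are not taken from the paper or its evaluation scripts.

```python
from statistics import mean


def f1(gold: set, pred: set) -> float:
    """F1 between two sets of requirement items.

    Assumption: two empty sets count as perfect agreement.
    """
    if not gold and not pred:
        return 1.0
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(pairs) -> float:
    """Average the per-example F1 so that every example weighs the same."""
    return mean(f1(gold, pred) for gold, pred in pairs)


# Hypothetical (facet, name) items standing in for parsed IDS content.
examples = [
    # Generated output matches the gold file exactly.
    ({("entity", "IFCWALL"), ("property", "FireRating")},
     {("entity", "IFCWALL"), ("property", "FireRating")}),
    # Generated output misses one required property.
    ({("entity", "IFCDOOR"), ("property", "IsExternal")},
     {("entity", "IFCDOOR")}),
]

score = macro_f1(examples)
print(f"macro F1 = {score:.3f}")
```

A score of this kind only measures overlap with the gold content. It says nothing about whether the generated XML is valid IDS, which is why the benchmark reports the audit pass rates separately.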
