Related papers: Domain-Specific Fine-Tuning and Prompt-Based Learning: A Comparative Study for developing Natural Language-Based BIM Information Retrieval Systems

URL: http://arxiv.org/abs/2508.05676v1
Date: Tue, 05 Aug 2025 08:51:51 GMT
Title: Domain-Specific Fine-Tuning and Prompt-Based Learning: A Comparative Study for developing Natural Language-Based BIM Information Retrieval Systems
Authors: Han Gao, Timo Hartmann, Botao Zhong, Kai Lia, Hanbin Luo
Abstract summary: Natural Language Interface (NLI) systems are increasingly explored as user-friendly tools for information retrieval in Building Information Modeling environments. Despite their potential, accurately extracting BIM-related data through natural language queries remains a persistent challenge. This study presents a comparative analysis of two prominent approaches for developing NLI-based BIM information retrieval systems.
Score: 2.686558478755501
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Building Information Modeling (BIM) is essential for managing building data across the entire lifecycle, supporting tasks from design to maintenance. Natural Language Interface (NLI) systems are increasingly explored as user-friendly tools for information retrieval in Building Information Modeling (BIM) environments. Despite their potential, accurately extracting BIM-related data through natural language queries remains a persistent challenge due to the complexity of user queries and the specificity of domain knowledge. This study presents a comparative analysis of two prominent approaches for developing NLI-based BIM information retrieval systems: domain-specific fine-tuning and prompt-based learning using large language models (LLMs). A two-stage framework consisting of intent recognition and table-based question answering is implemented to evaluate the effectiveness of both approaches. To support this evaluation, a BIM-specific dataset of 1,740 annotated queries of varying types across 69 models is constructed. Experimental results show that domain-specific fine-tuning delivers superior performance in intent recognition tasks, while prompt-based learning, particularly with GPT-4o, shows strength in table-based question answering. Based on these findings, this study identifies a hybrid configuration that combines fine-tuning for intent recognition with prompt-based learning for question answering, achieving more balanced and robust performance across tasks. This integrated approach is further tested through case studies involving BIM models of varying complexity. This study provides a systematic analysis of the strengths and limitations of each approach and discusses the applicability of the NLI to real-world BIM scenarios. The findings offer insights for researchers and practitioners in designing intelligent, language-driven BIM systems.
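
Illustrative sketch (not from the paper): the two-stage flow named in the abstract, intent recognition followed by table-based question answering, might be wired together roughly as below. Everything here is an assumption made for illustration: the intent labels, the keyword rules standing in for a fine-tuned classifier, the row filter standing in for an LLM prompt, and the toy element table. None of it reflects the authors' actual models, labels, or data.

```python
# Hypothetical intent labels and cue phrases; the paper's label set is not given here.
INTENT_KEYWORDS = {
    "count": ("how many", "number of"),
    "lookup": ("what is", "which"),
}


def recognize_intent(query: str) -> str:
    """Stage 1 stand-in: keyword rules in place of a fine-tuned intent classifier."""
    q = query.lower()
    for intent, cues in INTENT_KEYWORDS.items():
        if any(cue in q for cue in cues):
            return intent
    return "unknown"


def answer_from_table(intent: str, query: str, table: list[dict]) -> str:
    """Stage 2 stand-in: plain row filtering in place of prompt-based table QA."""
    q = query.lower()
    # Keep rows whose category word appears in the query text.
    rows = [row for row in table if row["category"].lower() in q]
    if intent == "count":
        return str(len(rows))
    if intent == "lookup":
        return ", ".join(row["name"] for row in rows)
    return "unsupported query"


def retrieve(query: str, table: list[dict]) -> str:
    """Run both stages: classify the query, then answer it against the table."""
    return answer_from_table(recognize_intent(query), query, table)


# Toy table of BIM elements (invented rows, for illustration only).
TABLE = [
    {"name": "Door-01", "category": "door", "level": "L1"},
    {"name": "Door-02", "category": "door", "level": "L2"},
    {"name": "Wall-01", "category": "wall", "level": "L1"},
]

if __name__ == "__main__":
    print(retrieve("How many door elements are in the model?", TABLE))
    print(retrieve("Which wall elements exist?", TABLE))
```

In the hybrid configuration the abstract describes, the first function would be replaced by a fine-tuned model and the second by a prompted LLM; the sketch only shows how the two stages hand off to each other.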

Related papers

A Framework for Generating Artificial Datasets to Validate Absolute and Relative Position Concepts [2.0391237204597368]
The framework focuses on fundamental concepts such as object recognition, absolute and relative positions, and attribute identification. The proposed framework offers a valuable instrument for generating diverse and comprehensive datasets.
arXiv Detail & Related papers (2025-09-17T18:37:24Z)
Advancing AI Research Assistants with Expert-Involved Learning [84.30323604785646]
Large language models (LLMs) and large multimodal models (LMMs) promise to accelerate biomedical discovery, yet their reliability remains unclear. We introduce ARIEL (AI Research Assistant for Expert-in-the-Loop Learning), an open-source evaluation and optimization framework. We find that state-of-the-art models generate fluent but incomplete summaries, whereas LMMs struggle with detailed visual reasoning.
arXiv Detail & Related papers (2025-05-03T14:21:48Z)
Evaluating Multi-Hop Reasoning in Large Language Models: A Chemistry-Centric Case Study [0.9424565541639368]
We introduce a new benchmark consisting of a curated dataset and a defined evaluation process to assess the compositional reasoning capabilities of large language models within the chemistry domain. Our approach integrates OpenAI reasoning models with named entity recognition (NER) systems to extract chemical entities from recent literature, which are then augmented with external knowledge bases to form a knowledge graph. Our experiments reveal that even state-of-the-art models face significant challenges in multi-hop compositional reasoning.
arXiv Detail & Related papers (2025-04-23T04:36:19Z)
LLM-based Bi-level Multi-interest Learning Framework for Sequential Recommendation [54.396000434574454]
We propose a novel multi-interest SR framework combining implicit behavioral and explicit semantic perspectives. It includes two modules: the Implicit Behavioral Interest Module and the Explicit Semantic Interest Module. Experiments on four real-world datasets validate the framework's effectiveness and practicality.
arXiv Detail & Related papers (2024-11-14T13:00:23Z)
Aggregated Knowledge Model: Enhancing Domain-Specific QA with Fine-Tuned and Retrieval-Augmented Generation Models [0.0]
This paper introduces a novel approach to enhancing closed-domain Question Answering (QA) systems. It focuses on the specific needs of the Lawrence Berkeley National Laboratory (LBL) Science Information Technology (ScienceIT) domain.
arXiv Detail & Related papers (2024-10-24T00:49:46Z)
BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data [61.936320820180875]
Large language models (LLMs) have become increasingly pivotal across various domains. BabelBench is an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution. Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement.
arXiv Detail & Related papers (2024-10-01T15:11:24Z)
NeedleBench: Evaluating LLM Retrieval and Reasoning Across Varying Information Densities [51.07379913779232]
NeedleBench is a framework for assessing retrieval and reasoning performance in long-context tasks. It embeds key data points at varying depths to rigorously test model capabilities. Our experiments reveal that reasoning models like Deep-R1 and OpenAI's o3 struggle with continuous retrieval and reasoning in information-dense scenarios.
arXiv Detail & Related papers (2024-07-16T17:59:06Z)
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [93.96463520716759]
We develop STaRK, a large-scale semi-structured retrieval benchmark on Textual and Relational Knowledge Bases. Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine. We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
arXiv Detail & Related papers (2024-04-19T22:54:54Z)
On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering [25.57202500348071]
This study introduces a new long-form database question answering dataset designed to evaluate how Large Language Models interact with a database. The task requires LLMs to strategically generate multiple SQL queries to retrieve sufficient data from a database, to reason with the acquired context, and to synthesize them into a comprehensive analytical narrative. We propose and evaluate two interaction strategies, and provide a fine-grained analysis of the individual stages within the interaction.
arXiv Detail & Related papers (2023-11-16T09:55:07Z)
DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
Dual Semantic Knowledge Composed Multimodal Dialog Systems [114.52730430047589]
We propose a novel multimodal task-oriented dialog system named MDS-S2. It acquires context-related attribute and relation knowledge from the knowledge base. We also devise a set of latent query variables to distill the semantic information from the composed response representation.
arXiv Detail & Related papers (2023-05-17T06:33:26Z)
An ontology-aided, natural language-based approach for multi-constraint BIM model querying [0.0]
This paper presents a novel ontology-aided semantic parser to automatically map natural language queries (NLQs) that contain different constraints into computer-readable code for querying complex BIM models. A case study on the design checking of a real-world residential building demonstrates the practical value of the proposed approach in the construction industry.
arXiv Detail & Related papers (2023-03-27T11:35:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.