Related papers: Automatic Multi-level Feature Tree Construction for Domain-Specific Reusable Artifacts Management

URL: http://arxiv.org/abs/2506.03946v2
Date: Sun, 06 Jul 2025 06:21:15 GMT
Title: Automatic Multi-level Feature Tree Construction for Domain-Specific Reusable Artifacts Management
Authors: Dongming Jin, Zhi Jin, Nianyu Li, Kai Yang, Linyu Li, Suijing Guan
Abstract summary: This paper proposes an automatic multi-level feature tree construction framework named FTBUILDER. It automatically crawls domain-specific software repositories and merges their metadata to construct a structured artifact library. It reduces the time developers spend selecting artifacts by 26% and improves the accuracy of artifact recommendations with GPT-4 by 235%.
Score: 15.822095826931942
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the rapid growth of open-source ecosystems (e.g., Linux) and domain-specific software projects (e.g., aerospace), efficient management of reusable artifacts is becoming increasingly crucial for software reuse. The multi-level feature tree enables semantic management based on functionality and supports requirements-driven artifact selection. However, constructing such a tree relies heavily on domain expertise, which makes it time-consuming and labor-intensive. To address this issue, this paper proposes an automatic multi-level feature tree construction framework named FTBUILDER, which consists of three stages. First, it automatically crawls domain-specific software repositories and merges their metadata to construct a structured artifact library. Second, it employs clustering algorithms to identify a set of artifacts with common features. Third, it constructs a prompt and uses LLMs to summarize their common features. FTBUILDER recursively applies the identification and summarization stages to construct a multi-level feature tree from the bottom up. To validate FTBUILDER, we conduct experiments from multiple aspects (e.g., tree quality and time cost) using the Linux distribution ecosystem. Specifically, we first simultaneously develop and evaluate 24 alternative solutions in FTBUILDER. We then construct a three-level feature tree using the best solution among them. Compared to the official feature tree, our tree exhibits higher quality, with a 9% improvement in the silhouette coefficient and an 11% increase in GValue. Furthermore, it reduces the time developers spend selecting artifacts by 26% and improves the accuracy of artifact recommendations with GPT-4 by 235%. FTBUILDER can be extended to other open-source software communities and domain-specific industrial enterprises.
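The bottom-up recursion described in the abstract (group artifacts that share features, summarize each group, then repeat on the summaries) can be illustrated with a minimal Python sketch. This is not FTBUILDER's implementation: the names `Node`, `cluster`, `summarize`, and `build_tree` are hypothetical, the greedy Jaccard grouping merely stands in for whichever clustering algorithm the paper selects, and the shared-word `summarize` function is a stub for the LLM summarization stage.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # feature description
    children: list = field(default_factory=list)


def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def cluster(nodes, threshold):
    """Greedy grouping by word overlap (assumed stand-in for the clustering stage)."""
    groups = []
    for node in nodes:
        for group in groups:
            # Compare against the first member of each existing group.
            if jaccard(node.label, group[0].label) >= threshold:
                group.append(node)
                break
        else:
            groups.append([node])
    return groups


def summarize(group):
    """Stub for the LLM stage: keep the words shared by every member."""
    counts = Counter(w for n in group for w in tokens(n.label))
    shared = sorted(w for w, c in counts.items() if c == len(group))
    return " ".join(shared) if shared else group[0].label


def build_tree(descriptions, threshold=0.2, max_levels=3):
    """Apply cluster + summarize repeatedly, building the tree bottom-up."""
    level = [Node(d) for d in descriptions]      # leaves: one per artifact
    for _ in range(max_levels - 1):
        groups = cluster(level, threshold)
        if len(groups) == len(level):            # nothing merged: stop early
            break
        level = [Node(summarize(g), g) for g in groups]
        if len(level) == 1:
            break
    return Node("root", level)


tree = build_tree([
    "png image viewer", "jpeg image viewer",
    "tcp network scanner", "udp network scanner",
])
for feature in tree.children:
    print(feature.label, "->", [leaf.label for leaf in feature.children])
```

In a real pipeline the leaves would come from crawled repository metadata, and `summarize` would send a prompt containing the grouped descriptions to an LLM; both are deliberately left out here.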

Related papers

ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval [64.44265315244579]
We propose a tree-based method for organizing and representing reference documents at various granular levels. Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches. Our evaluations show that ReTreever generally preserves full representation accuracy.
arXiv Detail & Related papers (2025-02-11T21:35:13Z)
LLMs for Generation of Architectural Components: An Exploratory Empirical Study in the Serverless World [0.0]
This paper studies the capability of Large Language Models to generate architectural components for Functions as a Service (FaaS). The small size of their architectural components makes this architectural style amenable to generation using current LLMs. We evaluate correctness through existing tests present in the repositories and use metrics from the Software Engineering (SE) and Natural Language Processing (NLP) domains.
arXiv Detail & Related papers (2025-02-04T18:06:04Z)
CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models [106.11371409170818]
Large language models (LLMs) can act as agents with capabilities to self-refine and improve generated code autonomously. We propose CodeTree, a framework for LLM agents to efficiently explore the search space in different stages of the code generation process. Specifically, we adopted a unified tree structure to explicitly explore different coding strategies, generate corresponding coding solutions, and subsequently refine the solutions.
arXiv Detail & Related papers (2024-11-07T00:09:54Z)
Supporting Software Maintenance with Dynamically Generated Document Hierarchies [41.407915858583344]
We present HGEN, a fully automated pipeline that transforms source code through a series of six stages into a well-organized hierarchy of formatted documents. We evaluate HGEN both quantitatively and qualitatively. Results show that HGEN produces artifact hierarchies similar in quality to manually constructed documentation, with much higher coverage of the core concepts than the baseline approach.
arXiv Detail & Related papers (2024-08-11T17:11:14Z)
Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs [72.89652710634051]
Knowledge graphs (KGs) complement Large Language Models (LLMs) by providing reliable, structured, domain-specific, and up-to-date external knowledge. We introduce Tree-of-Traversals, a novel zero-shot reasoning algorithm that enables augmentation of black-box LLMs with one or more KGs.
arXiv Detail & Related papers (2024-07-31T06:01:24Z)
AdaTreeFormer: Few Shot Domain Adaptation for Tree Counting from a Single High-Resolution Image [11.649568595318307]
This paper proposes a framework that is learnt from the source domain with sufficient labeled trees. It is adapted to the target domain with only a limited number of labeled trees. Experimental results show that AdaTreeFormer significantly surpasses the state of the art.
arXiv Detail & Related papers (2024-02-05T12:34:03Z)
Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub [79.31134731122462]
We introduce the OpenAct benchmark to evaluate open-domain task-solving capability, built on human expert consultation and repositories in GitHub. We present OpenAgent, a novel LLM-based agent system that can tackle evolving queries in open domains by autonomously integrating specialized tools from GitHub.
arXiv Detail & Related papers (2023-12-28T15:47:30Z)
Flexible Modeling and Multitask Learning using Differentiable Tree Ensembles [6.037383467521294]
We propose a flexible framework for learning tree ensembles to support arbitrary loss functions, missing responses, and multi-task learning. Our framework builds on differentiable tree ensembles, which can be trained using first-order methods. We show that our framework can lead to 100x more compact and 23% more expressive tree ensembles than those by popular toolkits.
arXiv Detail & Related papers (2022-05-19T17:30:49Z)
Simplified DOM Trees for Transferable Attribute Extraction from the Web [15.728164692696689]
Given a web page, extracting a structured object along with various attributes of interest can facilitate a variety of downstream applications. Existing approaches formulate the problem as a DOM tree node tagging task. We propose a novel transferable method, SimpDOM, to tackle the problem by efficiently retrieving useful context for each node.
arXiv Detail & Related papers (2021-01-07T07:41:55Z)
Rethinking Learnable Tree Filter for Generic Feature Transform [71.77463476808585]
Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation. To relax the geometric constraint, we give the analysis by reformulating it as a Markov Random Field and introduce a learnable unary term. For semantic segmentation, we achieve leading performance (82.1% mIoU) on the Cityscapes benchmark without bells-and-whistles.
arXiv Detail & Related papers (2020-12-07T07:16:47Z)
MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search. Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.