iScript: A Domain-Adapted Large Language Model and Benchmark for Physical Design Tcl Script Generation
- URL: http://arxiv.org/abs/2603.04476v1
- Date: Wed, 04 Mar 2026 15:20:35 GMT
- Title: iScript: A Domain-Adapted Large Language Model and Benchmark for Physical Design Tcl Script Generation
- Authors: Ning Xu, Zhaoyang Zhang, Senlin Shu, Lei Qi, Jiaqi Lv, Wensuo Wang, Tianhao Zhao, Chao Zhang, Zhaoliang Yang, Xiangyu Li, Zhaorui Su, Jingshan Li, Xin Geng,
- Abstract summary: iScript is a domain-adapted Qwen3-8B model for Innovus Tcl script generation. iScript achieves higher average pass@k scores than current state-of-the-art LLMs.
- Score: 48.502477318243386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern EDA flows rely heavily on Tcl scripting, yet general LLMs perform poorly in this domain due to extreme data scarcity, domain-specific semantics, and the high reliability required in physical design. We present iScript, a domain-adapted Qwen3-8B model for Innovus Tcl script generation, and iScript-Bench, a comprehensive benchmark covering five task categories and three difficulty levels. To overcome the lack of training data, we introduce a multi-stage data synthesis pipeline that integrates command extraction, static linting, requirement back-inference, and Chain-of-Thought generation, producing a 10K-tuple (requirement, CoT, script) dataset. iScript is trained through a two-stage strategy combining domain-adaptive pretraining and supervised fine-tuning. To evaluate script correctness efficiently, we further propose a two-step verification framework consisting of static syntax verification and LLM-based functional evaluation. On our benchmark, iScript achieves higher average pass@k scores than current state-of-the-art LLMs. These results demonstrate the effectiveness of domain adaptation and data synthesis for EDA scripting tasks.
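The abstract reports pass@k as its headline metric but does not define it. For reference, the standard unbiased pass@k estimator (the HumanEval/Codex evaluation convention, assumed here rather than taken from this abstract) can be sketched in Python; the function name and example numbers are illustrative:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n generations is correct, given
    that c of the n generations pass verification."""
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative: 10 generated scripts per task, 3 pass verification.
print(round(pass_at_k(10, 3, 1), 4))  # 0.3 (per-sample pass rate)
print(round(pass_at_k(10, 3, 5), 4))  # 0.9167
```

Averaging this quantity over all benchmark tasks yields the per-model pass@k score the abstract compares.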
Related papers
- ARISE -- Adaptive Refinement and Iterative Scenario Engineering [6.001986980495572]
We introduce ARISE - Adaptive Refinement and Iterative Scenario Engineering. It converts natural language prompts into executable Scenic scripts. ARISE outperforms the baseline in generating semantically accurate and executable traffic scenarios.
arXiv Detail & Related papers (2026-01-21T07:57:24Z) - TIT: A Tree-Structured Instruction Tuning Approach for LLM-Based Code Translation [11.882496324328905]
We propose TIT, a Tree-structured Instruction Tuning paradigm for LLM-based code translation. To mitigate syntactic confusion, the syntactic information representation module integrates language-agnostic syntactic features. To generate high-quality fine-grained parallel data, the fine-grained parallel dataset augmentation module aligns nodes with code segments.
arXiv Detail & Related papers (2025-10-10T13:53:46Z) - IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis [60.32962597618861]
IDA-Bench is a novel benchmark evaluating large language models in multi-round interactive scenarios. Agent performance is judged by comparing its final numerical output to the human-derived baseline. Even state-of-the-art coding agents (like Claude-3.7-thinking) succeed on only 50% of the tasks, highlighting limitations not evident in single-turn tests.
arXiv Detail & Related papers (2025-05-23T09:37:52Z) - EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association [83.4879773429742]
This paper defines the task of E-commerce Script Planning (EcomScript) as three sequential subtasks. We propose a novel framework that enables the scalable generation of product-enriched scripts by associating products with each step. We construct the very first large-scale EcomScript dataset, EcomScriptBench, which includes 605,229 scripts sourced from 2.4 million products.
arXiv Detail & Related papers (2025-05-21T07:21:38Z) - DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing [10.712756715779822]
Large Language Models (LLMs) have shown promise in data processing. These frameworks focus on reducing cost when executing user-specified operations. This is problematic for complex tasks and data. We present DocETL, a system that optimizes complex document processing pipelines.
arXiv Detail & Related papers (2024-10-16T03:22:35Z) - ScriptSmith: A Unified LLM Framework for Enhancing IT Operations via Automated Bash Script Generation, Assessment, and Refinement [3.685819758139424]
This paper presents an innovative approach to action automation using large language models (LLMs) for script generation, assessment, and refinement.
Our experiments focus on Bash scripts, a commonly used tool in SRE, and involve the CodeSift dataset of 100 tasks and the InterCode dataset of 153 tasks.
Results demonstrate that the framework shows an overall improvement of 7-10% in script generation.
arXiv Detail & Related papers (2024-09-12T15:11:43Z) - DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
arXiv Detail & Related papers (2024-06-25T04:27:53Z) - proScript: Partially Ordered Scripts Generation via Pre-trained Language Models [49.03193243699244]
We demonstrate for the first time that pre-trained neural language models (LMs) can be finetuned to generate high-quality scripts.
We collected a large (6.4k) crowdsourced dataset of partially ordered scripts (named proScript).
Our experiments show that our models perform well (e.g., F1=75.7 in task (i)), illustrating a new approach to overcoming previous barriers to script collection.
arXiv Detail & Related papers (2021-04-16T17:35:10Z) - Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models [56.268862325167575]
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
We leverage PLMs to address the strong token-to-token independence assumption made in the common objective, maximum likelihood estimation, for the CQR task.
We evaluate fine-tuned PLMs on the recently-introduced CANARD dataset as an in-domain task and validate the models using data from the TREC 2019 CAsT Track as an out-domain task.
arXiv Detail & Related papers (2020-04-04T11:07:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.