AMSbench: A Comprehensive Benchmark for Evaluating MLLM Capabilities in AMS Circuits
- URL: http://arxiv.org/abs/2505.24138v1
- Date: Fri, 30 May 2025 02:17:45 GMT
- Title: AMSbench: A Comprehensive Benchmark for Evaluating MLLM Capabilities in AMS Circuits
- Authors: Yichen Shi, Ze Zhang, Hongyang Wang, Zhuofu Tao, Zhongyi Li, Bingyu Chen, Yaxin Wang, Zhiping Yu, Ting-Jung Lin, Lei He
- Abstract summary: Automated Analog/Mixed-Signal (AMS) circuit design has remained a longstanding challenge due to its difficulty and complexity. Recent advances in Multi-modal Large Language Models (MLLMs) offer promising potential for supporting AMS circuit analysis and design. We introduce AMSbench, a benchmark suite designed to evaluate MLLM performance across critical tasks including circuit schematic perception, circuit analysis, and circuit design.
- Score: 11.372367666471442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analog/Mixed-Signal (AMS) circuits play a critical role in the integrated circuit (IC) industry. However, automating AMS circuit design has remained a longstanding challenge due to its difficulty and complexity. Recent advances in Multi-modal Large Language Models (MLLMs) offer promising potential for supporting AMS circuit analysis and design. However, current research typically evaluates MLLMs on isolated tasks within the domain, lacking a comprehensive benchmark that systematically assesses model capabilities across diverse AMS-related challenges. To address this gap, we introduce AMSbench, a benchmark suite designed to evaluate MLLM performance across critical tasks including circuit schematic perception, circuit analysis, and circuit design. AMSbench comprises approximately 8000 test questions spanning multiple difficulty levels and assesses eight prominent models, encompassing both open-source and proprietary solutions such as Qwen 2.5-VL and Gemini 2.5 Pro. Our evaluation highlights significant limitations in current MLLMs, particularly in complex multi-modal reasoning and sophisticated circuit design tasks. These results underscore the necessity of advancing MLLMs' understanding and effective application of circuit-specific knowledge, thereby narrowing the existing performance gap relative to human expertise and moving toward fully automated AMS circuit design workflows. Our data is released at https://huggingface.co/datasets/wwhhyy/AMSBench
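For readers who want to inspect the benchmark, here is a minimal sketch of loading the released data with the Hugging Face datasets library; the record fields are not documented in the abstract, so treat the schema access below as an assumption and check the dataset card first.

```python
# Minimal sketch: loading the AMSBench release. The field layout printed
# at the end is an assumption; consult the dataset card at
# https://huggingface.co/datasets/wwhhyy/AMSBench for the actual schema.
from datasets import load_dataset

ds = load_dataset("wwhhyy/AMSBench")   # fetches every available split
print(ds)                              # shows split names and columns

# Peek at one record of the first split; field names vary by dataset.
first_split = next(iter(ds.values()))
print(first_split[0])
```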
Related papers
- MMCircuitEval: A Comprehensive Multimodal Circuit-Focused Benchmark for Evaluating LLMs [25.945493464645548]
Multimodal large language models (MLLMs) present promising opportunities for automation and enhancement in Electronic Design Automation (EDA). We introduce MMCircuitEval, the first multimodal benchmark specifically designed to assess MLLM performance across diverse EDA tasks. MMCircuitEval comprises 3614 meticulously curated question-answer (QA) pairs spanning digital and analog circuits across critical EDA stages.
arXiv Detail & Related papers (2025-07-20T05:46:32Z)
- Evaluating Large Language Models for Real-World Engineering Tasks [75.97299249823972]
This paper introduces a curated database comprising over 100 questions derived from authentic, production-oriented engineering scenarios. Using this dataset, we evaluate four state-of-the-art Large Language Models (LLMs). Our results show that LLMs demonstrate strengths in basic temporal and structural reasoning but struggle significantly with abstract reasoning, formal modeling, and context-sensitive engineering logic.
arXiv Detail & Related papers (2025-05-12T14:05:23Z)
- LLM-based AI Agent for Sizing of Analog and Mixed Signal Circuit [2.979579757819132]
Large Language Models (LLMs) have demonstrated significant potential across various fields. In this work, we propose an LLM-based AI agent for AMS circuit design to assist in the sizing process.
arXiv Detail & Related papers (2025-04-14T22:18:16Z)
- Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness [61.87055159919641]
Multi-modal semantic segmentation (MMSS) addresses the limitations of single-modality data by integrating complementary information across modalities. Despite notable progress, a significant gap persists between research and real-world deployment due to variability and uncertainty in multi-modal data quality. We introduce a robustness benchmark that evaluates MMSS models under three scenarios: Entire-Missing Modality (EMM), Random-Missing Modality (RMM), and Noisy Modality (NM). A toy sketch of these corruptions follows this entry.
arXiv Detail & Related papers (2025-03-24T08:46:52Z)
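As a rough illustration of the three scenarios named in the entry above, the sketch below applies each corruption to a single {modality: array} sample; the function, its parameters, and the per-sample treatment of EMM are simplifying assumptions, not the benchmark's code.

```python
# Illustrative sketch (assumptions, not the benchmark's implementation):
# applying the EMM/RMM/NM corruption scenarios to one multi-modal sample.
import numpy as np

def corrupt(sample, scenario, missing="depth", drop_prob=0.5,
            noise_std=0.1, seed=0):
    """Return a corrupted copy of a {modality_name: array} sample.

    EMM: the named modality is removed entirely.
    RMM: each modality is independently dropped with probability drop_prob.
    NM:  Gaussian noise is added to every modality.
    """
    rng = np.random.default_rng(seed)
    out = dict(sample)
    if scenario == "EMM":
        out[missing] = None
    elif scenario == "RMM":
        for name in list(out):
            if rng.random() < drop_prob:
                out[name] = None
    elif scenario == "NM":
        for name in list(out):
            out[name] = out[name] + rng.normal(0.0, noise_std, out[name].shape)
    return out

sample = {"rgb": np.zeros((3, 64, 64)), "depth": np.zeros((1, 64, 64))}
print({k: None if v is None else v.shape
       for k, v in corrupt(sample, "RMM").items()})
```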
- Why Do Multi-Agent LLM Systems Fail? [91.39266556855513]
We present MAST (Multi-Agent System Failure Taxonomy), the first empirically grounded taxonomy designed to understand MAS failures. We analyze seven popular MAS frameworks across over 200 tasks, involving six expert human annotators. We identify 14 unique failure modes, organized into three overarching categories: (i) specification issues, (ii) inter-agent misalignment, and (iii) task verification.
arXiv Detail & Related papers (2025-03-17T19:04:38Z)
- EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents [57.4686961979566]
EmbodiedEval is a comprehensive and interactive evaluation benchmark for MLLMs with embodied tasks. It covers a broad spectrum of existing embodied AI tasks with significantly enhanced diversity. We evaluated state-of-the-art MLLMs on EmbodiedEval and found that they fall significantly short of human-level performance on embodied tasks.
arXiv Detail & Related papers (2025-01-21T03:22:10Z)
- AMSnet-KG: A Netlist Dataset for LLM-based AMS Circuit Auto-Design Using Knowledge Graph RAG [15.61553255884534]
Large language models (LLMs) have emerged as powerful tools for Electronic Design Automation (EDA) applications.
This paper introduces AMSnet-KG, a dataset encompassing various AMS circuit schematics and netlists.
We propose an automated AMS circuit generation framework that utilizes the comprehensive knowledge embedded in LLMs (a minimal retrieval sketch follows this entry).
arXiv Detail & Related papers (2024-11-07T02:49:53Z)
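The retrieval-augmented flow sketched in the AMSnet-KG entry above can be illustrated with a toy prompt builder: circuit knowledge is looked up by keyword overlap and injected into the generation prompt. The store contents, retrieval rule, and prompt format are all illustrative assumptions; the paper's actual knowledge graph and RAG pipeline are richer than this.

```python
# Toy sketch (assumptions, not the AMSnet-KG pipeline): keyword-based
# retrieval of circuit knowledge, injected into an LLM prompt.
KNOWLEDGE = {
    "five-transistor ota": "M1-M2 differential pair, M3-M4 current-mirror "
                           "load, M5 tail current source.",
    "two-stage miller opamp": "OTA first stage, common-source second stage, "
                              "Miller compensation capacitor Cc.",
}

def retrieve(query, store, k=1):
    # Rank stored topologies by naive keyword overlap with the query.
    words = set(query.lower().split())
    ranked = sorted(store.items(),
                    key=lambda kv: -len(words & set(kv[0].split())))
    return [text for _, text in ranked[:k]]

def build_prompt(spec):
    context = "\n".join(retrieve(spec, KNOWLEDGE))
    return (f"Known topology:\n{context}\n\n"
            f"Design an AMS circuit meeting this spec: {spec}\n"
            "Return a SPICE netlist.")

# The assembled prompt would then be sent to an LLM of choice.
print(build_prompt("low-power five-transistor OTA with 60 dB gain"))
```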
Large language models (LLMs) and large multimodal models (LMMs) have demonstrated promising skills in various domains including science and mathematics. We propose EEE-Bench, a multimodal benchmark aimed at assessing LMMs' capabilities in solving practical engineering tasks. Our benchmark consists of 2860 carefully curated problems spanning 10 essential subjects such as analog circuits and control systems.
arXiv Detail & Related papers (2024-11-03T09:17:56Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present MR-Ben, a process-based benchmark that demands meta-reasoning skill. Our meta-reasoning paradigm is especially suited to system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- LLM4EDA: Emerging Progress in Large Language Models for Electronic Design Automation [74.7163199054881]
Large Language Models (LLMs) have demonstrated their capability in context understanding, logic reasoning and answer generation.
We present a systematic study on the application of LLMs in the EDA field.
We highlight future research directions, focusing on applying LLMs to logic synthesis, physical design, multi-modal feature extraction, and alignment of circuits.
arXiv Detail & Related papers (2023-12-28T15:09:14Z)
- Analog/Mixed-Signal Circuit Synthesis Enabled by the Advancements of Circuit Architectures and Machine Learning Algorithms [0.0]
We will focus on using neural-network-based surrogate models to expedite the circuit design parameter search and layout iterations (a toy sketch of surrogate-based search follows this list).
Lastly, we will demonstrate the rapid synthesis of several AMS circuit examples from specification to silicon prototype, with significantly reduced human intervention.
arXiv Detail & Related papers (2021-12-15T01:47:08Z)
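The surrogate-model idea in this last entry admits a compact illustration: fit a small regressor on (parameter, performance) pairs gathered from a simulator, search cheaply over many candidates with the surrogate, and verify only the best candidate with the real simulator. Everything below, including the toy stand-in "simulator", is an assumption for illustration rather than the authors' setup.

```python
# Illustrative sketch (not the paper's implementation): a neural-network
# surrogate replacing expensive circuit simulation during parameter search.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def fake_simulator(params):
    # Stand-in for a SPICE run: maps two sizing parameters to a
    # made-up figure of merit.
    w, l = params[..., 0], params[..., 1]
    return np.sin(3 * w) * np.exp(-l) + 0.1 * w * l

# 1. Collect a modest set of (parameters, performance) training pairs.
X_train = rng.uniform(0.0, 1.0, size=(500, 2))
y_train = fake_simulator(X_train)

# 2. Fit the surrogate.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=0).fit(X_train, y_train)

# 3. Screen many candidates with the cheap surrogate, then verify only
#    the best one with the real simulator.
candidates = rng.uniform(0.0, 1.0, size=(100_000, 2))
best = candidates[np.argmax(surrogate.predict(candidates))]
print("best params:", best, "verified FoM:", fake_simulator(best))
```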