DarwinWafer: A Wafer-Scale Neuromorphic Chip
- URL: http://arxiv.org/abs/2509.16213v1
- Date: Sat, 30 Aug 2025 00:22:09 GMT
- Title: DarwinWafer: A Wafer-Scale Neuromorphic Chip
- Authors: Xiaolei Zhu, Xiaofei Jin, Ziyang Kang, Chonghui Sun, Junjie Feng, Dingwen Hu, Zengyi Wang, Hanyue Zhuang, Qian Zheng, Huajin Tang, Shi Gu, Xin Du, De Ma, Gang Pan
- Abstract summary: We present a hyperscale system-on-wafer that replaces off-chip interconnects with wafer-scale, high-density integration of 64 Darwin3 chiplets on a 300 mm silicon interposer. A GALS NoC within each chiplet and an AER-based asynchronous wafer fabric with hierarchical time-step synchronization provide low-latency, coherent operation across the wafer. DarwinWafer consumes 100 W and achieves 4.9 pJ/SOP, with 64 TSOPS peak throughput (0.64 TSOPS/W).
- Score: 43.876109856399886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neuromorphic computing promises brain-like efficiency, yet today's multi-chip systems scale over PCBs and incur orders-of-magnitude penalties in bandwidth, latency, and energy, undermining biological algorithms and system efficiency. We present DarwinWafer, a hyperscale system-on-wafer that replaces off-chip interconnects with wafer-scale, high-density integration of 64 Darwin3 chiplets on a 300 mm silicon interposer. A GALS NoC within each chiplet and an AER-based asynchronous wafer fabric with hierarchical time-step synchronization provide low-latency, coherent operation across the wafer. Each chiplet implements 2.35 M neurons and 0.1 B synapses, yielding 0.15 B neurons and 6.4 B synapses per wafer. At 333 MHz and 0.8 V, DarwinWafer consumes ~100 W and achieves 4.9 pJ/SOP, with 64 TSOPS peak throughput (0.64 TSOPS/W). Realization is enabled by a holistic chiplet-interposer co-design flow (including an in-house interposer-bump planner with early SI/PI and electro-thermal closure) and a warpage-tolerant assembly that fans out I/O via PCBlets and compliant pogo-pin connections, enabling robust, demountable wafer-to-board integration. Measurements confirm 10 mV supply droop and a uniform thermal profile (34-36 °C) under ~100 W. Application studies demonstrate whole-brain simulations: two zebrafish brains per chiplet with high connectivity fidelity (Spearman r = 0.896) and a mouse brain mapped across 32 chiplets (r = 0.645). To our knowledge, DarwinWafer represents a pioneering demonstration of wafer-scale neuromorphic computing, establishing a viable and scalable path toward large-scale, brain-like computation on silicon by replacing PCB-level interconnects with high-density, on-wafer integration.
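The wafer-level totals and the peak efficiency quoted in the abstract follow from the per-chiplet figures by simple arithmetic. The short Python sketch below reproduces that arithmetic as a sanity check; the constant and function names are illustrative and not taken from the paper, and the power figure is the approximate value ("~100 W") given in the abstract.

```python
# Sanity check of the headline figures quoted in the abstract.
# All names here are illustrative; only the numbers come from the abstract.

CHIPLETS = 64                   # Darwin3 chiplets per wafer
NEURONS_PER_CHIPLET = 2.35e6    # 2.35 M neurons
SYNAPSES_PER_CHIPLET = 0.1e9    # 0.1 B synapses
POWER_W = 100.0                 # approximate ("~100 W") wafer power
PEAK_TSOPS = 64.0               # peak throughput in tera synaptic ops per second


def wafer_totals():
    """Return (neurons, synapses) per wafer as chiplet count times per-chiplet figure."""
    neurons = CHIPLETS * NEURONS_PER_CHIPLET
    synapses = CHIPLETS * SYNAPSES_PER_CHIPLET
    return neurons, synapses


def peak_efficiency_tsops_per_w():
    """Return peak throughput divided by approximate power, in TSOPS/W."""
    return PEAK_TSOPS / POWER_W


if __name__ == "__main__":
    neurons, synapses = wafer_totals()
    print(f"neurons per wafer:  {neurons / 1e9:.4f} B")
    print(f"synapses per wafer: {synapses / 1e9:.1f} B")
    print(f"peak efficiency:    {peak_efficiency_tsops_per_w():.2f} TSOPS/W")
```

The neuron total works out to 150.4 million, which the abstract rounds to 0.15 B; the synapse total (6.4 B) and the peak efficiency (0.64 TSOPS/W) match the quoted values directly.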
Related papers
- STEP3-VL-10B Technical Report [115.89015065130127]
STEP3-VL-10B is a lightweight foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. We implement Parallel Coordinated Reasoning (PaCoRe) to scale test-time compute, allocating resources to scalable perceptual reasoning. It records 92.2% on MMBench and 80.11% on MMMU, while excelling in complex reasoning with 94.43% on AIME2025 and 75.95% on MathVision.
arXiv Detail & Related papers (2026-01-14T17:58:24Z) - Hardware-Efficient Bosonic Module for Entangling Superconducting Quantum Processors via Optical Networks [5.181101976838568]
Microwave-to-optical (M2O) transducers pose challenges due to frequency mismatches and qubit decoherence. We propose a modular architecture using SNAIL-based parametric coupling to interface Brillouin M2O transducers with long-lived 3D cavities. Our cavity-based approach outperforms transmon schemes, providing a practical pathway for distributed superconducting quantum computing.
arXiv Detail & Related papers (2025-11-13T15:27:57Z) - Ultracoherent superconducting cavity-based multiqudit platform with error-resilient control [27.481637966670217]
Superconducting radio-frequency (SRF) cavities offer a promising platform for quantum computing. We report a multimode quantum system based on a 2-cell elliptical shaped SRF cavity, comprising two cavity modes weakly coupled to an ancillary transmon circuit. We achieve single-photon lifetimes of 20.6 ms and 15.6 ms for the two modes, and a pure dephasing time exceeding 40 ms.
arXiv Detail & Related papers (2025-06-03T18:18:46Z) - Performance Characterization of a Multi-Module Quantum Processor with Static Inter-Chip Couplers [63.42120407991982]
Three-dimensional integration technologies such as flip-chip bonding are a key prerequisite to realize large-scale superconducting quantum processors. We present a design for a multi-chip module comprising one carrier chip and four qubit modules. Measuring two of the qubits, we analyze the readout performance, finding a mean three-level state-assignment error of $9 \times 10^{-3}$ in 200 ns. We demonstrate a controlled-Z two-qubit gate in 100 ns with an error of $7 \times 10^{-3}$ extracted from interleaved randomized benchmarking.
arXiv Detail & Related papers (2025-03-16T18:32:44Z) - Joint Transmit and Pinching Beamforming for Pinching Antenna Systems (PASS): Optimization-Based or Learning-Based? [89.05848771674773]
A novel pinching antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed. It consists of multiple waveguides, which equip numerous low-cost antennas, named pinching antennas (PAs). The positions of PAs can be reconfigured to both spanning large-scale path and space.
arXiv Detail & Related papers (2025-02-12T18:54:10Z) - InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers [37.89954553921228]
High-Bandwidth Domains (HBDs) are critical for communication-intensive parallelism like Parallelism. Switch-centric HBDs incur prohibitive scaling costs, while GPU-centric HBDs suffer from severe fault propagation. We propose InfiniteHBD, a transceiver-centric HBD architecture that integrates connectivity and dynamic switching at the transceiver level.
arXiv Detail & Related papers (2025-02-06T09:01:24Z) - Mem-elements based Neuromorphic Hardware for Neural Network Application [0.0]
The thesis investigates the utilization of memristive and memcapacitive crossbar arrays in low-power machine learning accelerators, offering a comprehensive co-design framework for deep neural networks (DNN).
The model, implemented through a hybrid Python and PyTorch approach, accounts for various non-idealities, achieving exceptional training accuracies of 90.02% and 91.03% for the CIFAR-10 dataset with memristive and memcapacitive crossbar arrays on an 8-layer VGG network.
arXiv Detail & Related papers (2024-03-05T14:28:40Z) - Scaling Up 3D Kernels with Bayesian Frequency Re-parameterization for Medical Image Segmentation [25.62587471067468]
RepUX-Net is a pure CNN architecture with a simple large kernel block design.
Inspired by the spatial frequency in the human visual system, we extend to vary the kernel convergence into element-wise setting.
arXiv Detail & Related papers (2023-03-10T08:38:34Z) - Fuzzy temporal convolutional neural networks in P300-based Brain-computer interface for smart home interaction [3.726817037277484]
EEG patterns exhibit high variability across time and uncertainty due to noise.
It is a significant problem to be addressed in P300-based Brain Computer Interface for smart home interaction.
We propose a sequential unification of temporal convolutional networks (TCNs) modified to EEG signals, LSTM cells, with a fuzzy neural block (FNB).
arXiv Detail & Related papers (2022-04-09T00:35:35Z) - Improving Efficiency in Large-Scale Decentralized Distributed Training [58.80224380923698]
We propose techniques to accelerate (A)D-PSGD based training by improving the spectral gap while minimizing the communication cost.
We demonstrate the effectiveness of our proposed techniques by running experiments on the 2000-hour Switchboard speech recognition task and the ImageNet computer vision task.
arXiv Detail & Related papers (2020-02-04T04:29:09Z) - Near-degenerate quadrature-squeezed vacuum generation on a silicon-nitride chip [54.87128096861778]
In this Letter, we demonstrate the generation of quadrature-phase squeezed states in the radio-frequency carrier sideband using a small-footprint silicon-nitride microresonator with a dual-pumped four-wave-mixing process.
It is critical to account for the nonlinear behavior of the pump fields to properly predict the squeezing that can be generated in this system.
arXiv Detail & Related papers (2020-02-04T01:41:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.