Related papers: DarwinWafer: A Wafer-Scale Neuromorphic Chip

URL: http://arxiv.org/abs/2509.16213v1
Date: Sat, 30 Aug 2025 00:22:09 GMT
Title: DarwinWafer: A Wafer-Scale Neuromorphic Chip
Authors: Xiaolei Zhu, Xiaofei Jin, Ziyang Kang, Chonghui Sun, Junjie Feng, Dingwen Hu, Zengyi Wang, Hanyue Zhuang, Qian Zheng, Huajin Tang, Shi Gu, Xin Du, De Ma, Gang Pan
Abstract summary: We present a hyperscale system-on-wafer that replaces off-chip interconnects with wafer-scale, high-density integration of 64 Darwin3 chiplets on a 300 mm silicon interposer. A GALS NoC within each chiplet and an AER-based asynchronous wafer fabric with hierarchical time-step synchronization provide low-latency, coherent operation across the wafer. DarwinWafer consumes ~100 W and achieves 4.9 pJ/SOP, with 64 TSOPS peak throughput (0.64 TSOPS/W).
Score: 43.876109856399886
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neuromorphic computing promises brain-like efficiency, yet today's multi-chip systems scale over PCBs and incur orders-of-magnitude penalties in bandwidth, latency, and energy, undermining biological algorithms and system efficiency. We present DarwinWafer, a hyperscale system-on-wafer that replaces off-chip interconnects with wafer-scale, high-density integration of 64 Darwin3 chiplets on a 300 mm silicon interposer. A GALS NoC within each chiplet and an AER-based asynchronous wafer fabric with hierarchical time-step synchronization provide low-latency, coherent operation across the wafer. Each chiplet implements 2.35 M neurons and 0.1 B synapses, yielding 0.15 B neurons and 6.4 B synapses per wafer. At 333 MHz and 0.8 V, DarwinWafer consumes ~100 W and achieves 4.9 pJ/SOP, with 64 TSOPS peak throughput (0.64 TSOPS/W). Realization is enabled by a holistic chiplet-interposer co-design flow (including an in-house interposer-bump planner with early SI/PI and electro-thermal closure) and a warpage-tolerant assembly that fans out I/O via PCBlets and compliant pogo-pin connections, enabling robust, demountable wafer-to-board integration. Measurements confirm 10 mV supply droop and a uniform thermal profile (34-36 °C) under ~100 W. Application studies demonstrate whole-brain simulations: two zebrafish brains per chiplet with high connectivity fidelity (Spearman r = 0.896) and a mouse brain mapped across 32 chiplets (r = 0.645). To our knowledge, DarwinWafer represents a pioneering demonstration of wafer-scale neuromorphic computing, establishing a viable and scalable path toward large-scale, brain-like computation on silicon by replacing PCB-level interconnects with high-density, on-wafer integration.
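
The wafer-level totals quoted in the abstract follow from the per-chiplet figures by simple scaling. The snippet below is only an illustrative arithmetic check, not anything taken from the paper: the constant and function names are made up for this example, and the "~100 W" figure is treated as exactly 100 W.

```python
# Figures quoted in the abstract; all names here are illustrative.
CHIPLETS = 64
NEURONS_PER_CHIPLET = 2.35e6    # 2.35 M neurons per chiplet
SYNAPSES_PER_CHIPLET = 0.1e9    # 0.1 B synapses per chiplet
PEAK_TSOPS = 64.0               # peak throughput in tera synaptic ops per second
POWER_W = 100.0                 # abstract says "~100 W"; assumed exact here


def wafer_totals():
    """Scale the per-chiplet figures up to the whole wafer."""
    neurons = CHIPLETS * NEURONS_PER_CHIPLET
    synapses = CHIPLETS * SYNAPSES_PER_CHIPLET
    tsops_per_watt = PEAK_TSOPS / POWER_W
    return neurons, synapses, tsops_per_watt


neurons, synapses, tsops_per_watt = wafer_totals()
print(f"{neurons / 1e9:.2f} B neurons per wafer")
print(f"{synapses / 1e9:.1f} B synapses per wafer")
print(f"{tsops_per_watt:.2f} TSOPS/W")
```

Rounded to the precision used in the abstract, these reproduce the quoted 0.15 B neurons, 6.4 B synapses, and 0.64 TSOPS/W.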

Related papers

STEP3-VL-10B Technical Report [115.89015065130127]
STEP3-VL-10B is a lightweight foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. We implement Parallel Coordinated Reasoning (PaCoRe) to scale test-time compute, allocating resources to scalable perceptual reasoning. It records 92.2% on MMBench and 80.11% on MMMU, while excelling in complex reasoning with 94.43% on AIME2025 and 75.95% on MathVision.
arXiv Detail & Related papers (2026-01-14T17:58:24Z)
Hardware-Efficient Bosonic Module for Entangling Superconducting Quantum Processors via Optical Networks [5.181101976838568]
Microwave-to-optical (M2O) transducers pose challenges due to frequency mismatches and qubit decoherence. We propose a modular architecture using SNAIL-based parametric coupling to interface Brillouin M2O transducers with long-lived 3D cavities. Our cavity-based approach outperforms transmon schemes, providing a practical pathway for distributed superconducting quantum computing.
arXiv Detail & Related papers (2025-11-13T15:27:57Z)
Ultracoherent superconducting cavity-based multiqudit platform with error-resilient control [27.481637966670217]
Superconducting radio-frequency (SRF) cavities offer a promising platform for quantum computing. We report a multimode quantum system based on a 2-cell elliptical shaped SRF cavity, comprising two cavity modes weakly coupled to an ancillary transmon circuit. We achieve single-photon lifetimes of 20.6 ms and 15.6 ms for the two modes, and a pure dephasing time exceeding 40 ms.
arXiv Detail & Related papers (2025-06-03T18:18:46Z)
Performance Characterization of a Multi-Module Quantum Processor with Static Inter-Chip Couplers [63.42120407991982]
Three-dimensional integration technologies such as flip-chip bonding are a key prerequisite to realize large-scale superconducting quantum processors. We present a design for a multi-chip module comprising one carrier chip and four qubit modules. Measuring two of the qubits, we analyze the readout performance, finding a mean three-level state-assignment error of $9 \times 10^{-3}$ in 200 ns. We demonstrate a controlled-Z two-qubit gate in 100 ns with an error of $7 \times 10^{-3}$ extracted from interleaved randomized benchmarking.
arXiv Detail & Related papers (2025-03-16T18:32:44Z)
Joint Transmit and Pinching Beamforming for Pinching Antenna Systems (PASS): Optimization-Based or Learning-Based? [89.05848771674773]
A novel pinching antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed. It consists of multiple waveguides, which equip numerous low-cost antennas, named pinching antennas (PAs). The positions of PAs can be reconfigured to both spanning large-scale path and space.
arXiv Detail & Related papers (2025-02-12T18:54:10Z)
InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers [37.89954553921228]
High-Bandwidth Domains (HBDs) are critical for communication-intensive parallelism like Parallelism. Switch-centric HBDs incur prohibitive scaling costs, while GPU-centric HBDs suffer from severe fault propagation. We propose InfiniteHBD, a transceiver-centric HBD architecture that integrates connectivity and dynamic switching at the transceiver level.
arXiv Detail & Related papers (2025-02-06T09:01:24Z)
Mem-elements based Neuromorphic Hardware for Neural Network Application [0.0]
The thesis investigates the utilization of memristive and memcapacitive crossbar arrays in low-power machine learning accelerators, offering a comprehensive co-design framework for deep neural networks (DNN). The model, implemented through a hybrid Python and PyTorch approach, accounts for various non-idealities, achieving exceptional training accuracies of 90.02% and 91.03% for the CIFAR-10 dataset with memristive and memcapacitive crossbar arrays on an 8-layer VGG network.
arXiv Detail & Related papers (2024-03-05T14:28:40Z)
Scaling Up 3D Kernels with Bayesian Frequency Re-parameterization for Medical Image Segmentation [25.62587471067468]
RepUX-Net is a pure CNN architecture with a simple large kernel block design. Inspired by the spatial frequency in the human visual system, we extend to vary the kernel convergence into element-wise setting.
arXiv Detail & Related papers (2023-03-10T08:38:34Z)
Fuzzy temporal convolutional neural networks in P300-based Brain-computer interface for smart home interaction [3.726817037277484]
EEG patterns exhibit high variability across time and uncertainty due to noise. This is a significant problem to be addressed in P300-based Brain Computer Interface for smart home interaction. We propose a sequential unification of temporal convolutional networks (TCNs) modified to EEG signals, LSTM cells, with a fuzzy neural block (FNB).
arXiv Detail & Related papers (2022-04-09T00:35:35Z)
Improving Efficiency in Large-Scale Decentralized Distributed Training [58.80224380923698]
We propose techniques to accelerate (A)D-PSGD based training by improving the spectral gap while minimizing the communication cost. We demonstrate the effectiveness of our proposed techniques by running experiments on the 2000-hour Switchboard speech recognition task and the ImageNet computer vision task.
arXiv Detail & Related papers (2020-02-04T04:29:09Z)
Near-degenerate quadrature-squeezed vacuum generation on a silicon-nitride chip [54.87128096861778]
In this Letter, we demonstrate the generation of quadrature-phase squeezed states in the radio-frequency carrier sideband using a small-footprint silicon-nitride microresonator with a dual-pumped four-wave-mixing process. It is critical to account for the nonlinear behavior of the pump fields to properly predict the squeezing that can be generated in this system.
arXiv Detail & Related papers (2020-02-04T01:41:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.