Quantum Knowledge Distillation for Large Language Models
- URL: http://arxiv.org/abs/2505.13205v2
- Date: Fri, 01 Aug 2025 06:53:55 GMT
- Title: Quantum Knowledge Distillation for Large Language Models
- Authors: Lingxiao Li, Yihao Wang, Jiacheng Fan, Jing Li, Sujuan Qin, Qiaoyan Wen, Fei Gao
- Abstract summary: We propose a Quantum knowledge Distillation model for Large Language Models (QD-LLM). In classical simulation, QD-LLM outperforms several mainstream distillation methods on multiple text classification tasks. We deploy the obtained circuits on the Baihua superconducting quantum processor via the Quafu platform to assess practical feasibility.
- Score: 10.023534560183919
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As foundational tools in natural language processing, Large Language Models (LLMs) have immense parameter scales, which make deployment and inference increasingly prohibitive, especially on resource-constrained devices. Knowledge distillation for LLMs, i.e., compressing an LLM into a smaller model, is therefore valuable. Quantum computing, with its strong parameter representation capacity, is regarded as a promising solution. Here, we propose a Quantum knowledge Distillation model for LLMs (QD-LLM) that leverages variational quantum circuits to learn from LLMs. In classical simulation, QD-LLM outperforms several mainstream distillation methods on multiple text classification tasks in terms of both accuracy and efficiency while using only 11 qubits. The results reveal an interesting phenomenon: the classical simulation of quantum student models may itself be regarded as a new class of quantum-inspired classical algorithms. Remarkably, we deploy the obtained circuits on the Baihua superconducting quantum processor via the Quafu platform to assess practical feasibility. The model maintains stable inference performance despite hardware constraints such as decoherence and finite sampling. In summary, QD-LLM marks a foundational step in connecting quantum computing with LLMs, demonstrating the feasibility of quantum-native approaches that aim to compress and deploy models of ever larger scale. The code for this article has been open-sourced at https://github.com/Lilingxiao-bupt/QD-LLM.
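The abstract describes the pipeline only at a high level. The sketch below illustrates the general idea of a quantum student distilled from an LLM teacher, assuming an 11-qubit variational circuit simulated in PennyLane, angle-encoded input features, and a KL-divergence loss against precomputed teacher soft labels; the circuit depth, feature encoding, loss details, and toy data are illustrative assumptions, not the authors' released implementation (see the linked GitHub repository for that).

```python
# Minimal sketch of the QD-LLM idea: a small variational quantum "student"
# trained against an LLM teacher's soft labels. Illustrative assumptions only.
import pennylane as qml
from pennylane import numpy as np  # autograd-aware NumPy shipped with PennyLane

n_qubits = 11   # matches the 11-qubit setting reported in the abstract
n_layers = 3    # assumed variational depth (hypothetical)

dev = qml.device("default.qubit", wires=n_qubits)  # classical simulation

@qml.qnode(dev)
def student_circuit(weights, features):
    # Angle-encode pre-reduced text features, one rotation angle per qubit.
    qml.AngleEmbedding(features, wires=range(n_qubits))
    # Trainable entangling layers act as the compact quantum student.
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    # The two-outcome distribution of qubit 0 serves as the class probabilities.
    return qml.probs(wires=0)

def distill_loss(weights, features, teacher_probs):
    # Soft-label distillation on one example: KL(teacher || student).
    student_probs = student_circuit(weights, features) + 1e-9
    return np.sum(teacher_probs * (np.log(teacher_probs) - np.log(student_probs)))

# One toy gradient step on random data, purely for illustration.
shape = qml.StronglyEntanglingLayers.shape(n_layers=n_layers, n_wires=n_qubits)
weights = np.array(np.random.uniform(0, np.pi, size=shape), requires_grad=True)
features = np.array(np.random.uniform(0, np.pi, size=n_qubits), requires_grad=False)
teacher_probs = np.array([0.9, 0.1], requires_grad=False)  # e.g., an LLM's soft label

opt = qml.GradientDescentOptimizer(stepsize=0.1)
weights = opt.step(lambda w: distill_loss(w, features, teacher_probs), weights)
print(distill_loss(weights, features, teacher_probs))
```

In the paper's workflow, the circuits obtained this way are additionally executed on the Baihua superconducting processor via the Quafu platform; the sketch above covers only the classically simulated training step.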
Related papers
- VQC-MLPNet: An Unconventional Hybrid Quantum-Classical Architecture for Scalable and Robust Quantum Machine Learning [60.996803677584424]
Variational Quantum Circuits (VQCs) offer a novel pathway for quantum machine learning. Their practical application is hindered by inherent limitations such as constrained linear expressivity, optimization challenges, and acute sensitivity to quantum hardware noise. This work introduces VQC-MLPNet, a scalable and robust hybrid quantum-classical architecture designed to overcome these obstacles.
arXiv Detail & Related papers (2025-06-12T01:38:15Z) - Q-Fusion: Diffusing Quantum Circuits [2.348041867134616]
We propose a diffusion-based algorithm leveraging the LayerDAG framework to generate new quantum circuits. Our results demonstrate that the proposed model consistently generates 100% valid quantum circuit outputs.
arXiv Detail & Related papers (2025-04-29T14:10:10Z) - Quantizing Large Language Models for Code Generation: A Differentiated Replication [51.85505914274633]
Large Language Models (LLMs) have shown an impressive capability in code generation and, specifically, in automatically implementing requirements described in natural language. LLMs pose significant challenges related to their memory (and, consequently, carbon) footprint. The new frontier for LLM quantization is 4-bit precision, resulting in an average memory footprint reduction of 70%.
arXiv Detail & Related papers (2025-03-10T09:26:08Z) - Learning to Measure Quantum Neural Networks [10.617463958884528]
We introduce a novel approach that makes the observable of the quantum system, specifically the Hermitian matrix, learnable. Our method features an end-to-end differentiable learning framework, where the parameterized observable is trained alongside the ordinary quantum circuit parameters. Using numerical simulations, we show that the proposed method can identify observables for variational quantum circuits that lead to improved outcomes.
arXiv Detail & Related papers (2025-01-10T02:28:19Z) - A learning agent-based approach to the characterization of open quantum systems [0.0]
We introduce the open Quantum Model Learning Agent (oQMLA) framework to account for Markovian noise through the Liouvillian formalism. By simultaneously learning the Hamiltonian and jump operators, oQMLA independently captures both the coherent and incoherent dynamics of a system. We validate our implementation in simulated scenarios of increasing complexity, demonstrating its robustness to hardware-induced measurement errors.
arXiv Detail & Related papers (2025-01-09T16:25:17Z) - Quantum Kernel-Based Long Short-term Memory [0.30723404270319693]
We introduce the Quantum Kernel-Based Long Short-Term Memory (QK-LSTM) network to capture complex, non-linear patterns in sequential data.
This quantum-enhanced architecture demonstrates efficient convergence, robust loss minimization, and model compactness.
Benchmark comparisons reveal that QK-LSTM achieves performance on par with classical LSTM models, yet with fewer parameters.
arXiv Detail & Related papers (2024-11-20T11:39:30Z) - Leveraging Pre-Trained Neural Networks to Enhance Machine Learning with Variational Quantum Circuits [48.33631905972908]
We introduce an innovative approach that utilizes pre-trained neural networks to enhance Variational Quantum Circuits (VQCs).
This technique effectively separates approximation error from qubit count and removes the need for restrictive conditions.
Our results extend to applications such as human genome analysis, demonstrating the broad applicability of our approach.
arXiv Detail & Related papers (2024-11-13T12:03:39Z) - Learning Density Functionals from Noisy Quantum Data [0.0]
Noisy intermediate-scale quantum (NISQ) devices are used to generate training data for machine learning (ML) models.
We show that a neural-network ML model can successfully generalize from small datasets subject to noise typical of NISQ algorithms.
Our findings suggest a promising pathway for leveraging NISQ devices in practical quantum simulations.
arXiv Detail & Related papers (2024-09-04T17:59:55Z) - Designing Large Foundation Models for Efficient Training and Inference: A Survey [35.40505841618305]
This paper focuses on modern efficient training and inference technologies for foundation models. Model and system design optimize LLM training and inference from different aspects to save computational resources.
arXiv Detail & Related papers (2024-09-03T15:35:01Z) - SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models [67.67135738642547]
Post-training quantization (PTQ) is a powerful compression technique investigated in large language models (LLMs).
Existing PTQ methods are not ideal in terms of accuracy and efficiency, especially at bit-widths below 4.
This paper presents a Salience-Driven Mixed-Precision Quantization scheme for LLMs, namely SliM-LLM.
arXiv Detail & Related papers (2024-05-23T16:21:48Z) - Feature Importance and Explainability in Quantum Machine Learning [0.0]
Many Machine Learning (ML) models are referred to as black box models, providing no real insights into why a prediction is made.
This article explores feature importance and explainability in Quantum Machine Learning (QML) compared to Classical ML models.
arXiv Detail & Related papers (2024-05-14T19:12:32Z) - LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating large language models.
We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.
Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
arXiv Detail & Related papers (2024-05-09T11:49:05Z) - QKSAN: A Quantum Kernel Self-Attention Network [53.96779043113156]
A Quantum Kernel Self-Attention Mechanism (QKSAM) is introduced to combine the data representation merit of Quantum Kernel Methods (QKM) with the efficient information extraction capability of the Self-Attention Mechanism (SAM).
A Quantum Kernel Self-Attention Network (QKSAN) framework is proposed based on QKSAM, which ingeniously incorporates the Deferred Measurement Principle (DMP) and conditional measurement techniques.
Four QKSAN sub-models are deployed on PennyLane and IBM Qiskit platforms to perform binary classification on MNIST and Fashion MNIST (a generic quantum-kernel sketch is given after this list).
arXiv Detail & Related papers (2023-08-25T15:08:19Z) - OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models [57.27101446992148]
Large language models (LLMs) have revolutionized natural language processing tasks.
Recent post-training quantization (PTQ) methods are effective in reducing memory footprint and improving the computational efficiency of LLMs.
We introduce an Omnidirectionally calibrated Quantization technique for LLMs, which achieves good performance in diverse quantization settings.
arXiv Detail & Related papers (2023-08-25T02:28:35Z) - Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt [96.24800696597707]
We introduce a new perspective to optimize this trade-off by prompting compressed models.
We propose a soft prompt learning method where we expose the compressed model to the prompt learning process.
Our experimental analysis suggests our soft prompt strategy greatly improves the performance of the 8x compressed LLaMA-7B model.
arXiv Detail & Related papers (2023-05-17T20:45:13Z) - Explaining Quantum Circuits with Shapley Values: Towards Explainable Quantum Machine Learning [1.0984331138780683]
Methods of artificial intelligence (AI), and especially machine learning (ML), have been growing ever more complex while having an ever greater impact on people's lives. In parallel, quantum machine learning (QML) is emerging with the ongoing improvement of quantum computing hardware combined with its increasing availability via cloud services. QML enables quantum-enhanced ML, in which quantum mechanics is exploited to facilitate ML tasks, typically in the form of quantum-classical hybrid algorithms that combine quantum and classical resources.
arXiv Detail & Related papers (2023-01-22T15:17:12Z) - QSAN: A Near-term Achievable Quantum Self-Attention Network [73.15524926159702]
Self-Attention Mechanism (SAM) is good at capturing the internal connections of features.
A novel Quantum Self-Attention Network (QSAN) is proposed for image classification tasks on near-term quantum devices.
arXiv Detail & Related papers (2022-07-14T12:22:51Z) - Quantum-tailored machine-learning characterization of a superconducting qubit [50.591267188664666]
We develop an approach to characterize the dynamics of a quantum device and learn device parameters.
This approach outperforms physics-agnostic recurrent neural networks trained on numerically generated and experimental data.
This demonstration shows how leveraging domain knowledge improves the accuracy and efficiency of this characterization task.
arXiv Detail & Related papers (2021-06-24T15:58:57Z) - Quantum Federated Learning with Quantum Data [87.49715898878858]
Quantum machine learning (QML) has emerged as a promising field that builds on developments in quantum computing to explore large, complex machine learning problems.
This paper proposes the first fully quantum federated learning framework that can operate over quantum data and, thus, share the learning of quantum circuit parameters in a decentralized manner.
arXiv Detail & Related papers (2021-05-30T12:19:27Z)
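Several of the related papers above (e.g., QK-LSTM and QKSAN) build on quantum kernels, where the similarity between two inputs is estimated from the overlap of their quantum feature states. The following minimal sketch illustrates that general idea, assuming a PennyLane simulator and a simple angle-encoding feature map; it shows the common embedding-overlap kernel, not the specific constructions used in those papers.

```python
# Generic embedding-overlap quantum kernel (illustrative; not taken from any
# specific paper above). The kernel value is the probability of returning to
# the all-zeros state after embedding x1 and un-embedding x2, which equals
# |<phi(x2)|phi(x1)>|^2.
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4  # assumed feature dimension for this toy example
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def overlap_circuit(x1, x2):
    qml.AngleEmbedding(x1, wires=range(n_qubits))               # U(x1)
    qml.adjoint(qml.AngleEmbedding)(x2, wires=range(n_qubits))  # U(x2)^dagger
    return qml.probs(wires=range(n_qubits))

def quantum_kernel(x1, x2):
    return overlap_circuit(x1, x2)[0]  # probability of the |0...0> outcome

x_a = np.array([0.1, 0.5, 0.2, 0.9])
x_b = np.array([0.3, 0.4, 0.1, 0.8])
print(quantum_kernel(x_a, x_b))  # similarity in [0, 1]
print(quantum_kernel(x_a, x_a))  # ~1.0 for identical inputs
```

Such kernel values can then be consumed by a classical component, for example as an attention or gating score, which is broadly the pattern these kernel-based approaches explore.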
This list is automatically generated from the titles and abstracts of the papers on this site.