QFFN-BERT: An Empirical Study of Depth, Performance, and Data Efficiency in Hybrid Quantum-Classical Transformers
- URL: http://arxiv.org/abs/2507.02364v1
- Date: Thu, 03 Jul 2025 06:52:44 GMT
- Title: QFFN-BERT: An Empirical Study of Depth, Performance, and Data Efficiency in Hybrid Quantum-Classical Transformers
- Authors: Pilsung Kang
- Abstract summary: Parameterized quantum circuits (PQCs) have emerged as promising components for enhancing the expressibility of neural architectures. We introduce QFFN-BERT, a hybrid quantum-classical transformer where the feedforward network (FFN) modules of a compact BERT variant are replaced by PQC-based layers.
- Score: 4.309517184057254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameterized quantum circuits (PQCs) have recently emerged as promising components for enhancing the expressibility of neural architectures. In this work, we introduce QFFN-BERT, a hybrid quantum-classical transformer where the feedforward network (FFN) modules of a compact BERT variant are replaced by PQC-based layers. This design is motivated by the dominant parameter contribution of FFNs, which account for approximately two-thirds of the parameters within standard Transformer encoder blocks. While prior studies have primarily integrated PQCs into self-attention modules, our work focuses on the FFN and systematically investigates the trade-offs between PQC depth, expressibility, and trainability. Our final PQC architecture incorporates a residual connection, both $R_Y$ and $R_Z$ rotations, and an alternating entanglement strategy to ensure stable training and high expressibility. Our experiments, conducted on a classical simulator using the SST-2 and DBpedia benchmarks, demonstrate two key findings. First, a carefully configured QFFN-BERT achieves up to 102.0% of the baseline accuracy, surpassing its classical counterpart in a full-data setting while reducing FFN-specific parameters by over 99%. Second, our model exhibits a consistent and competitive edge in few-shot learning scenarios, confirming its potential for superior data efficiency. These results, supported by an ablation study on a non-optimized PQC that failed to learn, confirm that PQCs can serve as powerful and parameter-efficient alternatives to classical FFNs when co-designed with foundational deep learning principles.
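The two-thirds figure is easy to sanity-check: in a standard encoder block with hidden size $d$ and an FFN expansion factor of 4, the FFN holds about $8d^2$ weights versus roughly $4d^2$ in the attention projections, so the FFN accounts for roughly two-thirds of the block's parameters. The snippet below is a minimal, hypothetical sketch of the kind of PQC-based FFN replacement the abstract describes (residual connection, $R_Y$ and $R_Z$ rotations, alternating entanglement), assuming PennyLane with a PyTorch interface; the module name QuantumFFN, the qubit count, the classical down-/up-projections, and the exact CNOT pattern are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a PQC-based FFN block (not the authors' code).
# Assumes: pennylane, torch. Qubit count, projections, and entanglement
# pattern are illustrative choices, not taken from the paper.
import pennylane as qml
import torch
import torch.nn as nn


def make_pqc_layer(n_qubits: int, n_layers: int) -> qml.qnn.TorchLayer:
    """Build a PQC with RY/RZ rotations and an alternating CNOT pattern."""
    dev = qml.device("default.qubit", wires=n_qubits)

    @qml.qnode(dev, interface="torch")
    def circuit(inputs, weights):
        # Angle-encode the down-projected token features.
        qml.AngleEmbedding(inputs, wires=range(n_qubits), rotation="Y")
        for layer in range(n_layers):
            for w in range(n_qubits):
                qml.RY(weights[layer, w, 0], wires=w)
                qml.RZ(weights[layer, w, 1], wires=w)
            # Alternate entanglement between even and odd qubit pairs per layer.
            for w in range(layer % 2, n_qubits - 1, 2):
                qml.CNOT(wires=[w, w + 1])
        return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

    return qml.qnn.TorchLayer(circuit, {"weights": (n_layers, n_qubits, 2)})


class QuantumFFN(nn.Module):
    """Residual PQC block standing in for the classical position-wise FFN."""

    def __init__(self, d_model: int, n_qubits: int = 4, n_layers: int = 2):
        super().__init__()
        self.down = nn.Linear(d_model, n_qubits)  # classical down-projection
        self.up = nn.Linear(n_qubits, d_model)    # classical up-projection
        self.pqc = make_pqc_layer(n_qubits, n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape
        # Bound the encoding angles and flatten tokens into the batch dimension.
        z = (torch.pi * torch.tanh(self.down(x))).reshape(b * s, -1)
        q = self.pqc(z).to(x.dtype).reshape(b, s, -1)
        return x + self.up(q)  # residual connection around the quantum block
```

Replacing the position-wise FFN of each encoder layer with such a module, while keeping the attention, embeddings, and classifier head classical, yields the overall hybrid layout the abstract describes; the trainable FFN-side parameters then shrink from $O(d^2)$ to a handful of circuit angles plus two thin projections, which gives a sense of where a greater-than-99% reduction in FFN-specific parameters can come from.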
Related papers
- TensoMeta-VQC: A Tensor-Train-Guided Meta-Learning Framework for Robust and Scalable Variational Quantum Computing [60.996803677584424]
TensoMeta-VQC is a novel tensor-train (TT)-guided meta-learning framework designed to improve the robustness and scalability of VQC significantly. Our framework fully delegates the generation of quantum circuit parameters to a classical TT network, effectively decoupling optimization from quantum hardware.
arXiv Detail & Related papers (2025-08-01T23:37:55Z) - AdeptHEQ-FL: Adaptive Homomorphic Encryption for Federated Learning of Hybrid Classical-Quantum Models with Dynamic Layer Sparing [0.0]
Federated Learning (FL) faces inherent challenges in balancing model performance, privacy preservation, and communication efficiency. We introduce AdeptHEQ-FL, a unified hybrid classical-quantum FL framework that integrates a CNN-PQC architecture for expressive decentralized learning. We establish formal privacy guarantees, provide convergence analysis, and conduct extensive experiments on the CIFAR-10, SVHN, and Fashion-MNIST datasets.
arXiv Detail & Related papers (2025-07-09T22:29:02Z) - HQCC: A Hybrid Quantum-Classical Classifier with Adaptive Structure [7.836610894905161]
We propose a Hybrid Quantum-Classical Classifier (HQCC) to advance Quantum Machine Learning (QML). HQCC adaptively optimizes its Parameterized Quantum Circuits (PQCs) through a Long Short-Term Memory (LSTM)-driven dynamic circuit generator. We run simulations on the MNIST and Fashion MNIST datasets, achieving up to 97.12% accuracy.
arXiv Detail & Related papers (2025-04-02T22:49:00Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning [89.04776523010409]
This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics.
In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping (a worked form of this decomposition is sketched after this list).
We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI.
arXiv Detail & Related papers (2024-05-24T20:30:14Z) - Real-Time Adaptive Safety-Critical Control with Gaussian Processes in High-Order Uncertain Models [14.790031018404942]
This paper presents an adaptive online learning framework for systems with uncertain parameters.
We first integrate a forgetting factor to refine a variational sparse GP algorithm.
In the second phase, we propose a safety filter based on high-order control barrier functions.
arXiv Detail & Related papers (2024-02-29T08:25:32Z) - Federated Quantum Long Short-term Memory (FedQLSTM) [58.50321380769256]
Quantum federated learning (QFL) can facilitate collaborative learning across multiple clients using quantum machine learning (QML) models.
No prior work has focused on developing a QFL framework that utilizes temporal data to approximate functions.
A novel QFL framework that is the first to integrate quantum long short-term memory (QLSTM) models with temporal data is proposed.
arXiv Detail & Related papers (2023-12-21T21:40:47Z) - Federated Conformal Predictors for Distributed Uncertainty Quantification [83.50609351513886]
Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning.
In this paper, we extend conformal prediction to the federated learning setting.
We propose a weaker notion of partial exchangeability, better suited to the FL setting, and use it to develop the Federated Conformal Prediction framework.
arXiv Detail & Related papers (2023-05-27T19:57:27Z) - When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing [75.75419308975746]
This work proposes a vertical federated learning architecture based on variational quantum circuits to demonstrate the competitive performance of a quantum-enhanced pre-trained BERT model for text classification.
Our experiments on intent classification show that our proposed BERT-QTC model attains competitive results on the Snips and ATIS spoken language datasets.
arXiv Detail & Related papers (2022-02-17T09:55:21Z)
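For the SF-DQN entry above, the decomposition it mentions is the standard successor-feature factorization from the transfer-RL literature; the notation below is illustrative and assumes rewards that are linear in a shared feature map $\phi$, which may differ from the exact formulation in that paper.

$$ r_i(s,a) = \phi(s,a)^{\top} w_i, \qquad \psi^{\pi}(s,a) = \mathbb{E}_{\pi}\Big[ \sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t,a_t) \,\Big|\, s_0 = s,\ a_0 = a \Big], $$
$$ Q_i^{\pi}(s,a) = \psi^{\pi}(s,a)^{\top} w_i, \qquad \pi_{\mathrm{GPI}}(s) \in \arg\max_{a} \max_{j} \psi^{\pi_j}(s,a)^{\top} w_i. $$

The per-task Q-function is thus recovered from a policy-dependent successor feature $\psi^{\pi}$ and a task-specific reward weight $w_i$, which is what enables transfer across tasks that share the same transition dynamics.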
This list is automatically generated from the titles and abstracts of the papers on this site.