QuantumLLMInstruct: A 500k LLM Instruction-Tuning Dataset with Problem-Solution Pairs for Quantum Computing
- URL: http://arxiv.org/abs/2412.20956v1
- Date: Mon, 30 Dec 2024 13:53:51 GMT
- Title: QuantumLLMInstruct: A 500k LLM Instruction-Tuning Dataset with Problem-Solution Pairs for Quantum Computing
- Authors: Shlomo Kashani
- Abstract summary: We present QuantumLLMInstruct (QLMMI), the largest and most comprehensive dataset of its kind.
QLMMI features over 500,000 meticulously curated instruction-following problem-solution pairs designed specifically for quantum computing.
- Score: 1.90365714903665
- Abstract: We present QuantumLLMInstruct (QLMMI), an innovative dataset featuring over 500,000 meticulously curated instruction-following problem-solution pairs designed specifically for quantum computing - the largest and most comprehensive dataset of its kind. Originating from over 90 primary seed domains and encompassing hundreds of subdomains autonomously generated by LLMs, QLMMI marks a transformative step in the diversity and richness of quantum computing datasets. Designed for instruction fine-tuning, QLMMI seeks to significantly improve LLM performance in addressing complex quantum computing challenges across a wide range of quantum physics topics. While Large Language Models (LLMs) have propelled advancements in computational science with datasets like Omni-MATH and OpenMathInstruct, these primarily target Olympiad-level mathematics, leaving quantum computing largely unexplored. The creation of QLMMI follows a rigorous four-stage methodology. Initially, foundational problems are developed using predefined templates, focusing on critical areas such as synthetic Hamiltonians, QASM code generation, Jordan-Wigner transformations, and Trotter-Suzuki quantum circuit decompositions. Next, detailed and domain-specific solutions are crafted to ensure accuracy and relevance. In the third stage, the dataset is enriched through advanced reasoning techniques, including Chain-of-Thought (CoT) and Task-Oriented Reasoning and Action (ToRA), which enhance problem-solution diversity while adhering to strict mathematical standards. Lastly, a zero-shot Judge LLM performs self-assessments to validate the dataset's quality and reliability, minimizing human oversight requirements.
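The abstract names Jordan-Wigner transformations as one of the seed problem areas. As an illustration only (this is not code from the paper or the dataset), here is a minimal sketch of that mapping using the standard textbook convention a_j = Z⊗...⊗Z ⊗ (X + iY)/2 ⊗ I⊗...⊗I. The helper names (`multiply`, `jw_annihilation`, `jw_creation`, `anticommutator`) are hypothetical, and operators are stored as plain dictionaries from Pauli strings to coefficients. The sketch checks that the mapped operators satisfy the fermionic anticommutation relation {a_i, a_j†} = δ_ij.

```python
from itertools import product

# Single-qubit Pauli products: (left, right) -> (phase, result).
_PAULI_MUL = {
    ("I", "I"): (1, "I"), ("I", "X"): (1, "X"), ("I", "Y"): (1, "Y"), ("I", "Z"): (1, "Z"),
    ("X", "I"): (1, "X"), ("X", "X"): (1, "I"), ("X", "Y"): (1j, "Z"), ("X", "Z"): (-1j, "Y"),
    ("Y", "I"): (1, "Y"), ("Y", "X"): (-1j, "Z"), ("Y", "Y"): (1, "I"), ("Y", "Z"): (1j, "X"),
    ("Z", "I"): (1, "Z"), ("Z", "X"): (1j, "Y"), ("Z", "Y"): (-1j, "X"), ("Z", "Z"): (1, "I"),
}


def _prune(op):
    """Drop terms whose coefficient is numerically zero."""
    return {k: v for k, v in op.items() if abs(v) > 1e-12}


def multiply(a, b):
    """Multiply two operators given as {pauli_string: coefficient} dicts."""
    out = {}
    for (pa, ca), (pb, cb) in product(a.items(), b.items()):
        phase, chars = 1, []
        for x, y in zip(pa, pb):
            ph, r = _PAULI_MUL[(x, y)]
            phase *= ph
            chars.append(r)
        key = "".join(chars)
        out[key] = out.get(key, 0) + phase * ca * cb
    return _prune(out)


def add(a, b):
    """Add two operators term by term."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) + v
    return _prune(out)


def jw_annihilation(j, n):
    """Jordan-Wigner image of a_j on n qubits: Z^j (X + iY)/2 I^(n-j-1)."""
    prefix, suffix = "Z" * j, "I" * (n - j - 1)
    return {prefix + "X" + suffix: 0.5, prefix + "Y" + suffix: 0.5j}


def jw_creation(j, n):
    """Jordan-Wigner image of a_j^dagger on n qubits: Z^j (X - iY)/2 I^(n-j-1)."""
    prefix, suffix = "Z" * j, "I" * (n - j - 1)
    return {prefix + "X" + suffix: 0.5, prefix + "Y" + suffix: -0.5j}


def anticommutator(a, b):
    """Return {a, b} = ab + ba."""
    return add(multiply(a, b), multiply(b, a))


if __name__ == "__main__":
    n = 3
    for i in range(n):
        for j in range(n):
            result = anticommutator(jw_annihilation(i, n), jw_creation(j, n))
            print(i, j, result)
```

For `n = 3`, the anticommutator is the identity string `III` with coefficient 1 when `i == j` and the empty operator otherwise. The dictionary representation is chosen to keep the sketch dependency-free; a real workflow would more likely use an established quantum chemistry or circuit library.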
Related papers
- Quantum Bayesian Networks for Machine Learning in Oil-Spill Detection [3.9554540293311864]
This paper introduces a novel Bayesian approach using Quantum Bayesian Networks (QBNs) to classify imbalanced datasets.
We effectively address the challenge of integrating quantum enhancements with classical machine learning architectures.
Our study demonstrates significant advances in detecting and classifying anomalies, contributing to more effective and precise environmental monitoring and management.
arXiv Detail & Related papers (2024-12-24T15:44:26Z)
- QCircuitNet: A Large-Scale Hierarchical Dataset for Quantum Algorithm Design [17.747641494506087]
We introduce QCircuitNet, the first benchmark and test dataset designed to evaluate AI's capability in designing and implementing quantum algorithms.
Unlike using AI to write traditional code, this task is fundamentally different and significantly more complicated because of the highly flexible design space and the intricate manipulation of qubits.
arXiv Detail & Related papers (2024-10-10T14:24:30Z)
- ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection [60.297079601066784]
We introduce ErrorRadar, the first benchmark designed to assess MLLMs' capabilities in error detection.
ErrorRadar evaluates two sub-tasks: error step identification and error categorization.
It consists of 2,500 high-quality multimodal K-12 mathematical problems, collected from real-world student interactions.
Results indicate that significant challenges remain: GPT-4o, the best-performing model, is still around 10% behind human evaluation.
arXiv Detail & Related papers (2024-10-06T14:59:09Z)
- Generalization Error Bound for Quantum Machine Learning in NISQ Era -- A Survey [37.69303106863453]
We conduct a Systematic Mapping Study (SMS) to explore the state-of-the-art generalization bound for supervised Quantum Machine Learning (QML) in the Noisy Intermediate-Scale Quantum (NISQ) era.
Our study systematically summarizes the existing computational platforms with quantum hardware, datasets, optimization techniques, and the common properties of the bounds found in the literature.
The SMS also highlights the limitations and challenges in QML in the NISQ era and discusses future research directions to advance the field.
arXiv Detail & Related papers (2024-09-11T21:17:30Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Efficient Learning for Linear Properties of Bounded-Gate Quantum Circuits [63.733312560668274]
Given a quantum circuit containing d tunable RZ gates and G-d Clifford gates, can a learner perform purely classical inference to efficiently predict its linear properties?
We prove that the sample complexity scaling linearly in d is necessary and sufficient to achieve a small prediction error, while the corresponding computational complexity may scale exponentially in d.
We devise a kernel-based learning model capable of trading off prediction error and computational complexity, transitioning from exponential to polynomial scaling in many practical settings.
arXiv Detail & Related papers (2024-08-22T08:21:28Z)
- Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models [1.8213213818713139]
We introduce and use the Qiskit HumanEval dataset to benchmark the ability of Large Language Models to produce quantum code.
This dataset consists of more than 100 quantum computing tasks, each accompanied by a prompt, a canonical solution, and a difficulty scale to evaluate the correctness of the generated solutions.
arXiv Detail & Related papers (2024-06-20T20:14:22Z)
- Quantum algorithms: A survey of applications and end-to-end complexities [90.05272647148196]
The anticipated applications of quantum computers span across science and industry.
We present a survey of several potential application areas of quantum algorithms.
We outline the challenges and opportunities in each area in an "end-to-end" fashion.
arXiv Detail & Related papers (2023-10-04T17:53:55Z)
- QKSAN: A Quantum Kernel Self-Attention Network [53.96779043113156]
A Quantum Kernel Self-Attention Mechanism (QKSAM) is introduced to combine the data representation merit of Quantum Kernel Methods (QKM) with the efficient information extraction capability of SAM.
A Quantum Kernel Self-Attention Network (QKSAN) framework is proposed based on QKSAM, which ingeniously incorporates the Deferred Measurement Principle (DMP) and conditional measurement techniques.
Four QKSAN sub-models are deployed on PennyLane and IBM Qiskit platforms to perform binary classification on MNIST and Fashion MNIST.
arXiv Detail & Related papers (2023-08-25T15:08:19Z)
- QDataset: Quantum Datasets for Machine Learning [1.160208922584163]
The QDataSet is a quantum dataset designed specifically to facilitate the training and development of QML algorithms.
The datasets are structured to provide a wealth of information to enable machine learning practitioners to use the QDataSet to solve problems in applied quantum computation.
Accompanying the datasets on the associated GitHub repository is a set of workbooks demonstrating the use of the QDataSet in a range of optimisation contexts.
arXiv Detail & Related papers (2021-08-15T05:30:59Z)
- Quantum Federated Learning with Quantum Data [87.49715898878858]
Quantum machine learning (QML) has emerged as a promising field that leans on the developments in quantum computing to explore large complex machine learning problems.
This paper proposes the first fully quantum federated learning framework that can operate over quantum data and, thus, share the learning of quantum circuit parameters in a decentralized manner.
arXiv Detail & Related papers (2021-05-30T12:19:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.