BigCQ: A large-scale synthetic dataset of competency question patterns
formalized into SPARQL-OWL query templates
- URL: http://arxiv.org/abs/2105.09574v1
- Date: Thu, 20 May 2021 07:59:59 GMT
- Title: BigCQ: A large-scale synthetic dataset of competency question patterns
formalized into SPARQL-OWL query templates
- Authors: Dawid Wiśniewski, Jędrzej Potoniec and Agnieszka Ławrynowicz
- Abstract summary: BigCQ is the biggest dataset of CQ templates with their formalizations into SPARQL-OWL query templates.
We describe the dataset in detail, provide a description of the process leading to the creation of the dataset and analyze how well the dataset covers real-world examples.
- Score: 0.06445605125467574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Competency Questions (CQs) are used in many ontology engineering
methodologies to collect requirements and track the completeness and
correctness of an ontology being constructed. Although they are frequently
suggested by ontology engineering methodologies, the publicly available
datasets of CQs and their formalizations in ontology query languages are very
scarce. Since the first efforts to automate processes that use CQs are now being
made, it is highly important to provide large and diverse datasets to fuel these
solutions. In this paper, we present BigCQ, the biggest dataset of CQ templates
with their formalizations into SPARQL-OWL query templates. BigCQ is created
automatically from a dataset of frequently used axiom shapes. These pairs of CQ
templates and query templates can then be materialized as actual CQs and
SPARQL-OWL queries if filled with resource labels and IRIs from a given
ontology. We describe the dataset in detail, provide a description of the
process leading to the creation of the dataset and analyze how well the dataset
covers real-world examples. We also publish the dataset as well as scripts
transforming axiom shapes into pairs of CQ patterns and SPARQL-OWL templates,
to enable engineers to adapt the process to their particular needs.
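The abstract describes a CQ template and its SPARQL-OWL query template becoming an actual CQ and query once they are filled with resource labels and IRIs from a given ontology. The Python sketch below illustrates only that materialization step, under stated assumptions: the placeholder syntax (`[EC1]` for a class slot, `[OP1]` for an object-property slot), the example template pair, the `http://example.org/zoo#` IRIs, and the `materialize` helper are all hypothetical and are not taken from the BigCQ release or its scripts.

```python
import re

# Hypothetical placeholder syntax (an assumption, not BigCQ's documented format):
# [EC<n>] marks a class slot, [OP<n>] marks an object-property slot.
# Blank-node brackets such as "[ a owl:Restriction" do not match this pattern.
PLACEHOLDER = re.compile(r"\[((?:EC|OP)\d+)\]")


def materialize(cq_template, query_template, bindings):
    """Fill one hypothetical CQ template / SPARQL-OWL template pair.

    bindings maps a placeholder name (e.g. "EC1") to a (label, IRI) pair:
    labels are substituted into the natural-language CQ, and IRIs, wrapped
    in angle brackets, are substituted into the query.
    """

    def fill(template, render):
        def substitute(match):
            name = match.group(1)
            if name not in bindings:
                # Fail loudly instead of leaving an unfilled slot behind.
                raise KeyError(f"no binding for placeholder {name}")
            label, iri = bindings[name]
            return render(label, iri)

        return PLACEHOLDER.sub(substitute, template)

    cq = fill(cq_template, lambda label, iri: label)
    query = fill(query_template, lambda label, iri: f"<{iri}>")
    return cq, query


# Hypothetical template pair: "every [EC1] [OP1] some ?x", asked as a question.
CQ_TEMPLATE = "What does a [EC1] [OP1]?"
QUERY_TEMPLATE = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
    "SELECT ?x WHERE {\n"
    "  [EC1] rdfs:subClassOf [ a owl:Restriction ;\n"
    "      owl:onProperty [OP1] ; owl:someValuesFrom ?x ] .\n"
    "}"
)

# Hypothetical labels and IRIs from an example ontology.
bindings = {
    "EC1": ("herbivore", "http://example.org/zoo#Herbivore"),
    "OP1": ("eat", "http://example.org/zoo#eats"),
}

cq, query = materialize(CQ_TEMPLATE, QUERY_TEMPLATE, bindings)
print(cq)  # What does a herbivore eat?
print(query)
```

One binding table drives both halves of the pair, so the CQ and the query stay aligned on the same ontology entities; a slot without a binding raises `KeyError` rather than silently leaving a placeholder in either output.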
Related papers
- Effective Instruction Parsing Plugin for Complex Logical Query Answering on Knowledge Graphs [51.33342412699939]
Knowledge Graph Query Embedding (KGQE) aims to embed First-Order Logic (FOL) queries in a low-dimensional KG space for complex reasoning over incomplete KGs.
Recent studies integrate various external information (such as entity types and relation context) to better capture the logical semantics of FOL queries.
We propose an effective Query Instruction Parsing (QIPP) that captures latent query patterns from code-like query instructions.
(2024-10-27T03:18:52Z)
- CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation [51.2289822267563]
We propose Corpus Retrieval and Augmentation for Fine-Tuning (CRAFT), a method for generating synthetic datasets.
We use large-scale public web-crawled corpora and similarity-based document retrieval to find other relevant human-written documents.
We demonstrate that CRAFT can efficiently generate large-scale task-specific training datasets for four diverse tasks.
(2024-09-03T17:54:40Z)
- TrustUQA: A Trustful Framework for Unified Structured Data Question Answering [45.480862651323115]
We propose UnifiedTQA, a trustful QA framework that can simultaneously support multiple types of structured data in a unified way.
We have evaluated UnifiedTQA with 5 benchmarks covering 3 types of structured data.
It outperforms 2 existing unified structured data QA methods and in comparison with the baselines that are specific to a data type, it achieves state-of-the-art on 2 of them.
(2024-06-27T06:13:05Z)
- UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
(2024-06-23T06:58:55Z)
- IQLS: Framework for leveraging Metadata to enable Large Language Model based queries to complex, versatile Data [0.20482269513546458]
The Intelligent Query and Learning System (IQLS) simplifies data retrieval by letting users query data in natural language.
It maps structured data into a framework based on the available metadata and available data models.
The IQLS enables the agent to fulfill tasks given by the user query through interfaces.
(2024-05-04T13:44:05Z)
- NL2KQL: From Natural Language to Kusto Query [1.7931930942711818]
NL2KQL is an innovative framework that uses large language models (LLMs) to convert natural language queries (NLQs) to Kusto Query Language (KQL) queries.
To validate NL2KQL's performance, we utilize an array of online (based on query execution) and offline (based on query parsing) metrics.
(2024-04-03T01:09:41Z)
- LMGQS: A Large-scale Dataset for Query-focused Summarization [77.6179359525065]
We convert four generic summarization benchmarks into a new QFS benchmark dataset, LMGQS.
We establish baselines with state-of-the-art summarization models.
We achieve state-of-the-art zero-shot and supervised performance on multiple existing QFS benchmarks.
(2023-05-22T14:53:45Z)
- NQE: N-ary Query Embedding for Complex Query Answering over Hyper-Relational Knowledge Graphs [1.415350927301928]
Complex query answering (CQA) is an essential task for logical reasoning on knowledge graphs.
We propose a novel N-ary Query Embedding (NQE) model for CQA over hyper-relational knowledge graphs (HKGs).
NQE utilizes a dual-heterogeneous Transformer encoder and fuzzy logic theory to satisfy all n-ary FOL queries.
We generate a new CQA dataset WD50K-NFOL, including diverse n-ary FOL queries over WD50K.
(2022-11-24T08:26:18Z)
- Exploring Sequence-to-Sequence Models for SPARQL Pattern Composition [0.5639451539396457]
A rapidly growing amount of information is continuously added to the Internet as structured and unstructured data, feeding knowledge bases such as DBpedia and Wikidata.
The aim of Question Answering systems is to allow lay users to access such data using natural language without needing to write formal queries.
We show that sequence-to-sequence models are a viable and promising option to transform long utterances into complex SPARQL queries.
(2020-10-21T11:12:01Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
(2020-05-28T08:26:06Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
(2020-04-24T17:57:45Z)