From Text to CQL: Bridging Natural Language and Corpus Search Engine
- URL: http://arxiv.org/abs/2402.13740v1
- Date: Wed, 21 Feb 2024 12:11:28 GMT
- Title: From Text to CQL: Bridging Natural Language and Corpus Search Engine
- Authors: Luming Lu, Jiyuan An, Yujie Wang, Liner Yang, Cunliang Kong, Zhenghao
Liu, Shuo Wang, Haozhe Lin, Mingwei Fang, Yaping Huang and Erhong Yang
- Abstract summary: Corpus Query Language (CQL) is a critical tool for linguistic research and detailed analysis within text corpora.
This paper presents the first text-to-CQL task that aims to automate the translation of natural language into CQL.
- Score: 27.56738323943742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural Language Processing (NLP) technologies have revolutionized the way we
interact with information systems, with a significant focus on converting
natural language queries into formal query languages such as SQL. However, less
emphasis has been placed on the Corpus Query Language (CQL), a critical tool
for linguistic research and detailed analysis within text corpora. The manual
construction of CQL queries is a complex and time-intensive task that requires
a great deal of expertise, which presents a notable challenge for both
researchers and practitioners. This paper presents the first text-to-CQL task
that aims to automate the translation of natural language into CQL. We present
a comprehensive framework for this task, including a specifically curated
large-scale dataset and methodologies leveraging large language models (LLMs)
for effective text-to-CQL conversion. In addition, we establish evaluation
metrics to assess the syntactic and semantic accuracy of the generated queries.
We develop LLM-based conversion approaches and conduct detailed experiments.
The results demonstrate the efficacy of our methods and provide insights into
the complexities of the text-to-CQL task.
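For readers unfamiliar with the target language, a CQL query in the common CWB/Sketch Engine style is a sequence of token patterns such as `[attr="value"]`. The sketch below is our own minimal illustration, not the paper's implementation: it validates whether a string is a sequence of basic attribute-value token patterns, the simplest kind of syntactic check implied by the paper's evaluation metrics (real CQL grammars are far richer).

```python
import re

# A basic CQL token pattern: [attribute="value"], optionally with
# further constraints joined by & or |, e.g. [word="research" & pos="NN.*"].
# This toy validator accepts only this subset of the language.
TOKEN = re.compile(
    r'\[\s*\w+\s*=\s*"[^"]*"'           # first attribute="value" constraint
    r'(\s*[&|]\s*\w+\s*=\s*"[^"]*")*'   # optional further constraints
    r'\s*\]'
)

def is_valid_cql(query: str) -> bool:
    """Return True if the query is a sequence of basic CQL token patterns."""
    query = query.strip()
    pos = 0
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if not m:
            return False
        pos = m.end()
        # skip whitespace between token patterns
        while pos < len(query) and query[pos].isspace():
            pos += 1
    return bool(query)

# NL request: "an adjective immediately followed by the noun 'research'"
query = '[pos="JJ"] [word="research" & pos="NN.*"]'
print(is_valid_cql(query))         # True
print(is_valid_cql('[pos="JJ"'))   # False: unclosed token pattern
```

A semantic check, by contrast, would require executing both the gold and the generated query against a corpus and comparing result sets, analogous to execution accuracy in text-to-SQL evaluation.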
Related papers
- MST5 -- Multilingual Question Answering over Knowledge Graphs [1.6470999044938401]
Knowledge Graph Question Answering (KGQA) simplifies querying vast amounts of knowledge stored in a graph-based model using natural language.
Existing multilingual KGQA systems face challenges in achieving performance comparable to English systems.
We propose a simplified approach to enhance multilingual KGQA systems by incorporating linguistic context and entity information directly into the processing pipeline of a language model.
arXiv Detail & Related papers (2024-07-08T15:37:51Z)
- UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z)
- TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts.
To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables.
TACT contains challenging instructions that demand stitching information scattered across one or more texts, and performing complex integration on this information to generate the answer.
arXiv Detail & Related papers (2024-06-05T20:32:56Z)
- STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [93.96463520716759]
We develop STaRK, a large-scale semi-structured retrieval benchmark on textual and relational knowledge bases.
Our benchmark covers three domains/datasets: product search, academic paper search, and queries in precision medicine.
We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
arXiv Detail & Related papers (2024-04-19T22:54:54Z)
- NL2KQL: From Natural Language to Kusto Query [1.7931930942711818]
NL2KQL is an innovative framework that uses large language models (LLMs) to convert natural language queries (NLQs) to Kusto Query Language (KQL) queries.
To validate NL2KQL's performance, we utilize an array of online (based on query execution) and offline (based on query parsing) metrics.
arXiv Detail & Related papers (2024-04-03T01:09:41Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Text-to-OverpassQL: A Natural Language Interface for Complex Geodata Querying of OpenStreetMap [17.01783992725517]
We present Text-to-OverpassQL, a task designed to facilitate a natural language interface for querying geodata from OpenStreetMap (OSM).
Generating Overpass queries from natural language input serves multiple use cases.
arXiv Detail & Related papers (2023-08-30T14:33:25Z)
- Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks.
We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking.
We find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
arXiv Detail & Related papers (2022-11-15T01:13:39Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- SPBERT: Pre-training BERT on SPARQL Queries for End-to-end Question Answering over Knowledge Graphs [1.1775939485654976]
SPBERT is a Transformer-based language model pre-trained on massive SPARQL query logs.
We investigate how SPBERT and encoder-decoder architecture can be adapted for Knowledge-based QA corpora.
arXiv Detail & Related papers (2021-06-18T08:39:26Z)
- ColloQL: Robust Cross-Domain Text-to-SQL Over Search Queries [10.273545005890496]
We introduce data augmentation techniques and a sampling-based content-aware BERT model (ColloQL).
ColloQL achieves 84.9% (logical form) and 90.7% (execution) accuracy on the WikiSQL dataset.
arXiv Detail & Related papers (2020-10-19T23:53:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.