From Text to CQL: Bridging Natural Language and Corpus Search Engine
- URL: http://arxiv.org/abs/2402.13740v1
- Date: Wed, 21 Feb 2024 12:11:28 GMT
- Title: From Text to CQL: Bridging Natural Language and Corpus Search Engine
- Authors: Luming Lu, Jiyuan An, Yujie Wang, Liner Yang, Cunliang Kong, Zhenghao
Liu, Shuo Wang, Haozhe Lin, Mingwei Fang, Yaping Huang and Erhong Yang
- Abstract summary: Corpus Query Language (CQL) is a critical tool for linguistic research and detailed analysis within text corpora.
This paper presents the first text-to-CQL task that aims to automate the translation of natural language into CQL.
- Score: 27.56738323943742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural Language Processing (NLP) technologies have revolutionized the way we
interact with information systems, with a significant focus on converting
natural language queries into formal query languages such as SQL. However, less
emphasis has been placed on the Corpus Query Language (CQL), a critical tool
for linguistic research and detailed analysis within text corpora. The manual
construction of CQL queries is a complex and time-intensive task that requires
a great deal of expertise, which presents a notable challenge for both
researchers and practitioners. This paper presents the first text-to-CQL task
that aims to automate the translation of natural language into CQL. We present
a comprehensive framework for this task, including a specifically curated
large-scale dataset and methodologies leveraging large language models (LLMs)
for effective text-to-CQL conversion. In addition, we establish evaluation
metrics to assess the syntactic and semantic accuracy of the generated queries.
We develop LLM-based conversion approaches and conduct detailed experiments.
The results demonstrate the efficacy of our methods and provide insights into
the complexities of the text-to-CQL task.
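For readers unfamiliar with the target language, a CQL query in the common CWB/Sketch Engine style is a sequence of token patterns such as `[attr="value"]`. The sketch below is our own minimal illustration, not the paper's implementation: it validates whether a string is a sequence of basic attribute-value token patterns, the simplest kind of syntactic check implied by the paper's evaluation metrics (real CQL grammars are far richer).

```python
import re

# A basic CQL token pattern: [attribute="value"], optionally with
# further constraints joined by & or |, e.g. [word="research" & pos="NN.*"].
# This toy validator accepts only this subset of the language.
TOKEN = re.compile(
    r'\[\s*\w+\s*=\s*"[^"]*"'           # first attribute="value" constraint
    r'(\s*[&|]\s*\w+\s*=\s*"[^"]*")*'   # optional further constraints
    r'\s*\]'
)

def is_valid_cql(query: str) -> bool:
    """Return True if the query is a sequence of basic CQL token patterns."""
    query = query.strip()
    pos = 0
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if not m:
            return False
        pos = m.end()
        # skip whitespace between token patterns
        while pos < len(query) and query[pos].isspace():
            pos += 1
    return bool(query)

# NL request: "an adjective immediately followed by the noun 'research'"
query = '[pos="JJ"] [word="research" & pos="NN.*"]'
print(is_valid_cql(query))         # True
print(is_valid_cql('[pos="JJ"'))   # False: unclosed token pattern
```

A semantic check, by contrast, would require executing both the gold and the generated query against a corpus and comparing result sets, analogous to execution accuracy in text-to-SQL evaluation.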
Related papers
- MST5 -- Multilingual Question Answering over Knowledge Graphs [1.6470999044938401]
Knowledge Graph Question Answering (KGQA) simplifies querying vast amounts of knowledge stored in a graph-based model using natural language.
Existing multilingual KGQA systems face challenges in achieving performance comparable to English systems.
We propose a simplified approach to enhance multilingual KGQA systems by incorporating linguistic context and entity information directly into the processing pipeline of a language model.
arXiv Detail & Related papers (2024-07-08T15:37:51Z)
- UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z)
- TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts.
To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables.
TACT contains challenging instructions that demand stitching information scattered across one or more texts, and performing complex integration on this information to generate the answer.
arXiv Detail & Related papers (2024-06-05T20:32:56Z)
- STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [93.96463520716759]
We develop STaRK, a large-scale semi-structured retrieval benchmark on textual and relational knowledge bases.
Our benchmark covers three domains/datasets: product search, academic paper search, and queries in precision medicine.
We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
arXiv Detail & Related papers (2024-04-19T22:54:54Z)
- NL2KQL: From Natural Language to Kusto Query [1.7931930942711818]
NL2KQL is an innovative framework that uses large language models (LLMs) to convert natural language queries (NLQs) to Kusto Query Language (KQL) queries.
To validate NL2KQL's performance, we utilize an array of online (based on query execution) and offline (based on query parsing) metrics.
arXiv Detail & Related papers (2024-04-03T01:09:41Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Text-to-OverpassQL: A Natural Language Interface for Complex Geodata Querying of OpenStreetMap [17.01783992725517]
We present Text-to-OverpassQL, a task designed to facilitate a natural language interface for querying geodata from OpenStreetMap (OSM).
Generating Overpass queries from natural language input serves multiple use cases.
arXiv Detail & Related papers (2023-08-30T14:33:25Z)
- Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks.
We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking.
We find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
arXiv Detail & Related papers (2022-11-15T01:13:39Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- SPBERT: Pre-training BERT on SPARQL Queries for End-to-end Question Answering over Knowledge Graphs [1.1775939485654976]
SPBERT is a Transformer-based language model pre-trained on massive SPARQL query logs.
We investigate how SPBERT and encoder-decoder architecture can be adapted for Knowledge-based QA corpora.
arXiv Detail & Related papers (2021-06-18T08:39:26Z)
- ColloQL: Robust Cross-Domain Text-to-SQL Over Search Queries [10.273545005890496]
We introduce data augmentation techniques and a sampling-based content-aware BERT model (ColloQL).
ColloQL achieves 84.9% (logical form) and 90.7% (execution) accuracy on the WikiSQL dataset.
arXiv Detail & Related papers (2020-10-19T23:53:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.