Predictive Query Language: A Domain-Specific Language for Predictive Modeling on Relational Databases
- URL: http://arxiv.org/abs/2602.09572v2
- Date: Mon, 16 Feb 2026 10:18:23 GMT
- Title: Predictive Query Language: A Domain-Specific Language for Predictive Modeling on Relational Databases
- Authors: Vid Kocijan, Jinu Sunil, Jan Eric Lenssen, Viman Deb, Xinwei Xe, Federico Reyes Gomez, Matthias Fey, Jure Leskovec
- Abstract summary: Predictive Query Language (PQL) allows specifying a predictive task in a single declarative query. PQL is already successfully integrated and used in a collection of use cases as part of a predictive AI platform. We demonstrate its versatility through two implementations; one for small-scale, low-latency use and one that can handle large-scale databases.
- Score: 45.647010182417205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The purpose of predictive modeling on relational data is to predict future or missing values in a relational database, for example, future purchases of a user, a patient's risk of readmission, or the likelihood that a financial transaction is fraudulent. Typically powered by machine learning methods, predictive models are used in recommendations, financial fraud detection, supply chain optimization, and other systems, providing billions of predictions every day. However, training a machine learning model requires manual work to extract the required training examples - prediction entities and target labels - from the database, which is slow, laborious, and prone to mistakes. Here, we present the Predictive Query Language (PQL), an SQL-inspired declarative language for defining predictive tasks on relational databases. PQL allows specifying a predictive task in a single declarative query, enabling the automatic computation of training labels for a large variety of machine learning tasks, such as regression, classification, time-series forecasting, and recommender systems. PQL is already successfully integrated and used in a collection of use cases as part of a predictive AI platform. The versatility of the language can be demonstrated through its many ongoing use cases, including financial fraud, item recommendations, and workload prediction. We demonstrate its versatile design through two implementations; one for small-scale, low-latency use and one that can handle large-scale databases.
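The abstract describes extracting training examples (prediction entities and target labels) from a database. The sketch below illustrates that step for one task, "will this user make a purchase in the next 30 days?". It is not PQL syntax and not the authors' implementation: the toy tables, the `build_labels` helper, and the 30-day horizon are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical toy tables (not from the paper): a list of users and
# a "purchases" table of (user_id, purchase_time) rows.
USERS = ["u1", "u2", "u3"]
PURCHASES = [
    ("u1", datetime(2024, 1, 5)),
    ("u1", datetime(2024, 2, 20)),
    ("u2", datetime(2024, 1, 10)),
]


def build_labels(users, purchases, anchor, horizon_days=30):
    """Return {user_id: bool}: did the user purchase in (anchor, anchor + horizon]?

    Each user is a prediction entity; the boolean is the target label.
    Only rows strictly after the anchor time count, so the label never
    leaks information from before the prediction time.
    """
    end = anchor + timedelta(days=horizon_days)
    labels = {user: False for user in users}
    for user, ts in purchases:
        if user in labels and anchor < ts <= end:
            labels[user] = True
    return labels


labels = build_labels(USERS, PURCHASES, anchor=datetime(2024, 2, 1))
print(labels)
```

Writing this windowing logic by hand for every task is the manual work the abstract calls slow and error-prone; a declarative query would state the target and the entity once and leave the label computation to the system.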
Related papers
- Beyond Next Word Prediction: Developing Comprehensive Evaluation Frameworks for measuring LLM performance on real world applications [3.686808512438363]
Large Language Models (LLMs) have numerous use-cases, and have already acquired a significant degree of enterprise adoption.
This paper provides the basis for a more comprehensive evaluation framework, based upon a traditional game and tool-based architecture.
arXiv Detail & Related papers (2025-03-05T06:44:38Z)
- AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling [53.54623137152208]
We introduce AutoElicit to extract knowledge from large language models and construct priors for predictive models.
We show these priors are informative and can be refined using natural language.
We find that AutoElicit yields priors that can substantially reduce error over uninformative priors, using fewer labels, and consistently outperform in-context learning.
arXiv Detail & Related papers (2024-11-26T10:13:39Z)
- A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers [70.20477771578824]
Existing approaches to event prediction include time-aware positional embeddings, learned row and field encodings, and oversampling methods for addressing class imbalance.
We propose a simple but flexible baseline using standard autoregressive LLM-style transformers with elementary positional embeddings and a causal language modeling objective.
Our baseline outperforms existing approaches across popular datasets and can be employed for various use-cases.
arXiv Detail & Related papers (2024-10-14T15:59:16Z)
- LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
- Can Language Models Use Forecasting Strategies? [14.332379032371612]
We describe experiments using a novel dataset of real world events and associated human predictions.
We find that models still struggle to make accurate predictions about the future.
arXiv Detail & Related papers (2024-06-06T19:01:42Z)
- Predictive Querying for Autoregressive Neural Sequence Models [23.85426261235507]
We introduce a general typology for predictive queries in neural autoregressive sequence models.
We show that such queries can be systematically represented by sets of elementary building blocks.
We leverage this typology to develop new query estimation methods.
arXiv Detail & Related papers (2022-10-12T17:59:36Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
- Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)