Related papers: Universal Scalability in Declarative Program Analysis (with Choice-Based Combination Pruning)

Universal Scalability in Declarative Program Analysis (with Choice-Based Combination Pruning)

URL: http://arxiv.org/abs/2503.05945v1
Date: Fri, 07 Mar 2025 21:23:02 GMT
Title: Universal Scalability in Declarative Program Analysis (with Choice-Based Combination Pruning)
Authors: Anastasios Antoniadis, Ilias Tsatiris, Nevill Grech, Yannis Smaragdakis,
Abstract summary: We show a near-universal construction that allows the choice construct to flexibly limit evaluation of predicates.<n>We apply the technique to probably the largest, pre-existing Datalog analysis frameworks in existence: Doop (for Javacode) and the main client analyses from the Gigahorse framework.
Score: 1.3874486202578669
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this work, we present a simple, uniform, and elegant solution to the problem, with stunning practical effectiveness and application to virtually any Datalog-based analysis. The approach consists of leveraging the choice construct, supported natively in modern Datalog engines like Souffl\'e. The choice construct allows the definition of functional dependencies in a relation and has been used in the past for expressing worklist algorithms. We show a near-universal construction that allows the choice construct to flexibly limit evaluation of predicates. The technique is applicable to practically any analysis architecture imaginable, since it adaptively prunes evaluation results when a (programmer-controlled) projection of a relation exceeds a desired cardinality. We apply the technique to probably the largest, pre-existing Datalog analysis frameworks in existence: Doop (for Java bytecode) and the main client analyses from the Gigahorse framework (for Ethereum smart contracts). Without needing to understand the existing analysis logic and with minimal, local-only changes, the performance of each framework increases dramatically, by over 20x for the hardest inputs, with near-negligible sacrifice in completeness.

Related papers

CORG: Generating Answers from Complex, Interrelated Contexts [57.213304718157985]
In a real-world corpus, knowledge frequently recurs across documents but often contains inconsistencies due to ambiguous naming, outdated information, or errors. Previous research has shown that language models struggle with these complexities, typically focusing on single factors in isolation. We introduce Context Organizer (CORG), a framework that organizes multiple contexts into independently processed groups.
arXiv Detail & Related papers (2025-04-25T02:40:48Z)
Personalized Top-k Set Queries Over Predicted Scores [21.74740893966611]
This work studies the applicability of expensive external oracles in answering top-k queries over predicted scores. We propose a generic computational framework that handles arbitrary set-based scoring functions.
arXiv Detail & Related papers (2025-02-18T16:19:08Z)
EpiCoder: Encompassing Diversity and Complexity in Code Generation [49.170195362149386]
We introduce a novel feature tree-based synthesis framework inspired by Abstract Syntax Trees (AST)<n>Unlike AST, which captures syntactic structure of code, our framework models semantic relationships between code elements.<n>We fine-tuned widely-used base models to create the EpiCoder series, achieving state-of-the-art performance at both the function and file levels.
arXiv Detail & Related papers (2025-01-08T18:58:15Z)
StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs [78.84060166851805]
StructTest is a novel benchmark that evaluates large language models on their ability to produce structured outputs.<n>We demonstrate that StructTest serves as a good proxy for general reasoning abilities.
arXiv Detail & Related papers (2024-12-23T22:08:40Z)
UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics. We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z)
SymbolicAI: A framework for logic-based approaches combining generative models and solvers [9.841285581456722]
We introduce SymbolicAI, a versatile and modular framework employing a logic-based approach to concept learning and flow management in generative processes. We treat large language models (LLMs) as semantic solvers that execute tasks based on both natural and formal language instructions.
arXiv Detail & Related papers (2024-02-01T18:50:50Z)
Learning to Branch in Combinatorial Optimization with Graph Pointer Networks [17.729352126574902]
This paper proposes a graph pointer network model for learning the variable selection policy in the branch-and-bound. The proposed model, which combines the graph neural network and the pointer mechanism, can effectively map from the solver state to the branching variable decisions.
arXiv Detail & Related papers (2023-07-04T01:56:07Z)
Hexatagging: Projective Dependency Parsing as Tagging [63.5392760743851]
We introduce a novel dependency, the hexatagger, that constructs dependency trees by tagging the words in a sentence with elements from a finite set of possible tags. Our approach is fully parallelizable at training time, i.e., the structure-building actions needed to build a dependency parse can be predicted in parallel to each other. We achieve state-of-the-art performance of 96.4 LAS and 97.4 UAS on the Penn Treebank test set.
arXiv Detail & Related papers (2023-06-08T18:02:07Z)
Structured Sentiment Analysis as Transition-based Dependency Parsing [0.40611352512781856]
Structured sentiment analysis aims to automatically extract people's opinions from a text in natural language. One of the most accurate methods for performing SSA was recently proposed and consists of approaching it as a dependency parsing task. We present the first transition-based method to address SSA as dependency parsing.
arXiv Detail & Related papers (2023-05-09T10:03:34Z)
Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models [10.258158633354686]
Our approach integrates classic logical programming languages into large language models (LLMs) Our experimental results demonstrate that our method achieves accurate identification of correct answer entities for all test questions, even when trained on a small fraction of annotated data.
arXiv Detail & Related papers (2023-03-03T20:35:38Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries. We present X2 compared a transferable Cross-lingual and Cross-domain for TCSP. We propose to predict flattened intents and slots representations separately and cast both prediction tasks into sequence labeling problems.
arXiv Detail & Related papers (2021-06-07T16:40:05Z)
Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure. Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source-code to discover latent representations and achieve up to 84% (individual problem) and 73% (combined dataset with multiple of problems) accuracy in predicting the change in performance.
arXiv Detail & Related papers (2021-02-12T16:59:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.