Desyan: A Platform for Seamless Value-Flow and Symbolic Analysis
- URL: http://arxiv.org/abs/2508.00508v1
- Date: Fri, 01 Aug 2025 10:39:09 GMT
- Title: Desyan: A Platform for Seamless Value-Flow and Symbolic Analysis
- Authors: Panagiotis Diamantakis, Thanassis Avgerinos, Yannis Smaragdakis,
- Abstract summary: Desyan is a platform for writing program analyses with seamless integration of value-flow and symbolic reasoning.<n>For value-flow analysis, the engine is the best-in-class Datalog evaluator.<n>For applications that require full SMT, the engine is leveraging the leading SMT solvers.
- Score: 1.7205106391379026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the past two decades, two different types of static analyses have emerged as dominant paradigms both in academia and industry: value-flow analysis (e.g., data-flow analysis or points-to analysis) and symbolic analysis (e.g., symbolic execution). Despite their individual successes in numerous application fields, the two approaches have remained largely separate; an artifact of the simple reality that there is no broadly adopted unifying platform for effortless and efficient integration of symbolic techniques with high-performance data-flow reasoning. To bridge this gap, we introduce Desyan: a platform for writing program analyses with seamless integration of value-flow and symbolic reasoning. Desyan expands a production-ready Datalog fixpoint engine (Souffl\'e) with full-fledged SMT solving invoking industry-leading SMT engines. Desyan provides constructs for automatically (and efficiently!) handling typical patterns that come up in program analysis. At the same time, the integration is agnostic with respect to the solving technology, and supports Datalog-native symbolic reasoning, via a bottom-up algebraic reasoning module. The result is an engine that allows blending different kinds of reasoning, as needed for the underlying analysis. For value-flow analysis, the engine is the best-in-class Datalog evaluator (often by a factor of over 20x in execution time); for applications that require full SMT (e.g., a concolic execution engine or other symbolic evaluator that needs to solve arbitrarily complex conditions), the engine is leveraging the leading SMT solvers; for lightweight symbolic evaluation (e.g., solving simple conditionals in the context of a path-sensitive analysis), the engine can use Datalog-native symbolic reasoning, achieving large speedups (often of over 2x) compared to eagerly appealing to an SMT solver.
Related papers
- FABLE: A Novel Data-Flow Analysis Benchmark on Procedural Text for Large Language Model Evaluation [5.866040886735852]
We introduce FABLE, a benchmark designed to assess large language models' understanding of data flow using structured, procedural text.<n>We evaluate three types of LLMs: a reasoning-focused model (DeepSeek-R1 8B), a general-purpose model (LLaMA 3.1 8B), and a code-specific model (Granite Code 8B)<n>Results show that the reasoning model achieves higher accuracy, but at the cost of over 20 times slower inference compared to the other models.
arXiv Detail & Related papers (2025-05-30T06:32:34Z) - Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs)<n>It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs.<n>It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z) - Manalyzer: End-to-end Automated Meta-analysis with Multi-agent System [48.093356587573666]
Meta-analysis is a systematic research methodology that synthesizes data from multiple existing studies to derive comprehensive conclusions.<n>Traditional meta-analysis involves a complex multi-stage pipeline including literature retrieval, paper screening, and data extraction.<n>We propose a multi-agent system, Manalyzer, which achieves end-to-end automated meta-analysis through tool calls.
arXiv Detail & Related papers (2025-05-22T07:25:31Z) - Universal Scalability in Declarative Program Analysis (with Choice-Based Combination Pruning) [1.3874486202578669]
We show a near-universal construction that allows the choice construct to flexibly limit evaluation of predicates.<n>We apply the technique to probably the largest, pre-existing Datalog analysis frameworks in existence: Doop (for Javacode) and the main client analyses from the Gigahorse framework.
arXiv Detail & Related papers (2025-03-07T21:23:02Z) - Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z) - LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models [19.657278472819588]
We introduce Log-LLM, a novel log integrated with LLM capabilities.
We address the intricate challenge of parsing granularity, proposing a new metric to allow users to calibrate granularity to their specific needs.
Our method's efficacy is empirically demonstrated through evaluations on the Loghub-2k and the large-scale LogPub benchmark.
arXiv Detail & Related papers (2024-08-25T05:34:24Z) - LLMDFA: Analyzing Dataflow in Code with Large Language Models [8.92611389987991]
This paper presents LLMDFA, a compilation-free and customizable dataflow analysis framework.
We decompose the problem into several subtasks and introduce a series of novel strategies.
On average, LLMDFA achieves 87.10% precision and 80.77% recall, surpassing existing techniques with F1 score improvements of up to 0.35.
arXiv Detail & Related papers (2024-02-16T15:21:35Z) - GEqO: ML-Accelerated Semantic Equivalence Detection [3.5521901508676774]
Common computation is crucial for efficient cluster resource utilization and reducing job execution time.
detecting equivalence on large-scale analytics engines requires efficient and scalable solutions that are fully automated.
We propose GEqO, a portable and lightweight machine-learning-based framework for efficiently identifying semantically equivalent computations at scale.
arXiv Detail & Related papers (2024-01-02T16:37:42Z) - Analytical Engines With Context-Rich Processing: Towards Efficient
Next-Generation Analytics [12.317930859033149]
We envision an analytical engine co-optimized with components that enable context-rich analysis.
We aim for a holistic pipeline cost- and rule-based optimization across relational and model-based operators.
arXiv Detail & Related papers (2022-12-14T21:46:33Z) - Linear Temporal Logic Modulo Theories over Finite Traces (Extended
Version) [72.38188258853155]
This paper studies Linear Temporal Logic over Finite Traces (LTLf)
proposition letters are replaced with first-order formulas interpreted over arbitrary theories.
The resulting logic, called Satisfiability Modulo Theories (LTLfMT), is semi-decidable.
arXiv Detail & Related papers (2022-04-28T17:57:33Z) - High-performance symbolic-numerics via multiple dispatch [52.77024349608834]
Symbolics.jl is an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs.
We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system.
We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers.
arXiv Detail & Related papers (2021-05-09T14:22:43Z) - Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.