Can LLMs Solve ASP Problems? Insights from a Benchmarking Study (Extended Version)
- URL: http://arxiv.org/abs/2507.19749v1
- Date: Sat, 26 Jul 2025 02:46:08 GMT
- Title: Can LLMs Solve ASP Problems? Insights from a Benchmarking Study (Extended Version)
- Authors: Lin Ren, Guohui Xiao, Guilin Qi, Yishuai Geng, Haohan Xue
- Abstract summary: Large language models (LLMs) have demonstrated promising capabilities in logical reasoning. However, they struggle with answer set computation, which is the core of ASP solving. This highlights the need for new approaches that integrate symbolic reasoning capabilities more effectively.
- Score: 8.29485811981654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Answer Set Programming (ASP) is a powerful paradigm for non-monotonic reasoning. Recently, large language models (LLMs) have demonstrated promising capabilities in logical reasoning. Despite this potential, current evaluations of LLM capabilities in ASP are often limited. Existing works typically employ overly simplified ASP programs that do not involve negation, disjunction, or multiple answer sets. Furthermore, there is a lack of benchmarks with tasks specifically designed for ASP solving. To bridge this gap, we introduce ASPBench, a comprehensive ASP benchmark comprising three ASP-specific tasks: ASP entailment, answer set verification, and answer set computation. Our extensive evaluations on ASPBench reveal that while 14 state-of-the-art LLMs, including \emph{deepseek-r1}, \emph{o4-mini}, and \emph{gemini-2.5-flash-thinking}, perform relatively well on the first two simpler tasks, they struggle with answer set computation, which is the core of ASP solving. These findings offer insights into the current limitations of LLMs in ASP solving and highlight the need for new approaches that integrate symbolic reasoning capabilities more effectively. The code and dataset are available at https://github.com/HomuraT/ASPBench.
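To make the answer set computation task concrete, here is a minimal sketch (not part of ASPBench; the program and setup are illustrative assumptions) that uses the clingo Python API (pip install clingo) to enumerate the answer sets of a two-rule program with default negation, the kind of construct the abstract notes is missing from earlier benchmarks:

```python
from clingo import Control

# Two rules with default negation; this program has two answer sets: {a} and {b}.
PROGRAM = """
a :- not b.
b :- not a.
"""

ctl = Control(["0"])              # "0" asks clingo to enumerate all answer sets
ctl.add("base", [], PROGRAM)      # register the program under the "base" part
ctl.ground([("base", [])])        # ground the program
with ctl.solve(yield_=True) as handle:
    for model in handle:          # each model corresponds to one answer set
        print(sorted(str(sym) for sym in model.symbols(shown=True)))
```

Running this prints ['a'] and ['b'], the two answer sets; roughly, this is the kind of reasoning the answer set computation task asks an LLM to carry out itself, without a solver.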
Related papers
- ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models [67.75439511654078]
Large Vision-Language Models (LVLMs) have introduced a new paradigm for understanding and reasoning about image input through textual responses. However, they face the persistent challenge of hallucination, which introduces practical weaknesses and raises concerns about their reliable deployment in real-world applications. We propose ONLY, a training-free decoding approach that requires only a single query and a one-layer intervention during decoding, enabling efficient real-time deployment.
arXiv Detail & Related papers (2025-07-01T16:01:08Z) - Hybrid Answer Set Programming: Foundations and Applications [0.0]
We introduce the Logic of Here-and-There with constraints (HT_c) as an extension of the Logic of Here-and-There (HT) and its non-monotonic extension, Equilibrium Logic. The idea is that HT_c (and other extensions) play an analogous role for hybrid ASP. A formal understanding of these hybrid logics is also needed to better understand the inherent structure of the (real-world) problems they are applied to and to improve their representations in ASP.
arXiv Detail & Related papers (2025-02-13T11:53:57Z) - PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL [54.304872649870575]
Large Language Models (LLMs) have emerged as powerful tools for Text-to-SQL tasks.
In this study, we propose that employing query group partitioning allows LLMs to focus on learning the thought processes specific to a single problem type.
arXiv Detail & Related papers (2024-09-21T09:33:14Z) - LLASP: Fine-tuning Large Language Models for Answer Set Programming [6.261151680007598]
Large Language Models (LLMs) have showcased their potential in various natural language processing tasks, including code generation.
We propose LLASP, a fine-tuned lightweight model specifically trained to encode fundamental ASP program patterns.
Our experiments demonstrate that the quality of ASP programs generated by LLASP is remarkable.
arXiv Detail & Related papers (2024-07-26T13:18:42Z) - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z) - Finite Groundings for ASP with Functions: A Journey through Consistency [21.53198582611571]
It is known that enhancing ASP with function symbols makes basic reasoning problems highly undecidable.
We show reductions that give an intuition for the high level of undecidability.
These insights allow for a more fine-grained analysis in which we characterize ASP programs as "frugal" and "non-proliferous".
arXiv Detail & Related papers (2024-05-08T11:50:08Z) - VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? [115.60866817774641]
Multimodal Large Language Models (MLLMs) have shown promise in web-related tasks.
However, evaluating their performance in the web domain remains a challenge due to the lack of comprehensive benchmarks.
VisualWebBench is a multimodal benchmark designed to assess the capabilities of MLLMs across a variety of web tasks.
arXiv Detail & Related papers (2024-04-09T02:29:39Z) - Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [76.76046657162306]
Large language models (LLMs) have emerged as a new paradigm for Text-to-SQL tasks.
arXiv Detail & Related papers (2023-08-29T14:59:54Z) - Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly obtainable through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z) - SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs).
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SatLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z) - A Preliminary Data-driven Analysis of Common Errors Encountered by Novice SPARC Programmers [0.0]
This study focuses on the types and difficulty of programming errors encountered by K-12 students using ASP.
From error messages in this dataset, we identify a collection of error classes, and measure how frequently each class occurs and how difficult it is to resolve.
arXiv Detail & Related papers (2022-08-05T10:48:25Z) - LP2PB: Translating Answer Set Programs into Pseudo-Boolean Theories [0.0]
We present a new tool LP2PB that translates ASP programs into pseudo-Boolean theories.
We evaluate our tool, and the potential of cutting-plane-based solving for ASP on traditional ASP benchmarks.
arXiv Detail & Related papers (2020-09-22T00:50:17Z)
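As a rough illustration of what such a translation can look like (a generic completion-style encoding assumed here for exposition, not necessarily the encoding LP2PB uses), a single defining rule a :- b, not c. in a tight program corresponds to the equivalence $a \leftrightarrow b \land \neg c$, which over 0/1 variables can be written as three pseudo-Boolean (linear) constraints:

\[ a \le b, \qquad a \le 1 - c, \qquad a \ge b - c \]

Cutting-plane reasoning then operates directly on constraints of this form.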
This list is automatically generated from the titles and abstracts of the papers on this site.