Related papers: VIRO: Robust and Efficient Neuro-Symbolic Reasoning with Verification for Referring Expression Comprehension

VIRO: Robust and Efficient Neuro-Symbolic Reasoning with Verification for Referring Expression Comprehension

URL: http://arxiv.org/abs/2601.12781v1
Date: Mon, 19 Jan 2026 07:21:19 GMT
Title: VIRO: Robust and Efficient Neuro-Symbolic Reasoning with Verification for Referring Expression Comprehension
Authors: Hyejin Park, Junhyuk Kwon, Suha Kwak, Jungseul Ok,
Abstract summary: Referring Expression (REC) aims to localize the image region corresponding to a natural-language query.<n>Recent neuro-symbolic REC approaches leverage large language models (LLMs) and vision-language models (VLMs) to perform compositional reasoning.<n>We introduce VIRO, a neuro-symbolic framework that embeds lightweight operator-level verifiers within reasoning steps.
Score: 51.76841625486355
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Referring Expression Comprehension (REC) aims to localize the image region corresponding to a natural-language query. Recent neuro-symbolic REC approaches leverage large language models (LLMs) and vision-language models (VLMs) to perform compositional reasoning, decomposing queries 4 structured programs and executing them step-by-step. While such approaches achieve interpretable reasoning and strong zero-shot generalization, they assume that intermediate reasoning steps are accurate. However, this assumption causes cascading errors: false detections and invalid relations propagate through the reasoning chain, yielding high-confidence false positives even when no target is present in the image. To address this limitation, we introduce Verification-Integrated Reasoning Operators (VIRO), a neuro-symbolic framework that embeds lightweight operator-level verifiers within reasoning steps. Each operator executes and validates its output, such as object existence or spatial relationship, thereby allowing the system to robustly handle no-target cases when verification conditions are not met. Our framework achieves state-of-the-art performance, reaching 61.1% balanced accuracy across target-present and no-target settings, and demonstrates generalization to real-world egocentric data. Furthermore, VIRO shows superior computational efficiency in terms of throughput, high reliability with a program failure rate of less than 0.3%, and scalability through decoupled program generation from execution.

Related papers

Act, Think or Abstain: Complexity-Aware Adaptive Inference for Vision-Language-Action Models [7.802379200026965]
We propose an adaptive framework that dynamically routes VLA execution based on the complexity of the perceived state.<n>Our approach transforms the VLA's vision-language backbone into an active detection tool by projecting latent embeddings into an ensemble of parametric and non-parametric estimators.
arXiv Detail & Related papers (2026-03-05T13:14:41Z)
Toward Faithful and Complete Answer Construction from a Single Document [1.0742675209112622]
We present EVE, a structured framework for document-grounded reasoning.<n>Unlike free-form prompting, EVE constrains generation to a structured, verifiable pipeline that decomposes high-rigor reasoning into extraction, validation, and enumeration.
arXiv Detail & Related papers (2026-02-05T18:22:08Z)
Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification [49.506412445511934]
Large Language Models (LLMs) show remarkable capabilities, yet their next-token prediction creates logical inconsistencies and reward hacking.<n>We introduce a formal logic verification-guided framework that dynamically interleaves formal symbolic verification with the natural language generation process.<n>We operationalize this framework via a novel two-stage training pipeline that synergizes formal logic verification-guided supervised fine-tuning and policy optimization.
arXiv Detail & Related papers (2026-01-30T07:01:25Z)
Formal that "Floats" High: Formal Verification of Floating Point Arithmetic [0.0]
This paper presents a scalable methodology for verifying floating-point arithmetic using direct RTL-to-RTL model checking against a golden reference model.<n>The methodology is extended with agentic AI-based formal property generation, integrating large language model (LLM)-driven automation with Human-in-the-Loop (HITL) refinement.<n>Results show that direct RTL-to-RTL model checking achieves higher coverage efficiency and requires fewer assertions than standalone verification.
arXiv Detail & Related papers (2025-12-07T14:03:44Z)
BRIDGE: Building Representations In Domain Guided Program Verification [67.36686119518441]
BRIDGE decomposes verification into three interconnected domains: Code, Specifications, and Proofs.<n>We show that this approach substantially improves both accuracy and efficiency beyond standard error feedback methods.
arXiv Detail & Related papers (2025-11-26T06:39:19Z)
RHINO: Guided Reasoning for Mapping Network Logs to Adversarial Tactics and Techniques with Large Language Models [9.065322387043546]
We introduce RHINO, a framework that decomposes Large Language Models into three interpretable phases mirroring human reasoning.<n> RHINO bridges the semantic gap between low-level observations and adversarial intent while improving output reliability through structured reasoning.<n>Our results demonstrate that RHINO significantly enhances the interpretability and scalability of threat analysis, offering a blueprint for deploying LLMs in operational security settings.
arXiv Detail & Related papers (2025-10-16T02:25:46Z)
Do What? Teaching Vision-Language-Action Models to Reject the Impossible [53.40183895299108]
Vision-Language-Action (VLA) models have demonstrated strong performance on a range of robotic tasks.<n>We propose Instruct-Verify-and-Act (IVA), a framework that detects when an instruction cannot be executed due to a false premise.<n>Our experiments show that IVA improves false premise detection accuracy by 97.56% over baselines.
arXiv Detail & Related papers (2025-08-22T10:54:33Z)
The Knowledge-Reasoning Dissociation: Fundamental Limitations of LLMs in Clinical Natural Language Inference [13.59675117792588]
Large language models are often assumed to acquire increasingly structured, generalizable internal representations simply by scaling data and parameters.<n>We interrogate this assumption by introducing a Clinical Trial Natural Language In Attribution benchmark comprising four reasoning families.<n>Each item is paired with a targeted Ground Knowledge and Meta-Level Reasoning Verification probe, allowing us to dissociate failures of factual access from failures of inference.
arXiv Detail & Related papers (2025-08-14T16:01:10Z)
Foundation Models for Logistics: Toward Certifiable, Conversational Planning Interfaces [59.80143393787701]
Large language models (LLMs) can handle uncertainty and promise to accelerate replanning while lowering the barrier to entry.<n>We introduce a neurosymbolic framework that pairs the accessibility of natural-language dialogue with verifiable guarantees on goal interpretation.<n>A lightweight model, fine-tuned on just 100 uncertainty-filtered examples, surpasses the zero-shot performance of GPT-4.1 while cutting inference latency by nearly 50%.
arXiv Detail & Related papers (2025-07-15T14:24:01Z)
Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning [53.25336975467293]
We present the first theoretical error decomposition analysis of methods such as perplexity and self-consistency.<n>Our analysis reveals a fundamental trade-off: perplexity methods suffer from substantial model error due to the absence of a proper consistency function.<n>We propose Reasoning-Pruning Perplexity Consistency (RPC), which integrates perplexity with self-consistency, and Reasoning Pruning, which eliminates low-probability reasoning paths.
arXiv Detail & Related papers (2025-02-01T18:09:49Z)
Improving Retrieval Augmented Language Model with Self-Reasoning [20.715106330314605]
We propose a novel self-reasoning framework aimed at improving the reliability and traceability of RALMs.<n>The framework involves constructing self-reason trajectories with three processes: a relevance-aware process, an evidence-aware selective process, and a trajectory analysis process.<n>We have evaluated our framework across four public datasets to demonstrate the superiority of our method.
arXiv Detail & Related papers (2024-07-29T09:05:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.