BOOST: Bootstrapping Strategy-Driven Reasoning Programs for Program-Guided Fact-Checking
- URL: http://arxiv.org/abs/2504.02467v3
- Date: Fri, 01 Aug 2025 06:31:39 GMT
- Title: BOOST: Bootstrapping Strategy-Driven Reasoning Programs for Program-Guided Fact-Checking
- Authors: Qisheng Hu, Quanyu Long, Wenya Wang
- Abstract summary: BOOST is a bootstrapping approach for automated few-shot reasoning program generation. It iteratively refines explicit, data-driven guidelines as meta-rules for guiding demonstration creation. It enables a seamless transition from zero-shot to few-shot program-guided learning, enhancing interpretability and effectiveness.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language model pipelines have improved automated fact-checking for complex claims, yet many approaches rely on few-shot in-context learning with demonstrations that require substantial human effort and domain expertise. Among these, program-guided reasoning, which decomposes claims into function calls and executes reasoning programs, has shown particular promise, but remains limited by the need for manually crafted demonstrations. Fundamentally, the principles underlying effective reasoning program generation remain underexplored. In this work, we introduce BOOST, a bootstrapping approach for automated few-shot reasoning program generation. BOOST iteratively refines explicit, data-driven guidelines as meta-rules for guiding demonstration creation, using a critique-refine loop that eliminates the need for human intervention. This enables a seamless transition from zero-shot to few-shot program-guided learning, enhancing interpretability and effectiveness. Experimental results show that BOOST outperforms prior few-shot baselines in both zero-shot and few-shot settings for complex claim verification.
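To make the critique-refine loop concrete, here is a minimal sketch of the kind of bootstrapping the abstract describes. This is not the authors' code: the `llm` helper, the prompts, and the stopping rule are all hypothetical stand-ins.

```python
# A minimal sketch of a BOOST-style critique-refine bootstrap loop.
# Prompts, guideline format, and convergence check are illustrative.

def llm(prompt: str) -> str:
    """Placeholder for any chat-completion API call."""
    raise NotImplementedError

def bootstrap(claims, n_rounds: int = 3):
    # Seed guidelines: explicit meta-rules, refined from data over rounds.
    guidelines = "Decompose the claim into verifiable sub-claims as function calls."
    demos = []
    for _ in range(n_rounds):
        demos, critiques = [], []
        for claim in claims:
            # Zero-shot generation of a reasoning program under current guidelines.
            program = llm(f"Guidelines:\n{guidelines}\n\nClaim: {claim}\n"
                          "Write a reasoning program of function calls that verifies it.")
            # Critique the program against the guidelines; 'OK' means acceptable.
            critique = llm(f"Guidelines:\n{guidelines}\n\nProgram:\n{program}\n"
                           "List violations or weaknesses, or reply 'OK'.")
            if critique.strip() == "OK":
                demos.append((claim, program))  # keep as a few-shot demonstration
            else:
                critiques.append(critique)
        if not critiques:
            break  # every program passes; the guidelines have converged
        # Refine the guidelines from accumulated critiques (data-driven meta-rules).
        joined = "\n".join(critiques)
        guidelines = llm(f"Current guidelines:\n{guidelines}\n\n"
                         f"Observed issues:\n{joined}\n"
                         "Rewrite the guidelines to prevent these issues.")
    return guidelines, demos
```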
Related papers
- ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation [53.149817480019834]
Recent advancements in large reasoning models (LRMs) have achieved notable performance gains on complex reasoning tasks by scaling up the generation length of Chain-of-Thought (CoT). We propose a framework dubbed ConciseHint, which continuously encourages the reasoning model to speak concisely by injecting a textual hint during token generation of the reasoning process. Experiments on state-of-the-art LRMs, including the DeepSeek-R1 and Qwen-3 series, demonstrate that our method effectively produces concise reasoning processes while maintaining performance.
arXiv Detail & Related papers (2025-06-23T16:20:44Z)
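A rough sketch of the hint-injection idea, assuming a HuggingFace-style generation loop; the hint text, injection interval, and model choice are illustrative, not the paper's design:

```python
# Sketch of ConciseHint-style hint injection during token generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

HINT = "\n(Be concise: keep only the essential reasoning steps.)\n"

@torch.no_grad()
def generate_with_hints(prompt: str, max_new_tokens: int = 512, inject_every: int = 128):
    ids = tok(prompt, return_tensors="pt").input_ids
    hint_ids = tok(HINT, add_special_tokens=False, return_tensors="pt").input_ids
    produced = 0
    while produced < max_new_tokens:
        # Decode a chunk, then splice the concision hint into the running context.
        chunk = min(inject_every, max_new_tokens - produced)
        out = model.generate(ids, max_new_tokens=chunk, do_sample=False)
        new_tokens = out[:, ids.shape[1]:]
        produced += new_tokens.shape[1]
        if tok.eos_token_id in new_tokens[0].tolist():
            return tok.decode(out[0], skip_special_tokens=True)
        ids = torch.cat([out, hint_ids], dim=-1)
    return tok.decode(ids[0], skip_special_tokens=True)
```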
- Code Execution as Grounded Supervision for LLM Reasoning [36.97199200274124]
Training large language models with chain-of-thought (CoT) supervision has proven effective for enhancing their reasoning abilities. We propose a scalable method for generating a high-quality CoT supervision dataset by leveraging the determinism of program execution. Our approach extracts verifiable, step-by-step reasoning traces from code execution and transforms them into natural language CoT reasoning.
arXiv Detail & Related papers (2025-06-12T04:36:57Z)
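The core recipe, deterministic execution plus verbalization, can be sketched in a few lines. This toy version handles only straight-line programs, and the verbalization template is invented for illustration:

```python
# Turn deterministic code execution into CoT-style supervision.

def execution_trace(statements):
    """Execute statements sequentially, snapshotting variables after each one."""
    env, steps = {}, []
    for stmt in statements:
        exec(stmt, env)
        # Drop the injected __builtins__ entry; keep only user variables.
        state = {k: v for k, v in env.items() if not k.startswith("__")}
        steps.append((stmt, state))
    return steps

def verbalize(steps):
    """Render the trace as a natural-language chain of thought."""
    sentences = []
    for stmt, state in steps:
        vars_ = ", ".join(f"{k} = {v!r}" for k, v in state.items())
        sentences.append(f"Executing `{stmt}` gives {vars_}.")
    return " ".join(sentences)

program = ["x = 3", "y = x * 4", "z = y - 5"]
print(verbalize(execution_trace(program)))
# Executing `x = 3` gives x = 3. Executing `y = x * 4` gives x = 3, y = 12. ...
```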
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
arXiv Detail & Related papers (2025-03-20T17:59:38Z)
- The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency.
UPFT removes the need for labeled data or exhaustive sampling.
Experiments show that UPFT matches the performance of supervised methods.
arXiv Detail & Related papers (2025-03-04T18:56:03Z)
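A minimal sketch of prefix fine-tuning in the spirit of UPFT: self-sample a response, keep only its first few tokens, and apply the standard next-token loss. The model choice, prefix length, and learning rate are placeholders, not the paper's recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder model choice
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def upft_step(question: str, prefix_len: int = 8) -> float:
    prompt_ids = tok(question, return_tensors="pt").input_ids
    # 1. Self-sample a response; no labels or reward model needed.
    with torch.no_grad():
        sampled = model.generate(prompt_ids, max_new_tokens=64, do_sample=True)
    # 2. Keep only the prompt plus the first few generated tokens.
    prefix = sampled[:, : prompt_ids.shape[1] + prefix_len]
    # 3. Standard causal-LM loss, masking out the prompt positions.
    labels = prefix.clone()
    labels[:, : prompt_ids.shape[1]] = -100
    loss = model(prefix, labels=labels).loss
    loss.backward()
    opt.step(); opt.zero_grad()
    return loss.item()
```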
- Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights [49.42133807824413]
We examine the reasoning and planning capabilities of large language models (LLMs) in solving complex tasks.
Recent advances in inference-time techniques demonstrate the potential to enhance LLM reasoning without additional training.
OpenAI's o1 model shows promising performance through its novel use of multi-step reasoning and verification.
arXiv Detail & Related papers (2025-02-18T04:11:29Z)
- BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.
We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model.
We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z)
- Reasoning-Oriented and Analogy-Based Methods for Locating and Editing in Zero-Shot Event-Relational Reasoning [1.0373115083302502]
We propose Reasoning-Oriented Locating and Editing (ROLE) and Analogy-Based Locating and Editing (ABLE).
ROLE locates and edits the key modules of the language model responsible for reasoning about event relations, enhancing interpretability while optimizing reasoning ability resource-efficiently.
ABLE exploits the similarities and differences between tasks to optimize the zero-shot reasoning capability.
arXiv Detail & Related papers (2025-01-01T11:02:08Z)
- Think Beyond Size: Adaptive Prompting for More Effective Reasoning [0.0]
We introduce Adaptive Prompting, a dynamic and iterative framework designed to enhance reasoning by incorporating real-time adjustments to prompt structures and validation mechanisms. Results demonstrate that Adaptive Prompting significantly improves performance on diverse reasoning benchmarks, including arithmetic reasoning (GSM8K, MultiArith), logical reasoning, and commonsense tasks. Our approach enables smaller models to achieve performance competitive with larger counterparts, such as GPT-4, while maintaining computational efficiency.
arXiv Detail & Related papers (2024-10-10T17:14:36Z)
- SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding [16.380389806465733]
Large Language Models (LLMs) demonstrate remarkable emergent abilities across various tasks, yet fall short on complex reasoning and planning tasks. This paper introduces SeeD, a novel and efficient inference framework that optimizes runtime speed and GPU memory management concurrently.
arXiv Detail & Related papers (2024-06-26T09:33:41Z)
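For intuition, here is the basic draft-and-verify pattern behind speculative decoding, in a simplified greedy form; SeeD's actual scheduling and memory management are more involved, and the model pairing here is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # shared tokenizer
draft = AutoModelForCausalLM.from_pretrained("gpt2")  # cheap drafter
target = AutoModelForCausalLM.from_pretrained("gpt2-medium")

@torch.no_grad()
def speculative_greedy(prompt: str, n_steps: int = 10, k: int = 4) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(n_steps):
        # 1. Draft model proposes k tokens cheaply.
        proposal = draft.generate(ids, max_new_tokens=k, do_sample=False)
        drafted = proposal[0, ids.shape[1]:]
        # 2. Target model scores the whole proposal in one forward pass.
        logits = target(proposal).logits[0]
        # 3. Accept drafted tokens while they match the target's greedy choice.
        accepted = 0
        for i, tok_id in enumerate(drafted):
            pos = ids.shape[1] + i - 1  # logits at pos predict the token at pos+1
            if logits[pos].argmax().item() != tok_id.item():
                break
            accepted += 1
        keep = ids.shape[1] + accepted
        next_tok = logits[keep - 1].argmax().view(1, 1)  # target's own next token
        ids = torch.cat([proposal[:, :keep], next_tok], dim=-1)
    return tok.decode(ids[0])
```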
- Learning From Correctness Without Prompting Makes LLM Efficient Reasoner [30.203952806009717]
Large language models (LLMs) have demonstrated outstanding performance across various tasks, yet they still exhibit limitations such as hallucination, unfaithful reasoning, and toxic content.
We introduce an intrinsic self-correcting reasoning framework for LLMs that eliminates the need for human feedback, external tools, and handcrafted prompts.
arXiv Detail & Related papers (2024-03-28T02:12:49Z)
- Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing [61.98556945939045]
We propose a framework to learn planning-based reasoning through Direct Preference Optimization (DPO) on collected trajectories.
Our results on challenging logical reasoning benchmarks demonstrate the effectiveness of our learning framework.
arXiv Detail & Related papers (2024-02-01T15:18:33Z)
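The preference-learning step can be illustrated with the standard DPO objective; trajectory log-probabilities are assumed precomputed, and the numbers in the usage example are toy values, not results:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    *_w: summed log-probs of the preferred (winning) trajectory,
    *_l: summed log-probs of the dispreferred (losing) trajectory.
    """
    margin = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()

# Toy usage with two preference pairs of trajectory log-probs:
loss = dpo_loss(torch.tensor([-5.0, -4.0]), torch.tensor([-9.0, -7.5]),
                torch.tensor([-6.0, -5.0]), torch.tensor([-8.0, -7.0]))
print(loss)  # scalar to backpropagate through the policy's log-probs
```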
- Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning [74.67655210734338]
In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption.
We develop a representation-based perspective that leads to a coherent framework and tractable algorithmic approach for practical reinforcement learning from partial observations.
We empirically demonstrate the proposed algorithm can surpass state-of-the-art performance with partial observations across various benchmarks.
arXiv Detail & Related papers (2023-11-20T23:56:58Z)
- Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs [58.94569213396991]
We propose a hierarchical programmatic reinforcement learning framework to produce program policies.
By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors.
The experimental results in the Karel domain show that our proposed framework outperforms baselines.
arXiv Detail & Related papers (2023-01-30T14:50:46Z)
- Rationale-Augmented Ensembles in Language Models [53.45015291520658]
We reconsider rationale-augmented prompting for few-shot in-context learning.
We identify rationale sampling in the output space as the key component to robustly improve performance.
We demonstrate that rationale-augmented ensembles achieve more accurate and interpretable results than existing prompting approaches.
arXiv Detail & Related papers (2022-07-02T06:20:57Z)
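Rationale sampling with answer-level voting can be sketched as follows; the prompt format and answer extraction are illustrative, and `llm_sample` is a hypothetical sampling-capable LLM call:

```python
from collections import Counter

def llm_sample(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder for any LLM call that samples at nonzero temperature."""
    raise NotImplementedError

def extract_answer(rationale: str) -> str:
    # Assumes the rationale ends with a line like "Answer: <result>".
    return rationale.rsplit("Answer:", 1)[-1].strip()

def rationale_ensemble(question: str, n_samples: int = 8) -> str:
    prompt = f"{question}\nThink step by step, then end with 'Answer: <result>'."
    # Rationale sampling in the output space; the vote aggregates over rationales.
    answers = [extract_answer(llm_sample(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```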
- Learning to Synthesize Programs as Interpretable and Generalizable Policies [25.258598215642067]
We present a framework that learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner.
Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines.
arXiv Detail & Related papers (2021-08-31T07:03:06Z)
- Learning from Executions for Semantic Parsing [86.94309120789396]
We focus on the task of semi-supervised learning where a limited amount of annotated data is available.
We propose to encourage the model to produce executable programs for unlabeled utterances.
arXiv Detail & Related papers (2021-04-12T21:07:53Z)
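One simple way to favor executable programs, sketched under the assumption of a candidate-sampling parser (`sample_parses` is hypothetical) and a toy Python-expression executor standing in for a real program language:

```python
def sample_parses(utterance: str, n: int = 5) -> list[str]:
    """Placeholder: a semantic parser proposing candidate programs."""
    raise NotImplementedError

def is_executable(program: str, env: dict) -> bool:
    """Try the program in a restricted environment; crashes mean rejection."""
    try:
        eval(program, {"__builtins__": {}}, env)
        return True
    except Exception:
        return False

def pseudo_label(utterance: str, env: dict) -> list[str]:
    candidates = sample_parses(utterance)
    # Keep only candidates that execute; use them as pseudo-labels for training.
    return [p for p in candidates if is_executable(p, env)]
```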
- Program Enhanced Fact Verification with Verbalization and Graph Attention Network [25.33739187395408]
We present a Program-enhanced Verbalization and Graph Attention Network (ProgVGAT) to integrate programs and execution into textual inference models.
We construct graph attention verification networks designed to fuse different sources of evidence from verbalized program execution, program structures, and the original statements and tables.
Experimental results show that the proposed framework achieves new state-of-the-art performance, 74.4% accuracy, on the TABFACT benchmark.
arXiv Detail & Related papers (2020-10-06T23:29:08Z)
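The verbalization step, turning program execution over a table into sentences a textual inference model can read, might look like this toy sketch; the operations and templates are invented, not ProgVGAT's actual grammar:

```python
def execute_and_verbalize(table, program):
    """`program` is a list of (operation, column) steps over a list-of-dicts table."""
    sentences, value = [], None
    for op, col in program:
        column = [row[col] for row in table]
        if op == "max":
            value = max(column)
            sentences.append(f"The maximum of column '{col}' is {value}.")
        elif op == "count":
            value = len(column)
            sentences.append(f"Column '{col}' has {value} rows.")
    return sentences, value

table = [{"team": "A", "wins": 10}, {"team": "B", "wins": 7}]
sents, _ = execute_and_verbalize(table, [("max", "wins"), ("count", "team")])
print(" ".join(sents))
# These verbalized facts, together with the claim, would feed a graph-attention verifier.
```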