INPROVF: Leveraging Large Language Models to Repair High-level Robot Controllers from Assumption Violations
- URL: http://arxiv.org/abs/2503.13660v1
- Date: Mon, 17 Mar 2025 19:08:36 GMT
- Title: INPROVF: Leveraging Large Language Models to Repair High-level Robot Controllers from Assumption Violations
- Authors: Qian Meng, Jin Peng Zhou, Kilian Q. Weinberger, Hadas Kress-Gazit
- Abstract summary: INPROVF is an automatic framework that combines large language models (LLMs) and formal methods to speed up the repair process of high-level robot controllers.
- Score: 33.6861334936808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents INPROVF, an automatic framework that combines large language models (LLMs) and formal methods to speed up the repair process of high-level robot controllers. Previous approaches based solely on formal methods are computationally expensive and cannot scale to large state spaces. In contrast, INPROVF uses LLMs to generate repair candidates, and formal methods to verify their correctness. To improve the quality of these candidates, our framework first translates the symbolic representations of the environment and controllers into natural language descriptions. If a candidate fails the verification, INPROVF provides feedback on potential unsafe behaviors or unsatisfied tasks, and iteratively prompts LLMs to generate improved solutions. We demonstrate the effectiveness of INPROVF through 12 violations with various workspaces, tasks, and state space sizes.
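The repair loop described in the abstract (an LLM proposes a candidate repair, a formal verifier checks it, and verification feedback drives the next prompt) can be sketched as below. This is a minimal illustration, not INPROVF's actual code: the helper names and the diagnostic text are hypothetical placeholders.

```python
# Minimal sketch of the generate-verify-feedback loop from the abstract.
# All helpers are hypothetical stand-ins, not INPROVF's API.

def translate_to_nl(symbolic_spec: str) -> str:
    """Placeholder: render the symbolic environment/controller state as natural language."""
    return f"Natural-language description of: {symbolic_spec}"

def llm_propose_repair(description: str, feedback: str | None) -> str:
    """Placeholder: ask an LLM for a candidate controller repair."""
    prompt = description if feedback is None else f"{description}\n\nPrevious attempt failed:\n{feedback}"
    return f"candidate repair (prompt length {len(prompt)})"  # stub

def formally_verify(candidate: str) -> tuple[bool, str]:
    """Placeholder: model-check the candidate; return (passes, diagnostic)."""
    return False, "unsafe behavior: controller may violate a safety guarantee"

def repair_controller(symbolic_spec: str, max_rounds: int = 5) -> str | None:
    description = translate_to_nl(symbolic_spec)
    feedback = None
    for _ in range(max_rounds):
        candidate = llm_propose_repair(description, feedback)
        passes, diagnostic = formally_verify(candidate)
        if passes:
            return candidate      # verified repair found
        feedback = diagnostic     # unsafe behaviors / unsatisfied tasks fed back to the LLM
    return None                   # no verified repair within the iteration budget
```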
Related papers
- Automatic High-quality Verilog Assertion Generation through Subtask-Focused Fine-Tuned LLMs and Iterative Prompting [0.0]
We present a large language model (LLM)-based flow to automatically generate high-quality SystemVerilog Assertions (SVA).
We introduce a novel sub-task-focused fine-tuning approach, leading to a remarkable 7.3-fold increase in the number of functionally correct assertions.
Experiments demonstrate a 26% increase in the number of assertions free from syntax errors using this approach.
arXiv Detail & Related papers (2024-11-23T03:52:32Z) - Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [49.362750475706235]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
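For intuition on the soft actor-critic framing mentioned above, a token-level soft Bellman target could look like the sketch below. This is an illustrative construction under assumed tensor shapes and a temperature `alpha`; it is not DQO's exact loss.

```python
import torch

def soft_q_targets(q_next: torch.Tensor, rewards: torch.Tensor, done: torch.Tensor,
                   gamma: float = 1.0, alpha: float = 1.0) -> torch.Tensor:
    """Illustrative soft Bellman target for a token-level MDP (assumed, not DQO's exact loss).

    q_next:  (batch, vocab) Q-values over next tokens, e.g. read off the LM head
    rewards: (batch,) reward for the current transition
    done:    (batch,) 1.0 where the response terminates
    """
    # Soft state value: V(s') = alpha * logsumexp(Q(s', .) / alpha)
    v_next = alpha * torch.logsumexp(q_next / alpha, dim=-1)
    # Bellman target: r + gamma * V(s') on non-terminal steps
    return rewards + gamma * (1.0 - done) * v_next

# A Q-function "directly parameterized by the language model" could reuse the LM logits:
# q_values = lm(input_ids).logits[:, -1, :]   # (batch, vocab)
```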
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs for downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction [22.27432554538809]
Generative error correction (GEC) has emerged as a promising paradigm that can elevate the performance of modern automatic speech recognition (ASR) systems.
It is yet unknown whether the existing prompts are the most effective ones for the task of post-ASR error correction.
This paper first explores alternative prompts to identify an initial set of effective prompts, and then proposes to employ an evolutionary prompt optimization algorithm to refine the initial prompts.
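A generic evolutionary refinement loop over prompts, as the abstract describes, might look like the sketch below; the `score` and `mutate` callables (e.g. dev-set WER after correction, LLM paraphrasing) are assumptions, not the paper's operators.

```python
import random

def evolve_prompts(seed_prompts: list[str], score, mutate,
                   generations: int = 10, population: int = 8, keep: int = 4) -> str:
    """Generic evolutionary search over prompts (illustrative, not the paper's exact algorithm).

    score(prompt)  -> float  fitness, e.g. negative WER of the corrected dev set (assumed)
    mutate(prompt) -> str    variation, e.g. an LLM-paraphrased rewrite (assumed)
    """
    pool = list(seed_prompts)
    for _ in range(generations):
        survivors = sorted(pool, key=score, reverse=True)[:keep]                         # selection
        children = [mutate(random.choice(survivors)) for _ in range(population - keep)]  # variation
        pool = survivors + children
    return max(pool, key=score)
```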
arXiv Detail & Related papers (2024-07-23T10:38:49Z) - Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification (UQ) is a key element of machine learning applications.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines.
We conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.
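As one example of the kind of UQ baseline such a benchmark collects, length-normalized sequence likelihood is sketched below; this is a common baseline, and no claim is made that LM-Polygraph implements it exactly this way.

```python
import math

def length_normalized_confidence(token_logprobs: list[float]) -> float:
    """Average token log-probability mapped back to (0, 1]; higher means more confident.

    A standard UQ baseline (illustrative; not necessarily LM-Polygraph's implementation).
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```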
arXiv Detail & Related papers (2024-06-21T20:06:31Z) - Aligning Large Language Models with Representation Editing: A Control Perspective [38.71496554018039]
Fine-tuning large language models (LLMs) to align with human objectives is crucial for real-world applications.
Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model.
We propose aligning LLMs through representation editing.
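Representation editing generally perturbs hidden activations at inference time rather than updating weights. A minimal PyTorch sketch using a forward hook is below; the layer choice, steering vector, and scale are assumptions, and the paper's control-theoretic formulation is more involved than this.

```python
import torch

def add_steering_hook(layer: torch.nn.Module, steering_vector: torch.Tensor, scale: float = 1.0):
    """Shift a layer's output along a steering direction at inference time.

    Illustrative only: the layer, vector, and scale here are assumptions,
    not the paper's control-based method.
    """
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        edited = hidden + scale * steering_vector.to(hidden.device, hidden.dtype)
        return (edited, *output[1:]) if isinstance(output, tuple) else edited
    return layer.register_forward_hook(hook)

# Hypothetical usage on a transformer block:
# handle = add_steering_hook(model.transformer.h[15], steering_vector, scale=4.0)
# ...generate as usual...
# handle.remove()
```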
arXiv Detail & Related papers (2024-06-10T01:21:31Z) - Aligning Language Models with Demonstrated Feedback [58.834937450242975]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors.
We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
arXiv Detail & Related papers (2024-06-02T23:13:56Z) - An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications [5.395614997568524]
This paper presents a systematic investigation into the capacity of Large Language Models (LLMs) for repairing declarative specifications in Alloy.
We propose a novel repair pipeline that integrates a dual-agent LLM framework, comprising a Repair Agent and a Prompt Agent.
Our study reveals that LLMs, particularly GPT-4 variants, outperform existing techniques in terms of repair efficacy, albeit with a marginal increase in runtime and token usage.
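The dual-agent pipeline can be sketched as below: a Repair Agent proposes a fixed specification and a Prompt Agent rewrites the prompt from the verifier's feedback. The `llm` and `run_alloy_analyzer` interfaces and the prompt wording are hypothetical, not the paper's pipeline.

```python
def dual_agent_repair(broken_spec: str, llm, run_alloy_analyzer, max_rounds: int = 5):
    """Hedged sketch of a Repair-Agent / Prompt-Agent loop (prompts and interfaces assumed).

    llm(prompt)              -> str                LLM completion (assumed interface)
    run_alloy_analyzer(spec) -> tuple[bool, str]   (passes, counterexample or error) (assumed)
    """
    prompt = f"Fix this Alloy specification so that all assertions hold:\n{broken_spec}"
    for _ in range(max_rounds):
        candidate = llm(prompt)                          # Repair Agent proposes a fix
        passes, diagnostic = run_alloy_analyzer(candidate)
        if passes:
            return candidate
        # Prompt Agent refines the instruction using the analyzer's feedback
        prompt = llm(
            "Rewrite the repair prompt below so the next attempt avoids this failure.\n"
            f"Failure: {diagnostic}\nPrompt:\n{prompt}"
        )
    return None
```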
arXiv Detail & Related papers (2024-04-17T03:46:38Z) - InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance [56.184255657175335]
We develop InferAligner, a novel inference-time alignment method that utilizes cross-model guidance for harmlessness alignment.
Experimental results show that our method can be very effectively applied to domain-specific models in finance, medicine, and mathematics.
It significantly diminishes the Attack Success Rate (ASR) of both harmful instructions and jailbreak attacks, while maintaining almost unchanged performance in downstream tasks.
arXiv Detail & Related papers (2024-01-20T10:41:03Z) - Leveraging Large Language Models for Exploiting ASR Uncertainty [16.740712975166407]
Large language models must either rely on off-the-shelf automatic speech recognition systems for transcription, or be equipped with an in-built speech modality.
We tackle the speech-intent classification task, where a high word error rate can limit the LLM's ability to understand the spoken intent.
We propose prompting the LLM with an n-best list of ASR hypotheses instead of only the error-prone 1-best hypothesis.
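Concretely, such a prompt can enumerate the n-best list so the LLM can reason over competing transcriptions; the template below is an assumed illustration, not the paper's exact prompt.

```python
def build_nbest_prompt(hypotheses: list[str], task: str = "intent classification") -> str:
    """Format an ASR n-best list for an LLM (illustrative template, not the paper's exact prompt)."""
    listed = "\n".join(f"{i + 1}. {hyp}" for i, hyp in enumerate(hypotheses))
    return (
        "The following are candidate transcriptions of the same utterance, best first:\n"
        f"{listed}\n"
        "They may contain recognition errors. "
        f"Perform {task} for the utterance the speaker most likely intended."
    )

# Example:
# print(build_nbest_prompt(["play the massage", "play the message", "played the message"]))
```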
arXiv Detail & Related papers (2023-09-09T17:02:33Z) - Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models [58.990214815032495]
Large-scale pre-trained language models have achieved great success on natural language generation tasks.
Bayesian controllable language models (BCLMs) have been shown to be efficient in controllable language generation.
We propose a "Gemini Discriminator" for controllable language generation which alleviates the mismatch problem with a small computational cost.
arXiv Detail & Related papers (2022-06-11T12:52:32Z) - Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.
It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned.
We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
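For intuition, a discriminative PLM such as ELECTRA judges whether each token is original or replaced, so a prompt-based reformulation can score candidate label words placed in a template and pick the one judged most "original". The sketch below illustrates that idea with off-the-shelf ELECTRA; it is not DPT's exact formulation, and the template is an assumption.

```python
import torch
from transformers import AutoTokenizer, ElectraForPreTraining

# Off-the-shelf discriminator used only to illustrate the reformulation idea.
tokenizer = AutoTokenizer.from_pretrained("google/electra-base-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-base-discriminator")

def score_label(text: str, label_word: str) -> float:
    """Score a candidate label word by how 'original' the discriminator finds it in a template."""
    prompt = f"{text} It was {label_word}."          # assumed template, not DPT's
    enc = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits[0]              # per-token replaced-token-detection logits
    label_ids = tokenizer(label_word, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    positions = [i for i in range(len(ids) - len(label_ids) + 1)
                 if ids[i:i + len(label_ids)] == label_ids]
    pos = positions[-1]                              # the occurrence inside the template
    return float(-logits[pos:pos + len(label_ids)].mean())  # higher = judged more original

# best = max(["great", "terrible"], key=lambda w: score_label("The movie was thrilling.", w))
```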
arXiv Detail & Related papers (2022-05-23T10:11:50Z)