Related papers: Parallel and Multi-Objective Falsification with Scenic and VerifAI

Parallel and Multi-Objective Falsification with Scenic and VerifAI

URL: http://arxiv.org/abs/2107.04164v1
Date: Fri, 9 Jul 2021 01:08:49 GMT
Title: Parallel and Multi-Objective Falsification with Scenic and VerifAI
Authors: Kesav Viswanadha, Edward Kim, Francis Indaheng, Daniel J. Fremont, Sanjit A. Seshia
Abstract summary: We present extensions to the Scenic scenario specification language and VerifAI toolkit. We first present a parallelized framework that is interfaced with both the simulation and sampling capabilities of Scenic. We then present an extension of VerifAI's falsification algorithms to support multi-objective optimization during sampling.
Score: 11.152087017964584
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Falsification has emerged as an important tool for simulation-based verification of autonomous systems. In this paper, we present extensions to the Scenic scenario specification language and VerifAI toolkit that improve the scalability of sampling-based falsification methods by using parallelism and extend falsification to multi-objective specifications. We first present a parallelized framework that is interfaced with both the simulation and sampling capabilities of Scenic and the falsification capabilities of VerifAI, reducing the execution time bottleneck inherently present in simulation-based testing. We then present an extension of VerifAI's falsification algorithms to support multi-objective optimization during sampling, using the concept of rulebooks to specify a preference ordering over multiple metrics that can be used to guide the counterexample search process. Lastly, we evaluate the benefits of these extensions with a comprehensive set of benchmarks written in the Scenic language.

Related papers

Speculative Decoding for Multi-Sample Inference [21.64693536216534]
We propose a novel speculative decoding method tailored for multi-sample reasoning scenarios. Our method exploits the intrinsic consensus of parallel generation paths to synthesize high-quality draft tokens.
arXiv Detail & Related papers (2025-03-07T11:15:36Z)
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing [35.686125031177234]
Multi-Document Summarization (MDS) is a challenging task that focuses on extracting and synthesizing useful information from multiple lengthy documents. We propose a novel framework that leverages inference-time scaling for this task. We also introduce two new evaluation metrics: Consistency-Aware Preference (CAP) score and LLM Atom-Content-Unit (ACU) score.
arXiv Detail & Related papers (2025-02-27T23:34:47Z)
Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning [59.25951947621526]
We propose an approach which can transform existing coding benchmarks into scoring and ranking datasets to evaluate the effectiveness of synthetic verifiers. We release four new benchmarks (HE-R, HE-R+, MBPP-R, and MBPP-R+), and analyzed synthetic verification methods with standard, reasoning-based, and reward-based LLMs. Our experiments show that reasoning can significantly improve test case generation and that scaling the number of test cases enhances the verification accuracy.
arXiv Detail & Related papers (2025-02-19T15:32:11Z)
Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models [25.88443566366613]
Large language models (LLMs) with in-context learning have demonstrated remarkable potential in handling neural machine translation. Existing evidence shows that LLMs are prompt-sensitive and it is sub-optimal to apply the fixed prompt to any input for downstream machine translation tasks. We propose an adaptive few-shot prompting framework to automatically select suitable translation demonstrations for various source input sentences.
arXiv Detail & Related papers (2025-01-03T07:47:59Z)
Quasi-random Multi-Sample Inference for Large Language Models [1.647759094903376]
Large language models (LLMs) are often equipped with multi-sample decoding strategies. Traditional text generation methods, such as beam search and sampling-based techniques, have notable limitations. This study explores the potential of arithmetic sampling, contrasting it with ancestral sampling.
arXiv Detail & Related papers (2024-11-09T18:55:04Z)
Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications. The quality of these exemplars in the prompt greatly impacts performance. Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
Bisimulation Learning [55.859538562698496]
We compute finite bisimulations of state transition systems with large, possibly infinite state space. Our technique yields faster verification results than alternative state-of-the-art tools in practice.
arXiv Detail & Related papers (2024-05-24T17:11:27Z)
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely textithidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass. In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing. Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples. We propose a novel method named parallel in-context learning (ParaICL)
arXiv Detail & Related papers (2024-03-31T05:56:15Z)
Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions. We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
Automatic Program Instrumentation for Automatic Verification (Extended Technical Report) [0.0]
In deductive verification and software model checking, dealing with certain specification language constructs can be problematic. We propose instrumentation as a unifying verification paradigm that subsumes various existing ad-hoc approaches. We implement our approach in the MonoCera tool, which is tailored to the verification of programs with aggregation.
arXiv Detail & Related papers (2023-05-26T14:55:35Z)
Near-optimal Policy Identification in Active Reinforcement Learning [84.27592560211909]
AE-LSVI is a novel variant of the kernelized least-squares value RL (LSVI) algorithm that combines optimism with pessimism for active exploration. We show that AE-LSVI outperforms other algorithms in a variety of environments when robustness to the initial state is required.
arXiv Detail & Related papers (2022-12-19T14:46:57Z)
Efficiently Controlling Multiple Risks with Pareto Testing [34.83506056862348]
We propose a two-stage process which combines multi-objective optimization with multiple hypothesis testing. We demonstrate the effectiveness of our approach to reliably accelerate the execution of large-scale Transformer models in natural language processing (NLP) applications.
arXiv Detail & Related papers (2022-10-14T15:54:39Z)
Falsification of Cyber-Physical Systems using Bayesian Optimization [0.5407319151576264]
Simulation-based falsification of CPSs is a practical testing method that can be used to raise confidence in the correctness of the system. As each simulation is typically computationally intensive, an important step is to reduce the number of simulations needed to falsify a specification. We study Bayesian optimization (BO), a sample-efficient method that learns a surrogate model that describes the relationship between the parametrization of possible input signals and the evaluation of the specification.
arXiv Detail & Related papers (2022-09-14T15:52:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.