Related papers: Scaling up Memory-Efficient Formal Verification Tools for Tree Ensembles

URL: http://arxiv.org/abs/2105.02595v1
Date: Thu, 6 May 2021 11:50:22 GMT
Title: Scaling up Memory-Efficient Formal Verification Tools for Tree Ensembles
Authors: John Törnblom and Simin Nadjm-Tehrani
Abstract summary: We formalise and extend the VoTE algorithm presented earlier as a tool description. We show how the separation of property checking from the core verification engine enables verification of versatile requirements. We demonstrate the application of the tool in two case studies, namely digit recognition and aircraft collision avoidance.
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: To guarantee that machine learning models yield outputs that are not only accurate, but also robust, recent works propose formally verifying robustness properties of machine learning models. To be applicable to realistic safety-critical systems, the verification algorithms used need to manage the combinatorial explosion resulting from vast variations in the input domain, and be able to verify correctness properties derived from versatile and domain-specific requirements. In this paper, we formalise the VoTE algorithm presented earlier as a tool description, and extend the tool set with mechanisms for systematic scalability studies. In particular, we show a) how the separation of property checking from the core verification engine enables verification of versatile requirements, b) the scalability of the tool, both in terms of time taken for verification and use of memory, and c) that the algorithm has attractive properties that lend themselves well to massive parallelisation. We demonstrate the application of the tool in two case studies, namely digit recognition and aircraft collision avoidance, where the first case study serves to assess the resource utilisation of the tool, and the second to assess the ability to verify versatile correctness properties.
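The abstract's central design point is that property checking is separated from the core verification engine. A minimal sketch of that separation, assuming axis-aligned regression trees whose leaf values are summed and an input region given as a box of half-open intervals: the core lazily enumerates (region, ensemble output) pairs that partition the input box, and a caller-supplied `check` callback decides whether the property holds on each pair. The names `Node`, `leaves`, `enumerate_ensemble` and `verify` are hypothetical illustrations, not the VoTE API, and this is not claimed to be the paper's exact algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical representation: one half-open interval [lo, hi) per input feature.
Box = List[Tuple[float, float]]


@dataclass
class Node:
    # Internal node: inputs with x[feature] < threshold go left, the rest go right.
    # Leaf node: left/right are None and `value` holds the leaf output.
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0


def leaves(node: Node, box: Box) -> Iterator[Tuple[Box, float]]:
    """Yield (sub-box, leaf value) for every leaf of one tree reachable from `box`."""
    if node.left is None:
        yield box, node.value
        return
    lo, hi = box[node.feature]
    if lo < node.threshold:  # part of the box satisfies x < threshold
        left_box = list(box)
        left_box[node.feature] = (lo, min(hi, node.threshold))
        yield from leaves(node.left, left_box)
    if hi > node.threshold:  # part of the box satisfies x >= threshold
        right_box = list(box)
        right_box[node.feature] = (max(lo, node.threshold), hi)
        yield from leaves(node.right, right_box)


def enumerate_ensemble(trees: List[Node], box: Box,
                       acc: float = 0.0) -> Iterator[Tuple[Box, float]]:
    """Core engine: yield (region, summed output) pairs that partition `box`."""
    if not trees:
        yield box, acc
        return
    for sub_box, value in leaves(trees[0], box):
        yield from enumerate_ensemble(trees[1:], sub_box, acc + value)


def verify(trees: List[Node], box: Box,
           check: Callable[[Box, float], bool]) -> bool:
    """Property checking as a separate callback; all() stops at the first violation."""
    return all(check(region, out) for region, out in enumerate_ensemble(trees, box))


# Usage: two decision stumps on a single feature.
t1 = Node(feature=0, threshold=0.5, left=Node(value=-1.0), right=Node(value=1.0))
t2 = Node(feature=0, threshold=0.3, left=Node(value=0.0), right=Node(value=0.5))
regions = list(enumerate_ensemble([t1, t2], [(0.0, 1.0)]))
print(regions)  # [([(0.0, 0.3)], -1.0), ([(0.3, 0.5)], -0.5), ([(0.5, 1.0)], 1.5)]
```

Because the pairs are produced by generators, this sketch only keeps the current recursion path in memory rather than the full set of regions, and the sub-boxes produced by the first tree could be handed to independent workers. These are illustrations of why such a design can be memory-efficient and parallelisable, not a description of how VoTE itself schedules work.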

Related papers

Beyond Accuracy: A Cognitive Load Framework for Mapping the Capability Boundaries of Tool-use Agents [11.65679508751598]
We introduce a framework grounded in Cognitive Load Theory to move from simple performance scoring to a diagnostic tool. Our framework deconstructs task complexity into two quantifiable components: Intrinsic Load and Extraneous Load. Our evaluation reveals distinct performance cliffs as cognitive load increases, allowing us to precisely map each model's capability boundary.
arXiv Detail & Related papers (2026-01-28T09:17:51Z)
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents [24.482362292984817]
Large language models (LLMs) are rapidly evolving to handle multi-turn tasks. Ensuring their trustworthiness remains a critical challenge. Calibration refers to an agent's ability to express confidence that reliably reflects its actual performance.
arXiv Detail & Related papers (2026-01-12T07:10:35Z)
CoSineVerifier: Tool-Augmented Answer Verification for Computation-Oriented Scientific Questions [32.14674040685995]
We introduce CoSineVerifier, a tool-augmented verifier that leverages external rubrics to perform precise computations and symbolic simplifications. Experiments conducted on STEM subjects, general QA, and long-form reasoning tasks demonstrate strong generalization of CoSineVerifier.
arXiv Detail & Related papers (2025-12-01T03:08:43Z)
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments [70.42705564227548]
We propose an automated environment construction pipeline for large language models (LLMs). This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools. We also introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution.
arXiv Detail & Related papers (2025-08-12T09:45:19Z)
T^2Agent: A Tool-augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search [51.91311158085973]
Multimodal misinformation often arises from mixed forgery sources, requiring dynamic reasoning and adaptive verification. We propose T2Agent, a novel misinformation detection agent that incorporates a toolkit with Monte Carlo Tree Search. Extensive experiments show that T2Agent consistently outperforms existing baselines on challenging mixed-source multimodal misinformation benchmarks.
arXiv Detail & Related papers (2025-05-26T09:50:55Z)
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models [9.674458633565111]
We investigate whether small language models (sLMs) can reliably self-verify their outputs under test-time scaling. We propose Tool-integrated self-verification (T1), which delegates memorization-heavy verification steps to external tools, such as a code interpreter. Our theoretical analysis shows that tool integration reduces memorization demands and improves test-time scaling performance.
arXiv Detail & Related papers (2025-04-07T04:01:17Z)
Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use. MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools. Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
arXiv Detail & Related papers (2025-02-18T15:45:01Z)
A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy. We present around 55 distinctive features that are extracted from industrial images, which are then analyzed using statistical methods. By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
Quantitative Assurance and Synthesis of Controllers from Activity Diagrams [4.419843514606336]
Probabilistic model checking is a widely used formal verification technique to automatically verify qualitative and quantitative properties. This makes it inaccessible to researchers and engineers who lack the required knowledge. We propose a comprehensive verification framework for ADs, including a new profile for probability, time, and quality annotations, a semantics interpretation of ADs in three Markov models, and a set of transformation rules from activity diagrams to the PRISM language. Most importantly, we developed algorithms for transformation and implemented them in a tool, called QASCAD, using model-based techniques, for fully automated verification.
arXiv Detail & Related papers (2024-02-29T22:40:39Z)
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking [53.66999416757543]
We study how fine-tuning affects the internal mechanisms implemented in language models. Fine-tuning enhances, rather than alters, the mechanistic operation of the model.
arXiv Detail & Related papers (2024-02-22T18:59:24Z)
Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores. We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
A Correct-and-Certify Approach to Self-Supervise Object Pose Estimators via Ensemble Self-Training [26.47895284071508]
Real-world robotics applications demand object pose estimation methods that work reliably across a variety of scenarios. Our first contribution is to develop a robust corrector module that corrects pose estimates using depth information. Our second contribution is an ensemble self-training approach that simultaneously trains multiple pose estimators in a self-supervised manner.
arXiv Detail & Related papers (2023-02-12T23:02:03Z)
Specifying and Testing $k$-Safety Properties for Machine-Learning Models [20.24045879238586]
We take inspiration from specifications used in formal methods, expressing $k$ different executions, so-called $k$-safety properties. Here, we show the wide applicability of $k$-safety properties for machine-learning models and present the first specification language for expressing them. We show that our framework is effective in identifying property violations and that detected bugs could be used to train better models.
arXiv Detail & Related papers (2022-06-13T11:35:55Z)
Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation. The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z)
Fantastic Features and Where to Find Them: Detecting Cognitive Impairment with a Subsequence Classification Guided Approach [6.063165888023164]
We describe a new approach to feature engineering that leverages sequential machine learning models and domain knowledge to predict which features help enhance performance. We demonstrate that CI classification accuracy improves by 2.3% over a strong baseline when using features produced by this method.
arXiv Detail & Related papers (2020-10-13T17:57:18Z)
Verification of ML Systems via Reparameterization [6.482926592121413]
We show how a probabilistic program can be automatically represented in a theorem prover. We also prove that the null model used in a Bayesian hypothesis test satisfies a fairness criterion called demographic parity.
arXiv Detail & Related papers (2020-07-14T02:19:25Z)
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
Generating Fact Checking Explanations [52.879658637466605]
A crucial piece of the puzzle that is still missing is to understand how to automate the most elaborate part of the process. This paper provides the first study of how these explanations can be generated automatically based on available claim context. Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system.
arXiv Detail & Related papers (2020-04-13T05:23:25Z)
Adaptive Object Detection with Dual Multi-Label Prediction [78.69064917947624]
We propose a novel end-to-end unsupervised deep domain adaptation model for adaptive object detection. The model exploits multi-label prediction to reveal the object category information in each image. We introduce a prediction consistency regularization mechanism to assist object detection.
arXiv Detail & Related papers (2020-03-29T04:23:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.