Verifying User Interfaces using SPARK Ada: A Case Study of the T34 Syringe Driver
- URL: http://arxiv.org/abs/2509.16681v1
- Date: Sat, 20 Sep 2025 13:14:59 GMT
- Title: Verifying User Interfaces using SPARK Ada: A Case Study of the T34 Syringe Driver
- Authors: Peterson Jean,
- Abstract summary: Many human factor risks will not be caught until proper testing in real life. This research uses SPARK Ada's formal verification tool against a behavioural model of the T34 syringe driver.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increase in safety-critical systems has improved healthcare. Because of their risk of harm, such systems are subject to stringent guidelines and compliance requirements. These safety measures ensure a seamless experience and mitigate risk to end-users. Institutions such as the Food and Drug Administration and the NHS have established international standards and competency frameworks, respectively, to ensure industry compliance with these safety concerns. Medical device manufacturing is governed largely by these standards, which now advocate better consideration of human factors in user interaction with medical devices. This forces manufacturers to rely on heavy testing and review to cover many of these factors during development. Unfortunately, many human factors risks are not caught until the device is properly tested in real-life use, which can be catastrophic for an ambulatory device such as the T34 syringe pump. Formal methods research may therefore offer new ways to anticipate these errors in the early stages of development, or even to reduce their occurrence through the use of a standard generic model. Such generic models would provide a common framework for safety integration in industry and could potentially be proven correct with formal, mathematical verification. This research applies SPARK Ada's formal verification tooling to a behavioural model of the T34 syringe driver. A refinement of the Generic Infusion Pump model is explored and implemented in SPARK, a formally analysable subset of the Ada language, and the verification level of the resulting prototype is evaluated with SPARK's proof tools. Exploring the potential limitations clarifies the liabilities of implementing the proposed model when considering abstraction and user interface components in SPARK Ada.
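The abstract contains no code, so the following is only a minimal sketch, not taken from the paper, of how SPARK contracts can encode user-interaction safety properties for an infusion pump model. The package name Pump_Model, the bound Max_Rate_ml_h, and the operations Set_Rate and Start are hypothetical; they illustrate the kind of behavioural constraints a Generic Infusion Pump refinement might state.

```ada
--  Hypothetical sketch: a tiny behavioural model of rate setting for a
--  syringe driver, expressed with SPARK contracts. Not the paper's model.
package Pump_Model with SPARK_Mode is

   Max_Rate_ml_h : constant := 100;          --  assumed upper bound, ml/h

   type Rate_ml_h is range 0 .. Max_Rate_ml_h;

   type Pump_State is record
      Infusing : Boolean   := False;
      Rate     : Rate_ml_h := 0;
   end record;

   --  The rate may only be edited while the pump is stopped.
   procedure Set_Rate (S : in out Pump_State; New_Rate : Rate_ml_h)
     with Pre  => not S.Infusing,
          Post => S.Rate = New_Rate and not S.Infusing;

   --  Infusion may never start with a zero rate.
   procedure Start (S : in out Pump_State)
     with Pre  => not S.Infusing and then S.Rate > 0,
          Post => S.Infusing;

end Pump_Model;

package body Pump_Model with SPARK_Mode is

   procedure Set_Rate (S : in out Pump_State; New_Rate : Rate_ml_h) is
   begin
      S.Rate := New_Rate;
      --  Infusing is untouched, so "not S.Infusing" in the postcondition
      --  follows directly from the precondition.
   end Set_Rate;

   procedure Start (S : in out Pump_State) is
   begin
      S.Infusing := True;
   end Start;

end Pump_Model;
```

Under these assumptions, running GNATprove on such a unit would attempt to discharge the pre- and postconditions and to prove absence of runtime errors, which is the kind of verification level the paper evaluates for its prototype.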
Related papers
- Steering Externalities: Benign Activation Steering Unintentionally Increases Jailbreak Risk for Large Language Models [62.16655896700062]
Activation steering is a technique to enhance the utility of Large Language Models (LLMs). We show that it unintentionally introduces critical and under-explored safety risks. Experiments reveal that these interventions act as a force multiplier, creating new vulnerabilities to jailbreaks and increasing attack success rates to over 80% on standard benchmarks.
arXiv Detail & Related papers (2026-02-03T12:32:35Z) - Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context [82.32380418146656]
Health-ORSC-Bench is the first large-scale benchmark designed to measure Over-Refusal and Safe Completion quality in healthcare. Our framework uses an automated pipeline with human validation to test models at varying levels of intent ambiguity. Health-ORSC-Bench provides a rigorous standard for calibrating the next generation of medical AI assistants.
arXiv Detail & Related papers (2026-01-25T01:28:52Z) - The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs [23.79442915729949]
Medical Multimodal Large Language Models (Medical MLLMs) have achieved remarkable progress in specialized medical tasks. However, research into their safety has lagged, posing potential risks for real-world deployment. We first establish a multidimensional evaluation framework to systematically benchmark the safety of current SOTA Medical MLLMs.
arXiv Detail & Related papers (2025-12-05T06:52:06Z) - Building a Foundational Guardrail for General Agentic Systems via Synthetic Data [76.18834864749606]
Because LLM agents can plan multi-step tasks, intervening at the planning stage, before any action is executed, is often the safest way to prevent harm. Existing guardrails mostly operate post-execution, which is difficult to scale and leaves little room for controllable supervision at the plan level. We introduce AuraGen, a controllable engine that synthesizes benign trajectories, injects category-labeled risks with difficulty, and filters outputs via an automated reward model.
arXiv Detail & Related papers (2025-10-10T18:42:32Z) - D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models [62.83226685925107]
Deceptive Reasoning Exposure Suite (D-REX) is a novel dataset designed to evaluate the discrepancy between a model's internal reasoning process and its final output. Each sample in D-REX contains the adversarial system prompt, an end-user's test query, the model's seemingly innocuous response, and, crucially, the model's internal chain-of-thought. We demonstrate that D-REX presents a significant challenge for existing models and safety mechanisms.
arXiv Detail & Related papers (2025-09-22T15:59:40Z) - Conformal Object Detection by Sequential Risk Control [3.3274747298291216]
We develop a post-hoc predictive uncertainty quantification procedure with statistical guarantees that are valid for any dataset size. First, we formally define the problem of Conformal Object Detection (COD). We introduce a novel method, Sequential Conformal Risk Control (SeqCRC), that extends the statistical guarantees of Conformal Risk Control to two sequential tasks. We present old and new loss functions and prediction sets suited to applying SeqCRC to different cases and certification requirements.
arXiv Detail & Related papers (2025-05-29T22:19:01Z) - What Makes and Breaks Safety Fine-tuning? A Mechanistic Study [64.9691741899956]
Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment.
We design a synthetic data generation framework that captures salient aspects of an unsafe input.
Using this, we investigate three well-known safety fine-tuning methods.
arXiv Detail & Related papers (2024-07-14T16:12:57Z) - The Need for Guardrails with Large Language Models in Medical Safety-Critical Settings: An Artificial Intelligence Application in the Pharmacovigilance Ecosystem [0.6965384453064829]
Large language models (LLMs) are useful tools with the capacity for performing specific types of knowledge work at an effective scale.
However, deployments in high-risk and safety-critical domains pose unique challenges, notably the issue of hallucinations.
This is particularly concerning in settings such as drug safety, where inaccuracies could lead to patient harm.
We have developed and demonstrated a proof of concept suite of guardrails specifically designed to mitigate certain types of hallucinations and errors for drug safety.
arXiv Detail & Related papers (2024-07-01T19:52:41Z) - Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations [76.19419888353586]
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations.
We present our efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms.
arXiv Detail & Related papers (2024-03-09T21:07:16Z)