FastFI: Enhancing API Call-Site Robustness in Microservice-Based Systems with Fault Injection
- URL: http://arxiv.org/abs/2601.14800v1
- Date: Wed, 21 Jan 2026 09:24:54 GMT
- Title: FastFI: Enhancing API Call-Site Robustness in Microservice-Based Systems with Fault Injection
- Authors: Yuzhen Tan, Jian Wang, Shuaiyu Xie, Bing Li, Yunqing Yong, Neng Zhang, Shaolin Tan,
- Abstract summary: Fault injection is a key technique for assessing software reliability, enabling proactive detection of system defects.<n>FastFI is a fault-injection-guided framework to enhance the robustness of API call sites in microservice-based systems.<n>FastFI reduces end-to-end fault-injection time by an average of 76.12% compared to state-of-the-art baselines.
- Score: 8.84679612879872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fault injection is a key technique for assessing software reliability, enabling proactive detection of system defects before they manifest in production. However, the increasing complexity of microservice architectures leads to exponential growth in the fault-injection space, rendering traditional random injection inefficient. Recent lineage-driven approaches mitigate this problem through heuristic pruning, but they face two limitations. First, combinatorial-fault discovery remains bottlenecked by general-purpose SAT solvers, which fail to exploit the monotone and low-overlap structure of derived CNF formulas and typically rely on a static upper bound on fault size. Second, existing techniques provide limited post-injection guidance beyond reporting detected faults. To address these challenges, we propose FastFI, a fault-injection-guided framework to enhance the robustness of API call sites in microservice-based systems. FastFI features a DFS-based solver with dynamic fault injection to discover all valid combinatorial faults, and it leverages fault-injection results to identify critical APIs whose call sites should be hardened for robustness. Experiments on four representative microservice benchmarks show that FastFI reduces end-to-end fault-injection time by an average of 76.12\% compared to state-of-the-art baselines while maintaining acceptable resource overhead. Moreover, FastFI accurately identifies high-impact APIs and provides actionable guidance for call-site hardening.
Related papers
- ProbeLLM: Automating Principled Diagnosis of LLM Failures [89.44131968886184]
We propose ProbeLLM, a benchmark-agnostic automated probing framework that elevates weakness discovery from individual failures to structured failure modes.<n>By restricting probing to verifiable test cases and leveraging tool-augmented generation and verification, ProbeLLM grounds failure discovery in reliable evidence.
arXiv Detail & Related papers (2026-02-13T14:33:13Z) - Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs [50.075587392477935]
We conduct the first large-scale empirical study of 705 real-world failures from the open-source DeepSeek, Llama, and Qwen ecosystems.<n>Our analysis reveals a paradigm shift: white-box orchestration relocates the reliability bottleneck from model algorithmic defects to the systemic fragility of the deployment stack.
arXiv Detail & Related papers (2026-01-20T06:42:56Z) - Hypothesize-Then-Verify: Speculative Root Cause Analysis for Microservices with Pathwise Parallelism [19.31110304702373]
SpecRCA is a speculative root cause analysis framework that adopts a textithypothesize-then-verify paradigm.<n>Preliminary experiments on the AIOps 2022 demonstrate that SpecRCA achieves superior accuracy and efficiency compared to existing approaches.
arXiv Detail & Related papers (2026-01-06T05:58:25Z) - HiFiNet: Hierarchical Fault Identification in Wireless Sensor Networks via Edge-Based Classification and Graph Aggregation [11.108171977551619]
HiFiNet is a hierarchical fault identification framework for wireless networks.<n>It produces more accurate predictions by capturing both local temporal patterns and network-wide spatial dependencies.<n>It significantly outperforms existing methods in accuracy, F1-score, and precision.
arXiv Detail & Related papers (2025-11-06T16:15:19Z) - Source-Free Object Detection with Detection Transformer [59.33653163035064]
Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data.<n>Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR)<n>In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs.
arXiv Detail & Related papers (2025-10-13T07:35:04Z) - Revisiting Vulnerability Patch Localization: An Empirical Study and LLM-Based Solution [44.388332647211776]
Open-source software vulnerability patch detection is a critical component for maintaining software security and ensuring software supply chain integrity.<n>Traditional detection methods face significant scalability challenges when processing large volumes of commit histories.<n>We propose a novel two-stage framework that combines version-driven candidate filtering with large language model-based multi-round dialogue voting.
arXiv Detail & Related papers (2025-09-19T09:09:55Z) - AI Agentic Vulnerability Injection And Transformation with Optimized Reasoning [2.918225266151982]
We present AVIATOR, the first AI-agentic vulnerability injection workflow.<n>It automatically injects realistic, category-specific vulnerabilities for high-fidelity, diverse, large-scale vulnerability dataset generation.<n>It combines semantic analysis, injection synthesis enhanced with LoRA-based fine-tuning and Retrieval-Augmented Generation, as well as post-injection validation via static analysis and LLM-based discriminators.
arXiv Detail & Related papers (2025-08-28T14:59:39Z) - Particle swarm optimization for online sparse streaming feature selection under uncertainty [2.03725086642376]
In real-world applications involving high-dimensional streaming data, online streaming feature selection (OSFS) is widely adopted.<n>This work proposes POS2FS-an uncertainty-aware online sparse streaming feature selection framework enhanced by particle swarm optimization (PSO)<n>The approach introduces: 1) PSO-driven supervision to reduce uncertainty in feature-label relationships; 2) Three-way decision theory to manage feature fuzziness in supervised learning.
arXiv Detail & Related papers (2025-08-24T07:56:41Z) - Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomaly and missing data constitute a thorny problem in industrial applications.
Deep learning enabled anomaly detection has emerged as a critical direction.
The data collected in edge devices contain user privacy.
arXiv Detail & Related papers (2024-11-06T15:38:31Z) - Demystifying and Extracting Fault-indicating Information from Logs for Failure Diagnosis [29.800380941293277]
Engineers prioritize two categories of log information for diagnosis: fault-indicating descriptions and fault-indicating parameters.
We propose an approach to automatically extract faultindicating information from logs for fault diagnosis, named LoFI.
LoFI outperforms all baseline methods by a significant margin, achieving an absolute improvement of 25.837.9 in F1 over the best baseline method, ChatGPT.
arXiv Detail & Related papers (2024-09-20T15:00:47Z) - FedRFQ: Prototype-Based Federated Learning with Reduced Redundancy,
Minimal Failure, and Enhanced Quality [41.88338945821504]
FedRFQ is a prototype-based federated learning approach that aims to reduce redundancy, minimize failures, and improve underlinequality.
We introduce the BFT-detect, a BFT (Byzantine Fault Tolerance) detectable aggregation algorithm, to ensure the security of FedRFQ against poisoning attacks and server malfunctions.
arXiv Detail & Related papers (2024-01-15T09:50:27Z) - Fast and Accurate Error Simulation for CNNs against Soft Errors [64.54260986994163]
We present a framework for the reliability analysis of Conal Neural Networks (CNNs) via an error simulation engine.
These error models are defined based on the corruption patterns of the output of the CNN operators induced by faults.
We show that our methodology achieves about 99% accuracy of the fault effects w.r.t. SASSIFI, and a speedup ranging from 44x up to 63x w.r.t.FI, that only implements a limited set of error models.
arXiv Detail & Related papers (2022-06-04T19:45:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.