Related papers: Threshy: Supporting Safe Usage of Intelligent Web Services

URL: http://arxiv.org/abs/2008.08252v1
Date: Wed, 19 Aug 2020 04:02:45 GMT
Title: Threshy: Supporting Safe Usage of Intelligent Web Services
Authors: Alex Cummaudo, Scott Barnett, Rajesh Vasa and John Grundy
Abstract summary: Threshy is a tool to help developers select a decision threshold suited to their problem domain. Unlike existing tools, Threshy is designed to operate in multiple workflows, including pre-development, pre-release, and support.
Score: 4.346610687701405
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Increased popularity of 'intelligent' web services provides end-users with machine-learnt functionality at little effort to developers. However, these services require a decision threshold to be set which is dependent on problem-specific data. Developers lack a systematic approach for evaluating intelligent services and existing evaluation tools are predominantly targeted at data scientists for pre-development evaluation. This paper presents a workflow and supporting tool, Threshy, to help software developers select a decision threshold suited to their problem domain. Unlike existing tools, Threshy is designed to operate in multiple workflows including pre-development, pre-release, and support. Threshy is designed for tuning the confidence scores returned by intelligent web services and does not deal with hyper-parameter optimisation used in ML models. Additionally, it considers the financial impacts of false positives. Threshold configuration files exported by Threshy can be integrated into client applications and monitoring infrastructure. Demo: https://bit.ly/2YKeYhE.
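
Illustration: the abstract does not spell out how Threshy chooses a threshold, so the following is only a minimal sketch of the general idea of cost-aware threshold selection, not Threshy's method. It assumes a small labelled benchmark of (confidence score, true label) pairs and a fixed cost per false positive. The per-false-negative cost, the sample data, and the names `total_cost`, `pick_threshold`, and `SAMPLES` are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical benchmark: (confidence returned by the service, true label).
SAMPLES: List[Tuple[float, bool]] = [
    (0.95, True), (0.80, True), (0.70, False),
    (0.60, True), (0.40, False), (0.20, False),
]


def total_cost(samples: List[Tuple[float, bool]], threshold: float,
               fp_cost: float, fn_cost: float) -> float:
    """Cost of accepting every result whose score is >= threshold."""
    cost = 0.0
    for score, is_positive in samples:
        accepted = score >= threshold
        if accepted and not is_positive:
            cost += fp_cost   # accepted a wrong result (false positive)
        elif not accepted and is_positive:
            cost += fn_cost   # rejected a correct result (false negative)
    return cost


def pick_threshold(samples: List[Tuple[float, bool]],
                   fp_cost: float, fn_cost: float) -> Tuple[float, float]:
    """Return (threshold, cost) with the lowest total cost.

    Candidates are the distinct observed scores plus +inf (reject everything).
    Ties are resolved in favour of the lowest threshold.
    """
    candidates = sorted({score for score, _ in samples}) + [float("inf")]
    best = min(candidates,
               key=lambda t: total_cost(samples, t, fp_cost, fn_cost))
    return best, total_cost(samples, best, fp_cost, fn_cost)
```

With these assumed costs, expensive false positives push the chosen threshold up and expensive false negatives pull it down, which is why the threshold depends on problem-specific data rather than on a service-wide default.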

Related papers

Offline Model-Based Optimization: Comprehensive Review [61.91350077539443]
Offline optimization is a fundamental challenge in science and engineering, where the goal is to optimize black-box functions using only offline datasets. Recent advances in model-based optimization have harnessed the generalization capabilities of deep neural networks to develop offline-specific surrogate and generative models. Despite its growing impact in accelerating scientific discovery, the field lacks a comprehensive review.
arXiv Detail & Related papers (2025-03-21T16:35:02Z)
Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use. MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools. Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
arXiv Detail & Related papers (2025-02-18T15:45:01Z)
SMART: Self-Aware Agent for Tool Overuse Mitigation [58.748554080273585]
Current Large Language Model (LLM) agents demonstrate strong reasoning and tool use capabilities, but often lack self-awareness. This imbalance leads to Tool Overuse, where models unnecessarily rely on external tools for tasks with parametric knowledge. We introduce SMART (Strategic Model-Aware Reasoning with Tools), a paradigm that enhances an agent's self-awareness to optimize task handling and reduce tool overuse.
arXiv Detail & Related papers (2025-02-17T04:50:37Z)
LLM-Generated Microservice Implementations from RESTful API Definitions [3.740584607001637]
This paper presents a system that uses Large Language Models (LLMs) to automate the API-first development of software. The system generates an OpenAPI specification, generates server code from it, and refines the code through a feedback loop that analyzes execution logs and error messages. The system has the potential to help software developers, architects, and organizations speed up software development cycles.
arXiv Detail & Related papers (2025-02-13T20:50:33Z)
Microservices-Based Framework for Predictive Analytics and Real-time Performance Enhancement in Travel Reservation Systems [1.03590082373586]
The paper presents an architectural framework dedicated to enhancing the performance of real-time travel reservation systems. The framework includes real-time predictive analytics, based on machine learning models, that optimize customer demand forecasting, dynamic pricing, and system performance. Future work will investigate advanced AI models and edge processing to further improve the performance and robustness of these systems.
arXiv Detail & Related papers (2024-12-20T07:19:42Z)
WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? [83.19032025950986]
We study the use of large language model-based agents for interacting with software via web browsers. WorkArena is a benchmark of 33 tasks based on the widely-used ServiceNow platform. BrowserGym is an environment for the design and evaluation of such agents.
arXiv Detail & Related papers (2024-03-12T14:58:45Z)
Interpretable Self-Aware Neural Networks for Robust Trajectory Prediction [50.79827516897913]
We introduce an interpretable paradigm for trajectory prediction that distributes the uncertainty among semantic concepts. We validate our approach on real-world autonomous driving data, demonstrating superior performance over state-of-the-art baselines.
arXiv Detail & Related papers (2022-11-16T06:28:20Z)
A Multiple Criteria Decision Analysis based Approach to Remove Uncertainty in SMP Models [1.6244541005112747]
It is essential to estimate the maintainability of heterogeneous software. A structured methodology was designed, the datasets were preprocessed, and the maintainability index (MI) range was determined. To remove the uncertainty among the aforementioned techniques, a popular multiple criteria decision-making model, namely the technique for order preference by similarity to ideal solution (TOPSIS), is used.
arXiv Detail & Related papers (2022-09-30T06:38:10Z)
Exploring Attention-Aware Network Resource Allocation for Customized Metaverse Services [69.37584804990806]
We design an attention-aware network resource allocation scheme to achieve customized Metaverse services. The aim is to allocate more network resources to virtual objects in which users are more interested.
arXiv Detail & Related papers (2022-07-31T06:04:15Z)
Performance Modeling of Metric-Based Serverless Computing Platforms [5.089110111757978]
The proposed performance model can help developers and providers predict the performance and cost of deployments with different configurations. We validate the applicability and accuracy of the proposed performance model by extensive real-world experimentation on Knative.
arXiv Detail & Related papers (2022-02-23T00:39:01Z)
Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients. However, low-quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training. We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)
Beware the evolving 'intelligent' web service! An integration architecture tactic to guard AI-first components [5.975695375814527]
Our proposal is an architectural tactic designed to improve intelligent service-dependent software robustness. The tactic involves creating an application-specific benchmark dataset baselined against an intelligent service. A technical evaluation of our implementation of this architecture demonstrates how the tactic can identify 1,054 cases of substantial confidence evolution and 2,461 cases of substantial changes to response label sets.
arXiv Detail & Related papers (2020-05-27T06:15:18Z)
A Privacy-Preserving Distributed Architecture for Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service. It is able to preserve users' sensitive data while providing Cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z)
Unsupervised Model Personalization while Preserving Privacy and Scalability: An Open Problem [55.21502268698577]
This work investigates the task of unsupervised model personalization, adapted to continually evolving, unlabeled local user images. We provide a novel Dual User-Adaptation framework (DUA) to explore the problem. This framework flexibly disentangles user-adaptation into model personalization on the server and local data regularization on the user device.
arXiv Detail & Related papers (2020-03-30T09:35:12Z)
Improving IoT Analytics through Selective Edge Execution [0.0]
We propose to improve the performance of analytics by leveraging edge infrastructure. We devise an algorithm that enables IoT devices to execute their routines locally and to outsource them to cloudlet servers only if they predict a significant performance improvement.
arXiv Detail & Related papers (2020-03-07T15:02:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.