Threshy: Supporting Safe Usage of Intelligent Web Services
- URL: http://arxiv.org/abs/2008.08252v1
- Date: Wed, 19 Aug 2020 04:02:45 GMT
- Title: Threshy: Supporting Safe Usage of Intelligent Web Services
- Authors: Alex Cummaudo, Scott Barnett, Rajesh Vasa and John Grundy
- Abstract summary: Threshy is a tool to help developers select a decision threshold suited to their problem domain.
Unlike existing tools, Threshy is designed to operate in multiple workflows, including pre-development, pre-release, and support.
- Score: 4.346610687701405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Increased popularity of 'intelligent' web services provides end-users with
machine-learnt functionality at little effort to developers. However, these
services require a decision threshold to be set which is dependent on
problem-specific data. Developers lack a systematic approach for evaluating
intelligent services and existing evaluation tools are predominantly targeted
at data scientists for pre-development evaluation. This paper presents a
workflow and supporting tool, Threshy, to help software developers select a
decision threshold suited to their problem domain. Unlike existing tools,
Threshy is designed to operate in multiple workflows including pre-development,
pre-release, and support. Threshy is designed for tuning the confidence scores
returned by intelligent web services and does not deal with hyper-parameter
optimisation used in ML models. Additionally, it considers the financial
impacts of false positives. Threshold configuration files exported by Threshy
can be integrated into client applications and monitoring infrastructure. Demo:
https://bit.ly/2YKeYhE.
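As an illustration of the kind of cost-aware threshold selection the abstract describes, here is a minimal Python sketch. It is not Threshy's implementation: the function names, the per-error costs, the sample labels and scores, and the JSON field names are all assumptions made for this example. It scans the confidence scores observed on a small labelled benchmark, picks the threshold with the lowest total error cost, and writes the result to a JSON string that a client application could load.

```python
import json


def total_cost(labels, scores, threshold, fp_cost, fn_cost):
    """Cost of errors when a prediction is accepted if score >= threshold."""
    cost = 0.0
    for is_positive, score in zip(labels, scores):
        accepted = score >= threshold
        if accepted and not is_positive:
            cost += fp_cost  # false positive: accepted a wrong prediction
        elif not accepted and is_positive:
            cost += fn_cost  # false negative: rejected a correct prediction
    return cost


def select_threshold(labels, scores, fp_cost, fn_cost):
    """Return (threshold, cost) with the lowest cost among observed scores.

    Only the distinct observed scores are tried as candidates; on a tie,
    the lowest threshold wins.
    """
    candidates = sorted(set(scores))
    costs = [(total_cost(labels, scores, t, fp_cost, fn_cost), t) for t in candidates]
    best_cost, best_threshold = min(costs)
    return best_threshold, best_cost


# Hypothetical benchmark: ground-truth labels and service confidence scores.
labels = [True, True, False, True, False, False]
scores = [0.95, 0.80, 0.75, 0.60, 0.40, 0.20]

# Assumed costs: a false positive is five times as expensive as a false negative.
best_threshold, best_cost = select_threshold(labels, scores, fp_cost=5.0, fn_cost=1.0)

# Hypothetical configuration format; the field names are illustrative only.
config_json = json.dumps({"threshold": best_threshold, "expected_cost": best_cost})
print(config_json)
```

A client could then parse this configuration and accept a service response only when its confidence score is at least the stored threshold.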
Related papers
- Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.
MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools.
Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
arXiv Detail & Related papers (2025-02-18T15:45:01Z)
- SMART: Self-Aware Agent for Tool Overuse Mitigation [58.748554080273585]
Current Large Language Model (LLM) agents demonstrate strong reasoning and tool use capabilities, but often lack self-awareness.
This imbalance leads to Tool Overuse, where models unnecessarily rely on external tools for tasks with parametric knowledge.
We introduce SMART (Strategic Model-Aware Reasoning with Tools), a paradigm that enhances an agent's self-awareness to optimize task handling and reduce tool overuse.
arXiv Detail & Related papers (2025-02-17T04:50:37Z)
- LLM-Generated Microservice Implementations from RESTful API Definitions [3.740584607001637]
This paper presents a system that uses Large Language Models (LLMs) to automate the API-first development of software.
The system generates an OpenAPI specification, generates server code from it, and refines the code through a feedback loop that analyzes execution logs and error messages.
The system has the potential to benefit software developers, architects, and organizations to speed up software development cycles.
arXiv Detail & Related papers (2025-02-13T20:50:33Z)
- Microservices-Based Framework for Predictive Analytics and Real-time Performance Enhancement in Travel Reservation Systems [1.03590082373586]
The paper presents an architectural framework for enhancing the performance of real-time travel reservation systems.
The framework includes real-time predictive analytics, based on machine learning models, that optimize customer demand forecasting, dynamic pricing, and system performance.
Future work will be an investigation of advanced AI models and edge processing to further improve the performance and robustness of the systems employed.
arXiv Detail & Related papers (2024-12-20T07:19:42Z)
- WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? [83.19032025950986]
We study the use of large language model-based agents for interacting with software via web browsers.
WorkArena is a benchmark of 33 tasks based on the widely-used ServiceNow platform.
BrowserGym is an environment for the design and evaluation of such agents.
arXiv Detail & Related papers (2024-03-12T14:58:45Z)
- Interpretable Self-Aware Neural Networks for Robust Trajectory Prediction [50.79827516897913]
We introduce an interpretable paradigm for trajectory prediction that distributes the uncertainty among semantic concepts.
We validate our approach on real-world autonomous driving data, demonstrating superior performance over state-of-the-art baselines.
arXiv Detail & Related papers (2022-11-16T06:28:20Z)
- Performance Modeling of Metric-Based Serverless Computing Platforms [5.089110111757978]
The proposed performance model can help developers and providers predict the performance and cost of deployments with different configurations.
We validate the applicability and accuracy of the proposed performance model by extensive real-world experimentation on Knative.
arXiv Detail & Related papers (2022-02-23T00:39:01Z)
- Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)
- Beware the evolving 'intelligent' web service! An integration architecture tactic to guard AI-first components [5.975695375814527]
Our proposal is an architectural tactic designed to improve intelligent service-dependent software robustness.
The tactic involves creating an application-specific benchmark dataset baselined against an intelligent service.
A technical evaluation of our implementation of this architecture demonstrates how the tactic can identify 1,054 cases of substantial confidence evolution and 2,461 cases of substantial changes to response label sets.
arXiv Detail & Related papers (2020-05-27T06:15:18Z)
- A Privacy-Preserving Distributed Architecture for Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service.
It preserves users' sensitive data while providing cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z)