Related papers: MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities

MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities

URL: http://arxiv.org/abs/2303.01998v1
Date: Fri, 3 Mar 2023 15:10:38 GMT
Title: MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities
Authors: Katherine R. Maffey, Kyle Dotterrer, Jennifer Niemann, Iain Cruickshank, Grace A. Lewis, Christian K\"astner
Abstract summary: MLTE is a framework and implementation to evaluate machine learning models and systems. It compiles state-of-the-art evaluation techniques into an organizational process. MLTE tooling supports this process by providing a domain-specific language that teams can use to express model requirements.
Score: 1.1352560842946413
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Many organizations seek to ensure that machine learning (ML) and artificial intelligence (AI) systems work as intended in production but currently do not have a cohesive methodology in place to do so. To fill this gap, we propose MLTE (Machine Learning Test and Evaluation, colloquially referred to as "melt"), a framework and implementation to evaluate ML models and systems. The framework compiles state-of-the-art evaluation techniques into an organizational process for interdisciplinary teams, including model developers, software engineers, system owners, and other stakeholders. MLTE tooling supports this process by providing a domain-specific language that teams can use to express model requirements, an infrastructure to define, generate, and collect ML evaluation metrics, and the means to communicate results.

Related papers

Beyond Formal Semantics for Capabilities and Skills: Model Context Protocol in Manufacturing [0.12289361708127876]
We present an alternative approach based on the recently introduced Model Context Protocol (MCP)<n>MCP allows systems to expose functionality through a standardized interface that is directly consumable by LLM-based agents.
arXiv Detail & Related papers (2025-06-12T13:02:16Z)
From Standalone LLMs to Integrated Intelligence: A Survey of Compound Al Systems [6.284317913684068]
Compound Al Systems (CAIS) is an emerging paradigm that integrates large language models (LLMs) with external components, such as retrievers, agents, tools, and orchestrators.<n>Despite growing adoption in both academia and industry, the CAIS landscape remains fragmented, lacking a unified framework for analysis, taxonomy, and evaluation.<n>This survey aims to provide researchers and practitioners with a comprehensive foundation for understanding, developing, and advancing the next generation of system-level artificial intelligence.
arXiv Detail & Related papers (2025-06-05T02:34:43Z)
PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models [59.17570021208177]
PyTDC is a machine-learning platform providing streamlined training, evaluation, and inference software for multimodal biological AI models.<n>This paper discusses the components of PyTDC's architecture and, to our knowledge, the first-of-its-kind case study on the introduced single-cell drug-target nomination ML task.
arXiv Detail & Related papers (2025-05-08T18:15:38Z)
A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and performance to perform various language tasks with minimal computational resources. We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z)
Leveraging Large Language Models for Enhanced Process Model Comprehension [33.803742664323856]
In Business Process Management (BPM), effectively comprehending process models is crucial yet poses significant challenges. This paper introduces a novel framework utilizing the advanced capabilities of Large Language Models (LLMs) to enhance the interpretability of complex process models.
arXiv Detail & Related papers (2024-08-08T13:12:46Z)
Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment. To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z)
A Framework to Model ML Engineering Processes [1.9744907811058787]
Development of Machine Learning (ML) based systems is complex and requires multidisciplinary teams with diverse skill sets. Current process modeling languages are not suitable for describing the development of such systems. We introduce a framework for modeling ML-based software development processes, built around a domain-specific language.
arXiv Detail & Related papers (2024-04-29T09:17:36Z)
Process Modeling With Large Language Models [42.0652924091318]
This paper explores the integration of Large Language Models (LLMs) into process modeling. We propose a framework that leverages LLMs for the automated generation and iterative refinement of process models. Preliminary results demonstrate the framework's ability to streamline process modeling tasks.
arXiv Detail & Related papers (2024-03-12T11:27:47Z)
Machine Learning-Enabled Software and System Architecture Frameworks [48.87872564630711]
The stakeholders with data science and Machine Learning related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks. We surveyed 61 subject matter experts from over 25 organizations in 10 countries.
arXiv Detail & Related papers (2023-08-09T21:54:34Z)
Benchmarking Automated Machine Learning Methods for Price Forecasting Applications [58.720142291102135]
We show the possibility of substituting manually created ML pipelines with automated machine learning (AutoML) solutions. Based on the CRISP-DM process, we split the manual ML pipeline into a machine learning and non-machine learning part. We show in a case study for the industrial use case of price forecasting, that domain knowledge combined with AutoML can weaken the dependence on ML experts.
arXiv Detail & Related papers (2023-04-28T10:27:38Z)
MDE for Machine Learning-Enabled Software Systems: A Case Study and Comparison of MontiAnna & ML-Quadrat [5.839906946900443]
We propose to adopt the MDE paradigm for the development of Machine Learning-enabled software systems with a focus on the Internet of Things (IoT) domain. We illustrate how two state-of-the-art open-source modeling tools, namely MontiAnna and ML-Quadrat can be used for this purpose as demonstrated through a case study.
arXiv Detail & Related papers (2022-09-15T13:21:16Z)
Data Analytics and Machine Learning Methods, Techniques and Tool for Model-Driven Engineering of Smart IoT Services [0.0]
This dissertation proposes a novel approach to enhance the development of smart services for the Internet of Things (IoT) and smart Cyber-Physical Systems (CPS) The proposed approach offers abstraction and automation to the software engineering processes, as well as the Data Analytics (DA) and Machine Learning (ML) practices. We implement and validate the proposed approach by extending an open source modeling tool, called ThingML.
arXiv Detail & Related papers (2021-02-12T11:09:54Z)
Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. We have developed a proven systems engineering approach for machine learning development and deployment. Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results. We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.