Exploring ML testing in practice -- Lessons learned from an interactive
rapid review with Axis Communications
- URL: http://arxiv.org/abs/2203.16225v1
- Date: Wed, 30 Mar 2022 12:01:43 GMT
- Title: Exploring ML testing in practice -- Lessons learned from an interactive
rapid review with Axis Communications
- Authors: Qunying Song and Markus Borg and Emelie Engström and Håkan Ardö and Sergio Rico
- Abstract summary: There is a growing interest in industry and academia in machine learning (ML) testing.
We believe that industry and academia need to learn together to produce rigorous and relevant knowledge.
- Score: 4.875319458066472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a growing interest in industry and academia in machine learning (ML)
testing. We believe that industry and academia need to learn together to
produce rigorous and relevant knowledge. In this study, we initiate a
collaboration between stakeholders from one case company, one research
institute, and one university. To establish a common view of the problem
domain, we applied an interactive rapid review of the state of the art. Four
researchers from Lund University and RISE Research Institutes and four
practitioners from Axis Communications reviewed a set of 180 primary studies on
ML testing. We developed a taxonomy for the communication around ML testing
challenges and results and identified a list of 12 review questions relevant
for Axis Communications. The three most important questions (data testing,
metrics for assessment, and test generation) were mapped to the literature, and
an in-depth analysis of the 35 primary studies matching the most important
question (data testing) was made. A final set of the five best matches was
analysed, and we reflect on the criteria for applicability and relevance for
industry. The taxonomies are helpful for communication, but they are not final.
Furthermore, there was no perfect match to the case company's investigated
review question (data testing). However, we extracted relevant approaches from
the five studies on a conceptual level to support later context-specific
improvements. We found the interactive rapid review approach useful for
triggering and aligning communication between the different stakeholders.
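
To make the top-ranked review question (data testing) concrete, below is a minimal sketch of what automated data tests can look like in practice: executable checks on schema, value ranges, label validity, and class balance, applied to a dataset before it is used for training or evaluation. The schema, label set, and imbalance threshold are hypothetical illustrations, not approaches extracted from the reviewed studies.

```python
# Minimal sketch of "data testing": automated checks run on a dataset
# before ML training or evaluation. The schema, the label set, and the
# imbalance threshold are hypothetical, not taken from the reviewed studies.
from collections import Counter

EXPECTED_SCHEMA = {"image_id": str, "width": int, "height": int, "label": str}
VALID_LABELS = {"person", "vehicle", "background"}  # hypothetical label set


def check_dataset(records):
    """Return a list of human-readable violations found in `records`."""
    violations = []
    for i, row in enumerate(records):
        # Schema test: every expected field is present and correctly typed.
        for field, ftype in EXPECTED_SCHEMA.items():
            if field not in row:
                violations.append(f"row {i}: missing field '{field}'")
            elif not isinstance(row[field], ftype):
                violations.append(f"row {i}: '{field}' is not {ftype.__name__}")
        # Range test: image dimensions must be positive integers.
        w, h = row.get("width"), row.get("height")
        if isinstance(w, int) and isinstance(h, int) and (w <= 0 or h <= 0):
            violations.append(f"row {i}: non-positive image dimensions")
        # Value test: labels must come from the known label set.
        if row.get("label") not in VALID_LABELS:
            violations.append(f"row {i}: unknown label {row.get('label')!r}")
    # Distribution test: flag severe class imbalance (one label > 90%).
    counts = Counter(r.get("label") for r in records)
    if counts and max(counts.values()) / sum(counts.values()) > 0.9:
        violations.append("dataset: severe class imbalance (one label > 90%)")
    return violations


if __name__ == "__main__":
    sample = [
        {"image_id": "a", "width": 640, "height": 480, "label": "person"},
        {"image_id": "b", "width": -1, "height": 480, "label": "cat"},
    ]
    for v in check_dataset(sample):
        print(v)
```

The point of the sketch is the shape of the activity: data tests are ordinary, executable assertions over the dataset itself, separate from tests of the trained model. In the study's setting, such checks would have to be adapted to the case company's specific data and context.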
Related papers
- RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance [0.8089605035945486]
We propose RelevAI-Reviewer, an automatic system that conceptualizes the task of survey paper review as a classification problem.
We introduce a novel dataset comprised of 25,164 instances. Each instance contains one prompt and four candidate papers, each varying in relevance to the prompt.
We develop a machine learning (ML) model capable of determining the relevance of each paper and identifying the most pertinent one.
arXiv Detail & Related papers (2024-06-13T06:42:32Z) - Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions [62.0123588983514]
Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields.
We reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers.
We construct a comprehensive dataset containing over 26,841 papers with 92,017 reviews collected from multiple sources.
arXiv Detail & Related papers (2024-06-09T08:24:17Z) - SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark [42.91902601376494]
The paper introduces SceMQA, a novel benchmark for scientific multimodal question answering at the college entrance level.
SceMQA focuses on core science subjects including Mathematics, Physics, Chemistry, and Biology.
It features a blend of multiple-choice and free-response formats, ensuring a comprehensive evaluation of AI models' abilities.
arXiv Detail & Related papers (2024-02-06T19:16:55Z) - Benchmarking Foundation Models with Language-Model-as-an-Examiner [47.345760054595246]
We propose a novel benchmarking framework, Language-Model-as-an-Examiner.
The LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner.
arXiv Detail & Related papers (2023-06-07T06:29:58Z) - The Quantum Frontier of Software Engineering: A Systematic Mapping Study [16.93115872272979]
Quantum software engineering (QSE) is emerging as a new discipline to enable developers to design and develop quantum programs.
This paper presents a systematic mapping study of the current state of QSE research.
arXiv Detail & Related papers (2023-05-31T09:26:10Z) - The impact and applications of ChatGPT: a systematic review of literature reviews [0.0]
ChatGPT has become one of the most widely used natural language processing tools.
With thousands of published papers demonstrating its applications across various industries and fields, ChatGPT has sparked significant interest in the research community.
An overview of the available evidence from multiple reviews and studies could provide further insights, minimize redundancy, and identify areas where further research is needed.
arXiv Detail & Related papers (2023-05-08T17:57:34Z) - The Technological Emergence of AutoML: A Survey of Performant Software and Applications in the Context of Industry [72.10607978091492]
Automated/Autonomous Machine Learning (AutoML/AutonoML) is a relatively young field.
This review makes two primary contributions to knowledge around this topic.
It provides the most up-to-date and comprehensive survey of existing AutoML tools, both open-source and commercial.
arXiv Detail & Related papers (2022-11-08T10:42:08Z) - Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast the task of assessing submitted papers as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
arXiv Detail & Related papers (2021-09-02T19:41:47Z) - Mining Implicit Relevance Feedback from User Behavior for Web Question Answering [92.45607094299181]
We make the first study to explore the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective to substantially reduce the human labeling cost for the QA service in a global commercial search engine.
arXiv Detail & Related papers (2020-06-13T07:02:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.