Exploring ML testing in practice -- Lessons learned from an interactive rapid review with Axis Communications
- URL: http://arxiv.org/abs/2203.16225v1
- Date: Wed, 30 Mar 2022 12:01:43 GMT
- Title: Exploring ML testing in practice -- Lessons learned from an interactive rapid review with Axis Communications
- Authors: Qunying Song and Markus Borg and Emelie Engström and Håkan Ardö and Sergio Rico
- Abstract summary: There is a growing interest in industry and academia in machine learning (ML) testing.
We believe that industry and academia need to learn together to produce rigorous and relevant knowledge.
- Score: 4.875319458066472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a growing interest in industry and academia in machine learning (ML)
testing. We believe that industry and academia need to learn together to
produce rigorous and relevant knowledge. In this study, we initiate a
collaboration between stakeholders from one case company, one research
institute, and one university. To establish a common view of the problem
domain, we applied an interactive rapid review of the state of the art. Four
researchers from Lund University and RISE Research Institutes and four
practitioners from Axis Communications reviewed a set of 180 primary studies on
ML testing. We developed a taxonomy for the communication around ML testing
challenges and results and identified a list of 12 review questions relevant
for Axis Communications. The three most important questions (data testing,
metrics for assessment, and test generation) were mapped to the literature, and
an in-depth analysis of the 35 primary studies matching the most important
question (data testing) was made. A final set of the five best matches was
analysed, and we reflect on the criteria for applicability and relevance for
industry. The taxonomies are helpful for communication but are not final.
Furthermore, there was no perfect match to the case company's investigated
review question (data testing). However, we extracted relevant approaches from
the five studies on a conceptual level to support later context-specific
improvements. We found the interactive rapid review approach useful for
triggering and aligning communication between the different stakeholders.
Related papers
- PeerQA: A Scientific Question Answering Dataset from Peer Reviews [51.95579001315713]
We present PeerQA, a real-world, scientific, document-level Question Answering dataset.
The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP.
We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks.
arXiv Detail & Related papers (2025-02-19T12:24:46Z) - Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation [58.064940977804596]
A plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently.
Ethical concerns regarding shortcomings of these tools and potential for misuse take a particularly prominent place in our discussion.
arXiv Detail & Related papers (2025-02-07T18:26:45Z) - RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance [0.8089605035945486]
We propose RelevAI-Reviewer, an automatic system that conceptualizes the task of survey paper review as a classification problem.
We introduce a novel dataset comprising 25,164 instances. Each instance contains one prompt and four candidate papers, each varying in relevance to the prompt.
We develop a machine learning (ML) model capable of determining the relevance of each paper and identifying the most pertinent one.
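The summary above does not describe the model itself, so the sketch below is a purely hypothetical illustration of relevance-as-classification over that instance layout (one prompt, four candidates): a TF-IDF cosine-similarity baseline that picks the most pertinent candidate. Every name in it is ours, not the paper's.

```python
# Hypothetical baseline only: the paper's actual model and features are not
# described in this summary. "Pick the most relevant of four candidates" is
# treated as an argmax over TF-IDF cosine similarities to the prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def most_relevant(prompt: str, candidates: list[str]) -> int:
    """Return the index of the candidate most similar to the prompt."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the prompt plus all candidates so they share one vocabulary.
    matrix = vectorizer.fit_transform([prompt] + candidates)
    sims = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return int(sims.argmax())

# One instance mirrors the dataset layout: a prompt and four candidates.
prompt = "survey of machine learning testing techniques"
candidates = [
    "A study of quantum error correction codes.",
    "Testing machine learning systems: a survey of techniques.",
    "Urban traffic flow prediction with graph networks.",
    "A history of operating system design.",
]
print(most_relevant(prompt, candidates))  # expected: 1
```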
arXiv Detail & Related papers (2024-06-13T06:42:32Z) - SceMQA: A Scientific College Entrance Level Multimodal Question
Answering Benchmark [42.91902601376494]
The paper introduces SceMQA, a novel benchmark for scientific multimodal question answering at the college entrance level.
SceMQA focuses on core science subjects including Mathematics, Physics, Chemistry, and Biology.
It features a blend of multiple-choice and free-response formats, ensuring a comprehensive evaluation of AI models' abilities.
arXiv Detail & Related papers (2024-02-06T19:16:55Z) - Benchmarking Foundation Models with Language-Model-as-an-Examiner [47.345760054595246]
We propose a novel benchmarking framework, Language-Model-as-an-Examiner.
The LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner.
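The summary names the examiner idea but no interface. A minimal sketch of such an examiner loop, assuming `lm` and `examinee` are stand-in callables for any text-generation backend (they are not an API from the paper):

```python
# Hypothetical sketch of a Language-Model-as-an-Examiner loop. `lm` and
# `examinee` are stand-ins for any text-generation call (API or local model).
from typing import Callable

def examine(lm: Callable[[str], str], examinee: Callable[[str], str],
            topic: str, n_questions: int = 3) -> list[dict]:
    """Have the examiner LM pose questions on a topic, then grade the
    examinee's answers without any reference answer."""
    results = []
    for i in range(n_questions):
        question = lm(f"Ask one factual question about {topic}. (#{i + 1})")
        answer = examinee(question)
        # Reference-free evaluation: the examiner judges the answer
        # against its own knowledge rather than a gold label.
        verdict = lm(
            f"Question: {question}\nAnswer: {answer}\n"
            "Grade this answer from 1 (wrong) to 5 (fully correct) "
            "using only your own knowledge. Reply with the number."
        )
        results.append({"question": question, "answer": answer,
                        "score": verdict.strip()})
    return results
```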
arXiv Detail & Related papers (2023-06-07T06:29:58Z) - The Quantum Frontier of Software Engineering: A Systematic Mapping Study [16.93115872272979]
Quantum software engineering (QSE) is emerging as a new discipline to enable developers to design and develop quantum programs.
This paper presents a systematic mapping study of the current state of QSE research.
arXiv Detail & Related papers (2023-05-31T09:26:10Z) - The impact and applications of ChatGPT: a systematic review of
literature reviews [0.0]
ChatGPT has become one of the most widely used natural language processing tools.
With thousands of published papers demonstrating its applications across various industries and fields, ChatGPT has sparked significant interest in the research community.
An overview of the available evidence from multiple reviews and studies could provide further insights, minimize redundancy, and identify areas where further research is needed.
arXiv Detail & Related papers (2023-05-08T17:57:34Z) - Investigating Fairness Disparities in Peer Review: A Language Model
Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protected attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast the assessment of submissions as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
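The summary does not say which ranking model the framework uses; Bradley-Terry is one standard way to turn pairwise preferences (e.g., "reviewers preferred paper i over paper j") into a global ranking, so the sketch below fits it with the classical minorization-maximization updates. This is our illustrative choice, not the paper's method.

```python
# Hypothetical sketch: Bradley-Terry scores fitted from (winner, loser)
# pairs via the standard minorization-maximization updates.
import numpy as np

def bradley_terry(n_items: int, wins: list[tuple[int, int]],
                  iters: int = 200) -> np.ndarray:
    """Fit Bradley-Terry strengths from pairwise preference outcomes."""
    # w[i, j] counts how often item i beat item j; a small prior
    # (our assumption, for numerical stability) keeps all scores positive.
    w = np.full((n_items, n_items), 0.1)
    np.fill_diagonal(w, 0.0)
    for winner, loser in wins:
        w[winner, loser] += 1.0
    p = np.ones(n_items)
    for _ in range(iters):
        total = w + w.T                          # comparisons per pair
        denom = total / (p[:, None] + p[None, :])
        p = w.sum(axis=1) / denom.sum(axis=1)
        p /= p.sum()                             # only ratios matter
    return p

# Toy example: paper 1 wins most head-to-head reviewer preferences.
scores = bradley_terry(3, [(1, 0), (1, 2), (0, 2), (1, 0)])
print(np.argsort(-scores))  # ranking, best first -> [1 0 2]
```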
arXiv Detail & Related papers (2021-09-02T19:41:47Z) - Mining Implicit Relevance Feedback from User Behavior for Web Question
Answering [92.45607094299181]
We present the first study to explore the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective to substantially reduce the human labeling cost for the QA service in a global commercial search engine.
arXiv Detail & Related papers (2020-06-13T07:02:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.