Exploring ML testing in practice -- Lessons learned from an interactive rapid review with Axis Communications
- URL: http://arxiv.org/abs/2203.16225v1
- Date: Wed, 30 Mar 2022 12:01:43 GMT
- Title: Exploring ML testing in practice -- Lessons learned from an interactive rapid review with Axis Communications
- Authors: Qunying Song and Markus Borg and Emelie Engström and Håkan Ardö and Sergio Rico
- Abstract summary: There is a growing interest in industry and academia in machine learning (ML) testing.
We believe that industry and academia need to learn together to produce rigorous and relevant knowledge.
- Score: 4.875319458066472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a growing interest in industry and academia in machine learning (ML)
testing. We believe that industry and academia need to learn together to
produce rigorous and relevant knowledge. In this study, we initiate a
collaboration between stakeholders from one case company, one research
institute, and one university. To establish a common view of the problem
domain, we applied an interactive rapid review of the state of the art. Four
researchers from Lund University and RISE Research Institutes and four
practitioners from Axis Communications reviewed a set of 180 primary studies on
ML testing. We developed a taxonomy for the communication around ML testing
challenges and results and identified a list of 12 review questions relevant
for Axis Communications. The three most important questions (data testing,
metrics for assessment, and test generation) were mapped to the literature, and
an in-depth analysis of the 35 primary studies matching the most important
question (data testing) was made. A final set of the five best matches was
analysed, and we reflect on the criteria for applicability and relevance for
industry. The taxonomies are helpful for communication but are not final.
Furthermore, there was no perfect match to the case company's investigated
review question (data testing). However, we extracted relevant approaches from
the five studies on a conceptual level to support later context-specific
improvements. We found the interactive rapid review approach useful for
triggering and aligning communication between the different stakeholders.
Related papers
- PeerQA: A Scientific Question Answering Dataset from Peer Reviews [51.95579001315713]
We present PeerQA, a real-world, scientific, document-level Question Answering dataset.
The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP.
We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks.
arXiv Detail & Related papers (2025-02-19T12:24:46Z) - Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation [58.064940977804596]
A plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently.
Ethical concerns regarding shortcomings of these tools and potential for misuse take a particularly prominent place in our discussion.
arXiv Detail & Related papers (2025-02-07T18:26:45Z) - RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance [0.8089605035945486]
We propose RelevAI-Reviewer, an automatic system that conceptualizes the task of survey paper review as a classification problem.
We introduce a novel dataset comprising 25,164 instances. Each instance contains one prompt and four candidate papers, each varying in relevance to the prompt.
We develop a machine learning (ML) model capable of determining the relevance of each paper and identifying the most pertinent one.
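The summary above does not describe the model itself, so the sketch below is a purely hypothetical illustration of relevance-as-classification over that instance layout (one prompt, four candidates): a TF-IDF cosine-similarity baseline that picks the most pertinent candidate. Every name in it is ours, not the paper's.

```python
# Hypothetical baseline only: the paper's actual model and features are not
# described in this summary. "Pick the most relevant of four candidates" is
# treated as an argmax over TF-IDF cosine similarities to the prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def most_relevant(prompt: str, candidates: list[str]) -> int:
    """Return the index of the candidate most similar to the prompt."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the prompt plus all candidates so they share one vocabulary.
    matrix = vectorizer.fit_transform([prompt] + candidates)
    sims = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return int(sims.argmax())

# One instance mirrors the dataset layout: a prompt and four candidates.
prompt = "survey of machine learning testing techniques"
candidates = [
    "A study of quantum error correction codes.",
    "Testing machine learning systems: a survey of techniques.",
    "Urban traffic flow prediction with graph networks.",
    "A history of operating system design.",
]
print(most_relevant(prompt, candidates))  # expected: 1
```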
arXiv Detail & Related papers (2024-06-13T06:42:32Z) - SceMQA: A Scientific College Entrance Level Multimodal Question
Answering Benchmark [42.91902601376494]
The paper introduces SceMQA, a novel benchmark for scientific multimodal question answering at the college entrance level.
SceMQA focuses on core science subjects including Mathematics, Physics, Chemistry, and Biology.
It features a blend of multiple-choice and free-response formats, ensuring a comprehensive evaluation of AI models' abilities.
arXiv Detail & Related papers (2024-02-06T19:16:55Z) - Benchmarking Foundation Models with Language-Model-as-an-Examiner [47.345760054595246]
We propose a novel benchmarking framework, Language-Model-as-an-Examiner.
The LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner.
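The summary names the examiner idea but no interface. A minimal sketch of such an examiner loop, assuming `lm` and `examinee` are stand-in callables for any text-generation backend (they are not an API from the paper):

```python
# Hypothetical sketch of a Language-Model-as-an-Examiner loop. `lm` and
# `examinee` are stand-ins for any text-generation call (API or local model).
from typing import Callable

def examine(lm: Callable[[str], str], examinee: Callable[[str], str],
            topic: str, n_questions: int = 3) -> list[dict]:
    """Have the examiner LM pose questions on a topic, then grade the
    examinee's answers without any reference answer."""
    results = []
    for i in range(n_questions):
        question = lm(f"Ask one factual question about {topic}. (#{i + 1})")
        answer = examinee(question)
        # Reference-free evaluation: the examiner judges the answer
        # against its own knowledge rather than a gold label.
        verdict = lm(
            f"Question: {question}\nAnswer: {answer}\n"
            "Grade this answer from 1 (wrong) to 5 (fully correct) "
            "using only your own knowledge. Reply with the number."
        )
        results.append({"question": question, "answer": answer,
                        "score": verdict.strip()})
    return results
```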
arXiv Detail & Related papers (2023-06-07T06:29:58Z) - The Quantum Frontier of Software Engineering: A Systematic Mapping Study [16.93115872272979]
Quantum software engineering (QSE) is emerging as a new discipline to enable developers to design and develop quantum programs.
This paper presents a systematic mapping study of the current state of QSE research.
arXiv Detail & Related papers (2023-05-31T09:26:10Z) - The impact and applications of ChatGPT: a systematic review of
literature reviews [0.0]
ChatGPT has become one of the most widely used natural language processing tools.
With thousands of published papers demonstrating its applications across various industries and fields, ChatGPT has sparked significant interest in the research community.
An overview of the available evidence from multiple reviews and studies could provide further insights, minimize redundancy, and identify areas where further research is needed.
arXiv Detail & Related papers (2023-05-08T17:57:34Z) - Investigating Fairness Disparities in Peer Review: A Language Model
Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protected attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast the assessment of submissions as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
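The summary does not say which ranking model the framework uses; Bradley-Terry is one standard way to turn pairwise preferences (e.g., "reviewers preferred paper i over paper j") into a global ranking, so the sketch below fits it with the classical minorization-maximization updates. This is our illustrative choice, not the paper's method.

```python
# Hypothetical sketch: Bradley-Terry scores fitted from (winner, loser)
# pairs via the standard minorization-maximization updates.
import numpy as np

def bradley_terry(n_items: int, wins: list[tuple[int, int]],
                  iters: int = 200) -> np.ndarray:
    """Fit Bradley-Terry strengths from pairwise preference outcomes."""
    # w[i, j] counts how often item i beat item j; a small prior
    # (our assumption, for numerical stability) keeps all scores positive.
    w = np.full((n_items, n_items), 0.1)
    np.fill_diagonal(w, 0.0)
    for winner, loser in wins:
        w[winner, loser] += 1.0
    p = np.ones(n_items)
    for _ in range(iters):
        total = w + w.T                          # comparisons per pair
        denom = total / (p[:, None] + p[None, :])
        p = w.sum(axis=1) / denom.sum(axis=1)
        p /= p.sum()                             # only ratios matter
    return p

# Toy example: paper 1 wins most head-to-head reviewer preferences.
scores = bradley_terry(3, [(1, 0), (1, 2), (0, 2), (1, 0)])
print(np.argsort(-scores))  # ranking, best first -> [1 0 2]
```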
arXiv Detail & Related papers (2021-09-02T19:41:47Z) - Mining Implicit Relevance Feedback from User Behavior for Web Question
Answering [92.45607094299181]
We present the first study to explore the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective to substantially reduce the human labeling cost for the QA service in a global commercial search engine.
arXiv Detail & Related papers (2020-06-13T07:02:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.