REAL ML: Recognizing, Exploring, and Articulating Limitations of Machine
Learning Research
- URL: http://arxiv.org/abs/2205.08363v1
- Date: Thu, 5 May 2022 15:32:45 GMT
- Title: REAL ML: Recognizing, Exploring, and Articulating Limitations of Machine
Learning Research
- Authors: Jessie J. Smith, Saleema Amershi, Solon Barocas, Hanna Wallach,
Jennifer Wortman Vaughan
- Abstract summary: Transparency around limitations can improve the scientific rigor of research, help ensure appropriate interpretation of research findings, and make research claims more credible.
Despite these benefits, the machine learning (ML) research community lacks well-developed norms around disclosing and discussing limitations.
We conduct an iterative design process with 30 ML and ML-adjacent researchers to develop REAL ML, a set of guided activities to help ML researchers recognize, explore, and articulate the limitations of their research.
- Score: 19.71032778307425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transparency around limitations can improve the scientific rigor of research,
help ensure appropriate interpretation of research findings, and make research
claims more credible. Despite these benefits, the machine learning (ML)
research community lacks well-developed norms around disclosing and discussing
limitations. To address this gap, we conduct an iterative design process with
30 ML and ML-adjacent researchers to develop and test REAL ML, a set of guided
activities to help ML researchers recognize, explore, and articulate the
limitations of their research. Using a three-stage interview and survey study,
we identify ML researchers' perceptions of limitations, as well as the
challenges they face when recognizing, exploring, and articulating limitations.
We develop REAL ML to address some of these practical challenges, and highlight
additional cultural challenges that will require broader shifts in community
norms to address. We hope our study and REAL ML help move the ML research
community toward more active and appropriate engagement with limitations.
Related papers
- A Call for New Recipes to Enhance Spatial Reasoning in MLLMs [85.67171333213301]
Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks.
Recent studies have exposed critical limitations in their spatial reasoning capabilities.
This deficiency in spatial reasoning significantly constrains MLLMs' ability to interact effectively with the physical world.
arXiv Detail & Related papers (2025-04-21T11:48:39Z)
- MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? [64.62421656031128]
MLRC-Bench is a benchmark designed to quantify how effectively language agents can tackle challenging Machine Learning (ML) Research Competitions.
Unlike prior work, MLRC-Bench measures the key steps of proposing and implementing novel research methods.
Even the best-performing tested agent closes only 9.3% of the gap between baseline and top human participant scores.
arXiv Detail & Related papers (2025-04-13T19:35:43Z)
- Knowledge Boundary of Large Language Models: A Survey [75.67848187449418]
Large language models (LLMs) store vast amounts of knowledge in their parameters, but they still have limitations in the memorization and utilization of certain knowledge.
This highlights the critical need to understand the knowledge boundary of LLMs, a concept that remains inadequately defined in existing research.
We propose a comprehensive definition of the LLM knowledge boundary and introduce a formalized taxonomy categorizing knowledge into four distinct types.
arXiv Detail & Related papers (2024-12-17T02:14:02Z)
- A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions [9.045698110081686]
Large language models (LLMs) can generate plausible but factually incorrect responses, expressed with striking confidence.
Previous work has shown that hallucinations and other non-factual responses generated by LLMs can be detected by examining the uncertainty of the LLM in its response to the pertinent prompt.
This survey seeks to provide an extensive review of existing uncertainty quantification methods for LLMs, identifying their salient features, along with their strengths and weaknesses.
arXiv Detail & Related papers (2024-12-07T06:56:01Z)
- Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment [56.87031484108484]
Large Language Models (LLMs) are increasingly used in practical applications, but the knowledge stored in their parameters is limited.
Retrieval-Augmented Generation (RAG) tackles this challenge and has shown a significant impact on LLMs.
By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs.
arXiv Detail & Related papers (2024-11-09T15:12:28Z)
- Reproducibility in Machine Learning-based Research: Overview, Barriers and Drivers [1.4841630983274845]
Research in various fields is currently experiencing challenges regarding the reproducibility of results.
This problem is also prevalent in machine learning (ML) research.
The level of reproducibility in ML-driven research remains unsatisfactory.
arXiv Detail & Related papers (2024-06-20T13:56:42Z)
- Decoding complexity: how machine learning is redefining scientific discovery [16.132517461279487]
Machine learning (ML) has become an essential tool for organising, analysing, and interpreting complex datasets.
This paper explores the transformative role of ML in accelerating breakthroughs across a range of scientific disciplines.
arXiv Detail & Related papers (2024-05-07T09:58:02Z)
- Apprentices to Research Assistants: Advancing Research with Large Language Models [0.0]
Large Language Models (LLMs) have emerged as powerful tools in various research domains.
This article examines their potential through a literature review and firsthand experimentation.
arXiv Detail & Related papers (2024-04-09T15:53:06Z)
- LimGen: Probing the LLMs for Generating Suggestive Limitations of Research Papers [8.076841611508488]
We present a novel and challenging task of Suggestive Limitation Generation (SLG) for research papers.
We compile a dataset called LimGen, encompassing 4068 research papers and their associated limitations from the ACL Anthology.
arXiv Detail & Related papers (2024-03-22T17:31:43Z)
- Effectiveness Assessment of Recent Large Vision-Language Models [78.69439393646554]
This paper endeavors to evaluate the competency of popular large vision-language models (LVLMs) in specialized and general tasks.
We employ six challenging tasks in three different application scenarios: natural, healthcare, and industrial.
We examine the performance of three recent open-source LVLMs, including MiniGPT-v2, LLaVA-1.5, and Shikra, on both visual recognition and localization in these tasks.
arXiv Detail & Related papers (2024-03-07T08:25:27Z)
- Exploring Perceptual Limitation of Multimodal Large Language Models [57.567868157293994]
We quantitatively study the perception of small visual objects in several state-of-the-art MLLMs.
We identify four independent factors that can contribute to this limitation.
Lower object quality and smaller object size can both independently reduce MLLMs' ability to answer visual questions.
arXiv Detail & Related papers (2024-02-12T03:04:42Z)
- Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation [109.8527403904657]
We show that large language models (LLMs) possess unwavering confidence in their knowledge and cannot handle the conflict between internal and external knowledge well.
Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries.
We propose a simple method to dynamically utilize supporting documents with our judgement strategy.
arXiv Detail & Related papers (2023-07-20T16:46:10Z)
- Reproducibility in Machine Learning-Driven Research [1.7936835766396748]
Research is facing a reproducibility crisis, in which the results and findings of many studies are difficult or even impossible to reproduce.
This is also the case in machine learning (ML) and artificial intelligence (AI) research.
Although different solutions to address this issue, such as using ML platforms, are discussed in the research community, the level of reproducibility in ML-driven research is not increasing substantially.
arXiv Detail & Related papers (2023-07-19T07:00:22Z)
- Machine Learning Practices Outside Big Tech: How Resource Constraints Challenge Responsible Development [1.8275108630751844]
Machine learning practitioners from diverse occupations and backgrounds are increasingly using machine learning (ML) methods.
Past research often excludes the broader, lesser-resourced ML community.
These practitioners share many of the same ML development difficulties and ethical conundrums as their Big Tech counterparts.
arXiv Detail & Related papers (2021-10-06T17:25:21Z)
- Understanding the Usability Challenges of Machine Learning In High-Stakes Decision Making [67.72855777115772]
Machine learning (ML) is being applied to a diverse and ever-growing set of domains.
In many cases, domain experts -- who often have no expertise in ML or data science -- are asked to use ML predictions to make high-stakes decisions.
We investigate the ML usability challenges present in the domain of child welfare screening through a series of collaborations with child welfare screeners.
arXiv Detail & Related papers (2021-03-02T22:50:45Z)
- Learnings from Frontier Development Lab and SpaceML -- AI Accelerators for NASA and ESA [57.06643156253045]
Research with AI and ML technologies lives in a variety of settings with often asynchronous goals and timelines.
We perform a case study of the Frontier Development Lab (FDL), an AI accelerator under a public-private partnership from NASA and ESA.
FDL research follows principled practices that are grounded in responsible development, conduct, and dissemination of AI research.
arXiv Detail & Related papers (2020-11-09T21:23:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.