Are We Closing the Loop Yet? Gaps in the Generalizability of VIS4ML Research
- URL: http://arxiv.org/abs/2308.06290v1
- Date: Thu, 10 Aug 2023 21:44:48 GMT
- Title: Are We Closing the Loop Yet? Gaps in the Generalizability of VIS4ML Research
- Authors: Hariharan Subramonyam, Jessica Hullman
- Abstract summary: We survey recent VIS4ML papers to assess the generalizability of research contributions and claims in enabling human-in-the-loop ML.
Our results show potential gaps between the current scope of VIS4ML research and aspirations for its use in practice.
- Score: 26.829392755701843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visualization for machine learning (VIS4ML) research aims to help experts
apply their prior knowledge to develop, understand, and improve the performance
of machine learning models. In conceiving VIS4ML systems, researchers
characterize the nature of human knowledge to support human-in-the-loop tasks,
design interactive visualizations to make ML components interpretable and
elicit knowledge, and evaluate the effectiveness of human-model interchange. We
survey recent VIS4ML papers to assess the generalizability of research
contributions and claims in enabling human-in-the-loop ML. Our results show
potential gaps between the current scope of VIS4ML research and aspirations for
its use in practice. We find that while papers motivate that VIS4ML systems are
applicable beyond the specific conditions studied, conclusions are often
overfitted to non-representative scenarios, are based on interactions with a
small set of ML experts and well-understood datasets, fail to acknowledge
crucial dependencies, and hinge on decisions that lack justification. We
discuss approaches to close the gap between aspirations and research claims and
suggest documentation practices to report generality constraints that better
acknowledge the exploratory nature of VIS4ML research.
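The paper proposes documentation practices rather than code, but as a purely illustrative sketch of what reporting generality constraints could look like in structured form, consider the following; all field names and example values are assumptions for illustration, not the authors' proposal.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GeneralityConstraints:
    """Hypothetical record of the conditions under which a VIS4ML system was actually validated."""
    claimed_scope: str                       # who or what the system is claimed to help
    datasets_studied: List[str]              # datasets actually used in the evaluation
    participants: str                        # number and background of the ML experts involved
    pipeline_dependencies: List[str]         # upstream/downstream steps the system assumes
    unvalidated_design_decisions: List[str]  # choices made without empirical justification

example = GeneralityConstraints(
    claimed_scope="interactive debugging of image classifiers",
    datasets_studied=["MNIST", "CIFAR-10"],
    participants="4 ML experts recruited from the authors' institution",
    pipeline_dependencies=["pre-trained CNN", "labeled validation set"],
    unvalidated_design_decisions=["choice of saliency method", "fixed cluster count k=10"],
)
print(example)
```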
Related papers
- Understanding the Role of LLMs in Multimodal Evaluation Benchmarks [77.59035801244278]
This paper investigates the role of the Large Language Model (LLM) backbone in the evaluation of Multimodal Large Language Models (MLLMs).
Our study encompasses four diverse MLLM benchmarks and eight state-of-the-art MLLMs.
Key findings reveal that some benchmarks allow high performance even without visual inputs, and that up to 50% of error rates can be attributed to insufficient world knowledge in the LLM backbone.
arXiv Detail & Related papers (2024-10-16T07:49:13Z)
- Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks [0.850206009406913]
Large Language Models (LLMs) are transforming programming practices, offering significant capabilities for code generation activities.
This paper focuses on their use in programming tasks, drawing insights from user studies that assess their impact on these tasks.
arXiv Detail & Related papers (2024-10-01T19:34:46Z)
- Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) can estimate causal effects under interventions on different parts of a system.
We conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning.
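As a rough illustration of the kind of intervention-based reasoning such benchmarks probe (the toy graph and probabilities below are invented for this sketch, not taken from the paper), conditioning on a variable and intervening on it give different answers under confounding:

```python
import random

random.seed(0)

def sample(intervene_x=None):
    """One draw from a toy structural causal model with confounding: Z -> X, Z -> Y, X -> Y."""
    z = random.random() < 0.5                          # confounder
    if intervene_x is None:
        x = random.random() < (0.8 if z else 0.2)      # X depends on Z observationally
    else:
        x = intervene_x                                # do(X = x) severs the Z -> X edge
    y = random.random() < 0.3 + 0.4 * x + 0.2 * z      # Y depends on X and Z
    return z, x, y

n = 200_000
obs = [sample() for _ in range(n)]
do1 = [sample(intervene_x=True) for _ in range(n)]

# Conditioning and intervening disagree when a confounder is present.
p_y_given_x1 = sum(y for _, x, y in obs if x) / sum(1 for _, x, _ in obs if x)
p_y_do_x1 = sum(y for _, _, y in do1) / n
print(f"P(Y=1 | X=1)     ~ {p_y_given_x1:.3f}")   # inflated by the confounder, roughly 0.86
print(f"P(Y=1 | do(X=1)) ~ {p_y_do_x1:.3f}")      # effect of setting X, roughly 0.80
```

A model that reasons correctly about the intervention should report the second quantity, not the first.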
arXiv Detail & Related papers (2024-04-08T14:15:56Z)
- Explaining Large Language Models Decisions with Shapley Values [1.223779595809275]
Large language models (LLMs) have opened up exciting possibilities for simulating human behavior and cognitive processes.
However, the validity of utilizing LLMs as stand-ins for human subjects remains uncertain.
This paper presents a novel approach based on Shapley values to interpret LLM behavior and quantify the relative contribution of each prompt component to the model's output.
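The underlying attribution scheme can be sketched independently of the paper: treat each prompt component as a player and average its marginal contribution to a model score over all orderings. The scoring function below is a toy stand-in for an LLM call, and the component names are invented for illustration.

```python
from itertools import permutations
from math import factorial

def shapley_values(components, value_fn):
    """Exact Shapley values: each component's marginal contribution to the score,
    averaged over every ordering (tractable only for a handful of components)."""
    contrib = {c: 0.0 for c in components}
    for order in permutations(components):
        included, prev = [], value_fn([])
        for c in order:
            included.append(c)
            cur = value_fn(included)
            contrib[c] += cur - prev
            prev = cur
    return {c: v / factorial(len(components)) for c, v in contrib.items()}

# Toy stand-in for "model output score given a subset of prompt components";
# in the paper's setting this would involve querying an actual LLM.
def toy_score(subset):
    score = 0.0
    if "instruction" in subset: score += 0.50
    if "example" in subset:     score += 0.30
    if {"instruction", "example"} <= set(subset): score += 0.10  # interaction term
    if "persona" in subset:     score += 0.05
    return score

print(shapley_values(["instruction", "example", "persona"], toy_score))
# instruction and example split the interaction bonus; the values sum to toy_score of the full prompt
```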
arXiv Detail & Related papers (2024-03-29T22:49:43Z)
- Effectiveness Assessment of Recent Large Vision-Language Models [78.69439393646554]
This paper endeavors to evaluate the competency of popular large vision-language models (LVLMs) in specialized and general tasks.
We employ six challenging tasks in three different application scenarios: natural, healthcare, and industrial.
We examine the performance of three recent open-source LVLMs, namely MiniGPT-v2, LLaVA-1.5, and Shikra, on both visual recognition and localization in these tasks.
arXiv Detail & Related papers (2024-03-07T08:25:27Z)
- Exploring the Cognitive Knowledge Structure of Large Language Models: An Educational Diagnostic Assessment Approach [50.125704610228254]
Large Language Models (LLMs) have not only exhibited exceptional performance across various tasks, but also demonstrated sparks of intelligence.
Recent studies have focused on assessing their capabilities on human exams and revealed their impressive competence in different domains.
We conduct an evaluation using MoocRadar, a meticulously annotated human test dataset based on Bloom's taxonomy.
arXiv Detail & Related papers (2023-10-12T09:55:45Z)
- ExpeL: LLM Agents Are Experiential Learners [60.54312035818746]
We introduce the Experiential Learning (ExpeL) agent to allow learning from agent experiences without requiring parametric updates.
Our agent autonomously gathers experiences and extracts knowledge using natural language from a collection of training tasks.
At inference, the agent recalls its extracted insights and past experiences to make informed decisions.
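A minimal sketch of the described loop, with a stubbed-in language model so it runs end to end; the prompts, function names, and storage are placeholders rather than ExpeL's actual implementation.

```python
from typing import Callable, List

def gather_experiences(tasks: List[str], attempt: Callable[[str], str]) -> List[str]:
    """Training phase: attempt each task and keep a natural-language trace."""
    return [f"Task: {t}\nTrace: {attempt(t)}" for t in tasks]

def extract_insights(experiences: List[str], llm: Callable[[str], str]) -> str:
    """Distill reusable natural-language insights from the traces (no parameter updates)."""
    return llm("Summarize reusable lessons from these attempts:\n\n" + "\n\n".join(experiences))

def solve(task: str, insights: str, experiences: List[str], llm: Callable[[str], str]) -> str:
    """Inference phase: condition on extracted insights and recalled past experiences."""
    prompt = (f"Insights:\n{insights}\n\nRelevant past attempts:\n" + "\n\n".join(experiences[:3])
              + f"\n\nNew task: {task}\nAnswer:")
    return llm(prompt)

# Stub model so the sketch runs without any API; replace with a real completion call.
fake_llm = lambda prompt: f"[model output for a prompt of {len(prompt)} characters]"

experiences = gather_experiences(["sort a list", "reverse a string"], fake_llm)
insights = extract_insights(experiences, fake_llm)
print(solve("deduplicate a list", insights, experiences, fake_llm))
```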
arXiv Detail & Related papers (2023-08-20T03:03:34Z)
- Metacognitive Prompting Improves Understanding in Large Language Models [12.112914393948415]
We introduce Metacognitive Prompting (MP), a strategy inspired by human introspective reasoning processes.
We conduct experiments on four prevalent Large Language Models (LLMs) across ten natural language understanding (NLU) datasets.
MP consistently outperforms existing prompting methods in both general and domain-specific NLU tasks.
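The flavor of such a prompt can be gestured at with a short template; the staging and wording below are guesses for illustration, not the paper's actual MP template.

```python
def metacognitive_prompt(question: str) -> str:
    """Illustrative template: walk the model through comprehension, a preliminary
    judgment, self-critique, and a confidence-qualified final answer."""
    return (
        f"Question: {question}\n"
        "1. Restate the question in your own words.\n"
        "2. Give a preliminary answer and your reasoning.\n"
        "3. Critically evaluate that answer: what might be wrong with it?\n"
        "4. Revise if needed and state your confidence.\n"
        "Final answer:"
    )

print(metacognitive_prompt("Does sentence A entail sentence B?"))
```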
arXiv Detail & Related papers (2023-08-10T05:10:17Z)
- Visual Analytics For Machine Learning: A Data Perspective Survey [17.19676876329529]
This survey focuses on summarizing VIS4ML works from the data perspective.
First, we categorize the common data handled by ML models into five types, explain the unique features of each type, and highlight the corresponding ML models that are good at learning from them.
Second, from the large number of VIS4ML works, we tease out six tasks that operate on these types of data at different stages of the ML pipeline to understand, diagnose, and refine ML models.
arXiv Detail & Related papers (2023-07-15T05:13:06Z)
- The State of the Art in Enhancing Trust in Machine Learning Models with the Use of Visualizations [0.0]
Machine learning (ML) models are nowadays used in complex applications in various domains, such as medicine, bioinformatics, and other sciences.
Due to their black box nature, however, it may sometimes be hard to understand and trust the results they provide.
This has increased the demand for reliable visualization tools related to enhancing trust in ML models.
We present a State-of-the-Art Report (STAR) on enhancing trust in ML models with the use of interactive visualization.
arXiv Detail & Related papers (2022-12-22T14:29:43Z)
- Understanding the Usability Challenges of Machine Learning In High-Stakes Decision Making [67.72855777115772]
Machine learning (ML) is being applied to a diverse and ever-growing set of domains.
In many cases, domain experts -- who often have no expertise in ML or data science -- are asked to use ML predictions to make high-stakes decisions.
We investigate the ML usability challenges present in the domain of child welfare screening through a series of collaborations with child welfare screeners.
arXiv Detail & Related papers (2021-03-02T22:50:45Z)