Are We Closing the Loop Yet? Gaps in the Generalizability of VIS4ML
Research
- URL: http://arxiv.org/abs/2308.06290v1
- Date: Thu, 10 Aug 2023 21:44:48 GMT
- Title: Are We Closing the Loop Yet? Gaps in the Generalizability of VIS4ML
Research
- Authors: Hariharan Subramonyam, Jessica Hullman
- Abstract summary: We survey recent VIS4ML papers to assess the generalizability of research contributions and claims in enabling human-in-the-loop ML.
Our results show potential gaps between the current scope of VIS4ML research and aspirations for its use in practice.
- Score: 26.829392755701843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visualization for machine learning (VIS4ML) research aims to help experts
apply their prior knowledge to develop, understand, and improve the performance
of machine learning models. In conceiving VIS4ML systems, researchers
characterize the nature of human knowledge to support human-in-the-loop tasks,
design interactive visualizations to make ML components interpretable and
elicit knowledge, and evaluate the effectiveness of human-model interchange. We
survey recent VIS4ML papers to assess the generalizability of research
contributions and claims in enabling human-in-the-loop ML. Our results show
potential gaps between the current scope of VIS4ML research and aspirations for
its use in practice. We find that while papers motivate that VIS4ML systems are
applicable beyond the specific conditions studied, conclusions are often
overfitted to non-representative scenarios, are based on interactions with a
small set of ML experts and well-understood datasets, fail to acknowledge
crucial dependencies, and hinge on decisions that lack justification. We
discuss approaches to close the gap between aspirations and research claims and
suggest documentation practices to report generality constraints that better
acknowledge the exploratory nature of VIS4ML research.
Related papers
- Understanding the Role of LLMs in Multimodal Evaluation Benchmarks [77.59035801244278]
This paper investigates the role of the Large Language Model (LLM) backbone in Multimodal Large Language Models (MLLMs) evaluation.
Our study encompasses four diverse MLLM benchmarks and eight state-of-the-art MLLMs.
Key findings reveal that some benchmarks allow high performance even without visual inputs and up to 50% of error rates can be attributed to insufficient world knowledge in the LLM backbone.
arXiv Detail & Related papers (2024-10-16T07:49:13Z) - Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks [0.850206009406913]
Large Language Models (LLMs) are transforming programming practices, offering significant capabilities for code generation activities.
This paper focuses on their use in programming tasks, drawing insights from user studies that assess the impact of LLMs on programming tasks.
arXiv Detail & Related papers (2024-10-01T19:34:46Z) - Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) can estimate causal effects under interventions on different parts of a system.
We conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning.
arXiv Detail & Related papers (2024-04-08T14:15:56Z) - Explaining Large Language Models Decisions Using Shapley Values [1.223779595809275]
Large language models (LLMs) have opened up exciting possibilities for simulating human behavior and cognitive processes.
However, the validity of utilizing LLMs as stand-ins for human subjects remains uncertain.
This paper presents a novel approach based on Shapley values to interpret LLM behavior and quantify the relative contribution of each prompt component to the model's output.
arXiv Detail & Related papers (2024-03-29T22:49:43Z) - Effectiveness Assessment of Recent Large Vision-Language Models [78.69439393646554]
This paper endeavors to evaluate the competency of popular large vision-language models (LVLMs) in specialized and general tasks.
We employ six challenging tasks in three different application scenarios: natural, healthcare, and industrial.
We examine the performance of three recent open-source LVLMs, including MiniGPT-v2, LLaVA-1.5, and Shikra, on both visual recognition and localization in these tasks.
arXiv Detail & Related papers (2024-03-07T08:25:27Z) - Exploring the Cognitive Knowledge Structure of Large Language Models: An
Educational Diagnostic Assessment Approach [50.125704610228254]
Large Language Models (LLMs) have not only exhibited exceptional performance across various tasks, but also demonstrated sparks of intelligence.
Recent studies have focused on assessing their capabilities on human exams and revealed their impressive competence in different domains.
We conduct an evaluation using MoocRadar, a meticulously annotated human test dataset based on Bloom taxonomy.
arXiv Detail & Related papers (2023-10-12T09:55:45Z) - ExpeL: LLM Agents Are Experiential Learners [60.54312035818746]
We introduce the Experiential Learning (ExpeL) agent to allow learning from agent experiences without requiring parametric updates.
Our agent autonomously gathers experiences and extracts knowledge using natural language from a collection of training tasks.
At inference, the agent recalls its extracted insights and past experiences to make informed decisions.
arXiv Detail & Related papers (2023-08-20T03:03:34Z) - Metacognitive Prompting Improves Understanding in Large Language Models [12.112914393948415]
We introduce Metacognitive Prompting (MP), a strategy inspired by human introspective reasoning processes.
We conduct experiments on four prevalent Large Language Models (LLMs) across ten natural language understanding (NLU) datasets.
MP consistently outperforms existing prompting methods in both general and domain-specific NLU tasks.
arXiv Detail & Related papers (2023-08-10T05:10:17Z) - Visual Analytics For Machine Learning: A Data Perspective Survey [17.19676876329529]
This survey focuses on summarizing VIS4ML works from the data perspective.
We categorize the common data handled by ML models into five types, explain the unique features of each type, and highlight the corresponding ML models that are good at learning from them.
Second, from the large number of VIS4ML works, we tease out six tasks that operate on these types of data at different stages of the ML pipeline to understand, diagnose, and refine ML models.
arXiv Detail & Related papers (2023-07-15T05:13:06Z) - The State of the Art in Enhancing Trust in Machine Learning Models with the Use of Visualizations [0.0]
Machine learning (ML) models are nowadays used in complex applications in various domains, such as medicine, bioinformatics, and other sciences.
Due to their black box nature, however, it may sometimes be hard to understand and trust the results they provide.
This has increased the demand for reliable visualization tools related to enhancing trust in ML models.
We present a State-of-the-Art Report (STAR) on enhancing trust in ML models with the use of interactive visualization.
arXiv Detail & Related papers (2022-12-22T14:29:43Z) - Understanding the Usability Challenges of Machine Learning In
High-Stakes Decision Making [67.72855777115772]
Machine learning (ML) is being applied to a diverse and ever-growing set of domains.
In many cases, domain experts -- who often have no expertise in ML or data science -- are asked to use ML predictions to make high-stakes decisions.
We investigate the ML usability challenges present in the domain of child welfare screening through a series of collaborations with child welfare screeners.
arXiv Detail & Related papers (2021-03-02T22:50:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.