Capturing Visualization Design Rationale
- URL: http://arxiv.org/abs/2506.16571v2
- Date: Tue, 01 Jul 2025 17:51:47 GMT
- Title: Capturing Visualization Design Rationale
- Authors: Maeve Hutchinson, Radu Jianu, Aidan Slingsby, Jo Wood, Pranava Madhyastha
- Abstract summary: We present a new dataset and methodology for probing visualization design rationale through natural language. We leverage a unique source of real-world visualizations and natural language narratives: literate visualization notebooks created by students as part of a data visualization course. We also use large language models (LLMs) to generate and categorize question-answer-rationale triples from the narratives and articulations in the notebooks.
- Score: 5.051297047598238
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior natural language datasets for data visualization have focused on tasks such as visualization literacy assessment, insight generation, and visualization generation from natural language instructions. These studies often rely on controlled setups with purpose-built visualizations and artificially constructed questions. As a result, they tend to prioritize the interpretation of visualizations, focusing on decoding visualizations rather than understanding their encoding. In this paper, we present a new dataset and methodology for probing visualization design rationale through natural language. We leverage a unique source of real-world visualizations and natural language narratives: literate visualization notebooks created by students as part of a data visualization course. These notebooks combine visual artifacts with design exposition, in which students make explicit the rationale behind their design decisions. We also use large language models (LLMs) to generate and categorize question-answer-rationale triples from the narratives and articulations in the notebooks. We then carefully validate the triples and curate a dataset that captures and distills the visualization design choices and corresponding rationales of the students.
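As a rough illustration of the triple-generation step described above, the sketch below shows one way question-answer-rationale triples could be drawn from a notebook's design exposition with an LLM. This is a minimal sketch, not the authors' pipeline: the `QARTriple` fields, the prompt wording, the category labels, and the `call_llm` helper are all assumptions introduced for illustration.

```python
# Hypothetical sketch of LLM-based question-answer-rationale (QAR) extraction.
# Not the authors' code; prompt wording, categories, and call_llm are assumptions.
import json
from dataclasses import dataclass


@dataclass
class QARTriple:
    question: str   # probes a design choice, e.g. "Why is a log scale used on the y-axis?"
    answer: str     # the design decision taken in the notebook
    rationale: str  # the student's stated justification for that decision
    category: str   # illustrative label such as "encoding", "colour", or "layout"


PROMPT = """Below is a student's design exposition from a literate visualization notebook.
Extract question-answer-rationale triples that capture the design decisions and the
reasons given for them. Respond with a JSON list of objects having the keys
"question", "answer", "rationale", and "category".

Design exposition:
{narrative}
"""


def call_llm(prompt: str) -> str:
    """Placeholder: wrap whichever chat-completion API is available and return raw text."""
    raise NotImplementedError("plug in an LLM client here")


def extract_triples(narrative: str) -> list[QARTriple]:
    """Generate candidate triples from one narrative; per the abstract, these would
    still go through validation and curation before entering the dataset."""
    raw = call_llm(PROMPT.format(narrative=narrative))
    return [QARTriple(**item) for item in json.loads(raw)]
```

The stub only covers generation; in the paper's described workflow the generated triples are subsequently validated and curated by hand.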
Related papers
- Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition [50.86415025650168]
Masked image modeling (MIM) tends to exploit local structures to reconstruct visual patterns, resulting in limited linguistic knowledge. We propose a Linguistics-aware Masked Image Modeling (LMIM) approach, which channels linguistic information into the decoding process of MIM through a separate branch.
arXiv Detail & Related papers (2025-03-24T14:53:35Z)
- Natural Language Generation for Visualizations: State of the Art, Challenges and Future Directions [7.064953237013352]
We focus on research works on text generation for visualizations.
To characterize the NLG problem and the design space of proposed solutions, we pose five Wh-questions.
We categorize the solutions used in the surveyed papers based on these "five Wh-questions".
arXiv Detail & Related papers (2024-09-29T15:53:18Z)
- Beyond Embeddings: The Promise of Visual Table in Visual Reasoning [38.558250602212425]
We propose Visual Table, a novel form of visual representation tailored for visual reasoning.
Visual tables are constructed as hierarchical descriptions of visual scenes, featuring a scene description and multiple object-centric descriptions.
They deliver instance-level world knowledge and detailed attributes that are essential for visual reasoning.
arXiv Detail & Related papers (2024-03-27T04:49:23Z)
- Visually Dehallucinative Instruction Generation [0.8192907805418583]
This paper presents a novel and scalable method for generating visually dehallucinative instructions, dubbed CAP2QA, that constrains the scope to only image contents.
Experiments show that our proposed method significantly reduces visual hallucination while consistently improving visual recognition ability and expressiveness.
arXiv Detail & Related papers (2024-02-13T10:25:45Z)
- Visual Storytelling with Question-Answer Plans [70.89011289754863]
We present a novel framework which integrates visual representations with pretrained language models and planning.
Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret.
It also leverages a sequence of question-answer pairs as a blueprint plan for selecting salient visual concepts and determining how they should be assembled into a narrative.
arXiv Detail & Related papers (2023-10-08T21:45:34Z)
- Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models [57.08925810659545]
We conduct a comparative analysis of the visual representations in existing vision-and-language models and vision-only models.
Our empirical observations suggest that vision-and-language models are better at label prediction tasks.
We hope our study sheds light on the role of language in visual learning, and serves as an empirical guide for various pretrained models.
arXiv Detail & Related papers (2022-12-01T05:00:18Z)
- Visual Superordinate Abstraction for Robust Concept Learning [80.15940996821541]
Concept learning constructs visual representations that are connected to linguistic semantics.
We ascribe this bottleneck to a failure to explore the intrinsic semantic hierarchy of visual concepts.
We propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces.
arXiv Detail & Related papers (2022-05-28T14:27:38Z)
- From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network [70.47504933083218]
We propose a Visual Language Modeling Network (VisionLAN), which views the visual and linguistic information as a union.
VisionLAN significantly improves the speed by 39% and adaptively considers the linguistic information to enhance the visual features for accurate recognition.
arXiv Detail & Related papers (2021-08-22T07:56:24Z)
- Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs [106.15931418425906]
We present the first study focused on generating natural language rationales across several complex visual reasoning tasks.
We present RationaleVT Transformer, an integrated model that learns to generate free-text rationales by combining pretrained language models with object recognition, grounded visual semantic frames, and visual commonsense graphs.
Our experiments show that the base pretrained language model benefits from visual adaptation and that free-text rationalization is a promising research direction to complement model interpretability for complex visual-textual reasoning tasks.
arXiv Detail & Related papers (2020-10-15T05:08:56Z)