Related papers: Athanor: Authoring Action Modification-based Interactions on Static Visualizations via Natural Language

Athanor: Authoring Action Modification-based Interactions on Static Visualizations via Natural Language

URL: http://arxiv.org/abs/2601.17736v1
Date: Sun, 25 Jan 2026 08:08:42 GMT
Title: Athanor: Authoring Action Modification-based Interactions on Static Visualizations via Natural Language
Authors: Can Liu, Jaeuk Lee, Tianhe Chen, Zhibang Jiang, Xiaolin Wen, Yong Wang,
Abstract summary: Athanor is a novel approach to transform existing static visualizations into interactive ones using multimodal large language models and natural language instructions.<n>Athanor allows users to effortlessly author interactions through natural language instructions, eliminating the need for programming.
Score: 9.92682960014568
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Interactivity is crucial for effective data visualizations. However, it is often challenging to implement interactions for existing static visualizations, since the underlying code and data for existing static visualizations are often not available, and it also takes significant time and effort to enable interactions for them even if the original code and data are available. To fill this gap, we propose Athanor, a novel approach to transform existing static visualizations into interactive ones using multimodal large language models (MLLMs) and natural language instructions. Our approach introduces three key innovations: (1) an action-modification interaction design space that maps visualization interactions into user actions and corresponding adjustments, (2) a multi-agent requirement analyzer that translates natural language instructions into an actionable operational space, and (3) a visualization abstraction transformer that converts static visualizations into flexible and interactive representations regardless of their underlying implementation. Athanor allows users to effortlessly author interactions through natural language instructions, eliminating the need for programming. We conducted two case studies and in-depth interviews with target users to evaluate our approach. The results demonstrate the effectiveness and usability of our approach in allowing users to conveniently enable flexible interactions for static visualizations.

Related papers

Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models [80.28579390566298]
We introduce Interact2Ar, a text-conditioned autoregressive diffusion model for generating full-body, human-human interactions.<n>Hand kinematics are incorporated through dedicated parallel branches, enabling high-fidelity full-body generation.<n>Our model enables a series of downstream applications, including temporal motion composition, real-time adaptation to disturbances, and extension beyond dyadic to multi-person scenarios.
arXiv Detail & Related papers (2025-12-22T18:59:50Z)
InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation [1.7523719472700858]
We introduce InteracTalker, a novel framework that seamlessly integrates prompt-based object-aware interactions with co-speech gesture generation.<n>Our framework utilizes a Generalized Motion Adaptation Module that enables independent training, adapting to the corresponding motion condition.<n>InteracTalker successfully unifies these previously separate tasks, outperforming prior methods in both co-speech gesture generation and object-interaction synthesis.
arXiv Detail & Related papers (2025-12-14T12:29:49Z)
Contextualized Representation Learning for Effective Human-Object Interaction Detection [17.242400169885453]
Human-Object Interaction (HOI) detection aims to simultaneously localize human-object pairs and recognize their interactions.<n>We introduce a Contextualized Representation Learning that integrates both affordance-guided reasoning and contextual prompts.<n>Our proposed method demonstrates superior performance on both the HICO-Det and V-COCO datasets in most scenarios.
arXiv Detail & Related papers (2025-09-16T08:03:16Z)
Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection [51.52749744031413]
Human-Object Interaction (HOI) detection aims to identify humans and objects within images and interpret their interactions.<n>Existing HOI methods rely heavily on large datasets with manual annotations to learn interactions from visual cues.<n>We propose a novel training-free HOI detection framework for Dynamic Scoring with enhanced semantics.
arXiv Detail & Related papers (2025-07-23T12:30:19Z)
Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning [27.511627003202538]
Traditional scene graphs primarily focus on spatial relationships, limiting vision-language models' (VLMs) ability to reason about complex interactions in visual scenes.<n>This paper addresses two key challenges: (1) conventional detection-to-construction methods produce unfocused, contextually irrelevant relationship sets, and (2) existing approaches fail to form persistent memories for generalizing interaction reasoning to new scenes.<n>We propose Interaction-augmented Scene Graph Reasoning (ISGR), a framework that enhances VLMs' interactional reasoning through three complementary components.
arXiv Detail & Related papers (2025-05-14T04:04:23Z)
InterChat: Enhancing Generative Visual Analytics using Multimodal Interactions [22.007942964950217]
We develop InterChat, a generative visual analytics system that combines direct manipulation of visual elements with natural language inputs.<n>This integration enables precise intent communication and supports progressive, visually driven exploratory data analyses.
arXiv Detail & Related papers (2025-03-06T05:35:19Z)
Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.<n>Our findings are synthesized in Flex (Fly lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.<n>We demonstrate the effectiveness of this approach on a quadrotor fly-to-target task, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues. Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
Learning Action-Effect Dynamics from Pairs of Scene-graphs [50.72283841720014]
We propose a novel method that leverages scene-graph representation of images to reason about the effects of actions described in natural language. Our proposed approach is effective in terms of performance, data efficiency, and generalization capability compared to existing models.
arXiv Detail & Related papers (2022-12-07T03:36:37Z)
VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction [50.986371459817256]
We propose a novel textitVirtual InteRacTion mechanism, termed as VIRT, to enable full and deep interaction modeling in representation-based models. VIRT asks representation-based encoders to conduct virtual interactions to mimic the behaviors as interaction-based models do.
arXiv Detail & Related papers (2021-12-08T09:49:28Z)
Towards Transparent Interactive Semantic Parsing via Step-by-Step Correction [17.000283696243564]
We investigate an interactive semantic parsing framework that explains the predicted logical form step by step in natural language. We focus on question answering over knowledge bases (KBQA) as an instantiation of our framework. Our experiments show that the interactive framework with human feedback has the potential to greatly improve overall parse accuracy.
arXiv Detail & Related papers (2021-10-15T20:11:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.