Creating User-steerable Projections with Interactive Semantic Mapping
- URL: http://arxiv.org/abs/2506.15479v1
- Date: Wed, 18 Jun 2025 14:10:38 GMT
- Title: Creating User-steerable Projections with Interactive Semantic Mapping
- Authors: Artur André Oliveira, Mateus Espadoto, Roberto Hirata Jr., Roberto M. Cesar Jr., Alex C. Telea
- Abstract summary: We introduce a novel user-guided projection framework for image and text data. We enable users to steer projections dynamically via natural-language guiding prompts. Our approach bridges the gap between fully automated DR techniques and human-centered data exploration.
- Score: 1.056341184072737
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dimensionality reduction (DR) techniques map high-dimensional data into lower-dimensional spaces. Yet, current DR techniques are not designed to explore semantic structure that is not directly available in the form of variables or class labels. We introduce a novel user-guided projection framework for image and text data that enables customizable, interpretable data visualizations via zero-shot classification with Multimodal Large Language Models (MLLMs). Users can steer projections dynamically via natural-language guiding prompts, specifying high-level semantic relationships of interest that are not explicitly present in the data dimensions. We evaluate our method across several datasets and show that it not only enhances cluster separation, but also transforms DR into an interactive, user-driven process. Our approach bridges the gap between fully automated DR techniques and human-centered data exploration, offering a flexible and adaptive way to tailor projections to specific analytical needs.
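The abstract does not include an implementation, but the loop it describes, scoring each item against natural-language guiding prompts with a zero-shot model and projecting those scores instead of the raw features, can be sketched. Below is a minimal sketch using CLIP as a stand-in for the MLLM; the model name, prompt list, and function names are illustrative assumptions, not the authors' code.

```python
# A minimal sketch, NOT the authors' implementation. CLIP stands in for the
# MLLM; prompts, model name, and helper names are illustrative assumptions.
import numpy as np
from PIL import Image
from sklearn.manifold import TSNE
from transformers import CLIPModel, CLIPProcessor

# User-supplied guiding prompts define the semantic axes of the projection.
PROMPTS = ["an indoor scene", "an outdoor scene",
           "contains people", "contains animals"]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def semantic_scores(images: list[Image.Image]) -> np.ndarray:
    """Zero-shot score of every image against every guiding prompt."""
    inputs = processor(text=PROMPTS, images=images,
                       return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image        # (n_images, n_prompts)
    return logits.softmax(dim=-1).detach().numpy()

def steered_projection(images: list[Image.Image]) -> np.ndarray:
    """Project prompt-derived semantics, not raw pixels, to 2-D."""
    scores = semantic_scores(images)                 # needs > perplexity items
    return TSNE(n_components=2, perplexity=5).fit_transform(scores)
```

Editing PROMPTS and re-running changes the semantic axes the projection is computed over, which is what makes the result user-steerable.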
Related papers
- PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage [19.031839603738057]
Multimodal datasets can be leveraged to pre-train vision-adapted models by providing cross-modal semantics.
We propose a novel prompt-adapted transferable fingerprinting scheme called PATFinger.
Our scheme utilizes inherent dataset attributes as fingerprints instead of compelling the model to learn triggers.
arXiv Detail & Related papers (2025-04-15T09:53:02Z)
- QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding [53.69841526266547]
Fine-tuning a pre-trained Vision-Language Model with new datasets often falls short in optimizing the vision encoder.
We introduce QID, a novel, streamlined, architecture-preserving approach that integrates query embeddings into the vision encoder.
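A minimal sketch of the query-informed idea as summarized here, prepending projected query embeddings to the vision encoder's token stream without altering its architecture, follows; all module and parameter names are hypothetical, and this is not the paper's reference implementation.

```python
# Hypothetical sketch of query-informed encoding; not the paper's code.
import torch
import torch.nn as nn

class QueryInformedEncoder(nn.Module):
    def __init__(self, vit_blocks: nn.Module, text_dim: int, vis_dim: int):
        super().__init__()
        self.vit_blocks = vit_blocks              # frozen pre-trained ViT blocks
        self.proj = nn.Linear(text_dim, vis_dim)  # maps queries into visual space

    def forward(self, patch_tokens: torch.Tensor, query_emb: torch.Tensor):
        # patch_tokens: (B, N, vis_dim); query_emb: (B, Q, text_dim)
        query_tokens = self.proj(query_emb)                    # (B, Q, vis_dim)
        tokens = torch.cat([query_tokens, patch_tokens], dim=1)
        return self.vit_blocks(tokens)             # joint attention over both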
arXiv Detail & Related papers (2025-04-03T18:47:16Z)
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
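As a rough illustration of the label-verification step, the sketch below asks an LLM to accept or reject a candidate entity label from a textual description; the prompt wording, model name, and helper function are assumptions, and the paper's actual pipeline is multimodal and richer.

```python
# Hypothetical label-verification step; prompt and model are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def verify_label(caption: str, candidate_entity: str) -> bool:
    """Ask an LLM whether the candidate entity label fits the description."""
    prompt = (
        f"Image description: {caption}\n"
        f"Candidate entity label: {candidate_entity}\n"
        "Answer strictly 'yes' or 'no': is the label correct?"
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content.strip().lower().startswith("yes")
```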
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Interactive Counterfactual Generation for Univariate Time Series [7.331969743532515]
Our approach aims to enhance the transparency and understanding of deep learning models' decision processes.
By abstracting user interactions with the projected data points, our method facilitates an intuitive generation of counterfactual explanations.
We validate this method using the ECG5000 benchmark dataset, demonstrating significant improvements in interpretability and user understanding of time series classification.
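The abstract omits the generation mechanics; a common gradient-based formulation, sketched below under the assumption of a differentiable classifier, perturbs a series until its predicted class flips while staying close to the original. The interactive projection interface is not shown.

```python
# Sketch of one plausible counterfactual search; not the paper's exact method.
import torch

def counterfactual(model, x, target_class, steps=200, lr=1e-2, lam=0.1):
    """Find a minimally perturbed series classified as target_class."""
    x_cf = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x_cf.unsqueeze(0))              # (1, n_classes)
        loss = torch.nn.functional.cross_entropy(
            logits, torch.tensor([target_class])
        ) + lam * (x_cf - x).norm()                    # stay close to x
        loss.backward()
        opt.step()
    return x_cf.detach()
```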
arXiv Detail & Related papers (2024-08-20T08:19:55Z)
- VERA: Generating Visual Explanations of Two-Dimensional Embeddings via Region Annotation [0.0]
Visual Explanations via Region Annotation (VERA) is an automatic embedding-annotation approach that generates visual explanations for any two-dimensional embedding.
VERA produces informative explanations that characterize distinct regions in the embedding space, allowing users to gain an overview of the embedding landscape at a glance.
We illustrate the usage of VERA on a real-world data set and validate the utility of our approach with a comparative user study.
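A rough sketch of the general region-annotation idea, clustering the 2-D points and describing each region by its most over-represented features, appears below; VERA's actual explanation pipeline is more sophisticated, and the clustering parameters here are arbitrary assumptions.

```python
# Sketch of region annotation in a 2-D embedding; not VERA's algorithm.
import numpy as np
from sklearn.cluster import DBSCAN

def annotate_regions(embedding_2d, features, feature_names, top_k=3):
    """Label dense 2-D regions with their most over-represented features."""
    labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(embedding_2d)
    global_mean = features.mean(axis=0)
    annotations = {}
    for c in set(labels) - {-1}:                     # -1 marks noise points
        contrast = features[labels == c].mean(axis=0) - global_mean
        top = np.argsort(contrast)[::-1][:top_k]
        annotations[c] = [feature_names[i] for i in top]
    return annotations
```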
arXiv Detail & Related papers (2024-06-07T10:23:03Z)
- DimVis: Interpreting Visual Clusters in Dimensionality Reduction With Explainable Boosting Machine [3.2748787252933442]
DimVis is a tool that employs supervised Explainable Boosting Machine (EBM) models as an interpretation assistant for DR projections.
Our tool facilitates high-dimensional data analysis by providing an interpretation of feature relevance in visual clusters.
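A minimal sketch of this style of analysis, assuming the `interpret` package: train an EBM to separate a user-selected cluster from the remaining points, then rank features by the model's global term importances. This mirrors the idea rather than reproducing DimVis itself.

```python
# Sketch of EBM-based cluster interpretation; not the DimVis implementation.
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

def explain_cluster(X, in_cluster):
    """X: (n, d) original features; in_cluster: boolean mask from a selection."""
    ebm = ExplainableBoostingClassifier()
    ebm.fit(X, in_cluster.astype(int))               # cluster vs. rest
    exp = ebm.explain_global().data()                # per-term importances
    order = np.argsort(exp["scores"])[::-1]
    return [(exp["names"][i], exp["scores"][i]) for i in order]
```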
arXiv Detail & Related papers (2024-02-10T04:50:36Z)
- DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control [68.14798033899955]
Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content.
However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation?
We investigate this question in the context of autonomous driving, and answer it with a resounding "yes".
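As a bare-bones illustration of using a pretrained LDM as a data generator (without the paper's stylized semantic control), one might synthesize driving scenes from prompts with the `diffusers` API; the prompts and model choice below are assumptions.

```python
# Sketch of prompt-conditioned data synthesis; DGInStyle adds semantic control.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
prompts = ["a city street at dusk, rainy", "a highway in fog, dashcam view"]
images = [pipe(p).images[0] for p in prompts]   # candidate training images
```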
arXiv Detail & Related papers (2023-12-05T18:34:12Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge with supplementary visual structure cues from edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- VLSlice: Interactive Vision-and-Language Slice Discovery [17.8634551024147]
VLSlice is an interactive system enabling user-guided discovery of coherent representation-level subgroups with consistent visiolinguistic behavior.
We show that VLSlice enables users to quickly generate diverse high-coherency slices in a user study and release the tool publicly.
arXiv Detail & Related papers (2023-09-13T04:02:38Z)
- Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR).
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's ability to express what they are looking for.
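A simple late-fusion baseline conveys the task setup: embed the reference image and the modifying text with CLIP, combine them, and rank candidates by cosine similarity. The sketch below is an assumed baseline for illustration, not the paper's method.

```python
# Sketch of a late-fusion baseline for zero-shot CIR; not the paper's model.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def composed_query(ref_image, text: str) -> torch.Tensor:
    """Fuse a reference image and a modifying text into one query vector."""
    img_in = processor(images=ref_image, return_tensors="pt")
    txt_in = processor(text=[text], return_tensors="pt", padding=True)
    img_emb = model.get_image_features(**img_in)
    txt_emb = model.get_text_features(**txt_in)
    q = img_emb / img_emb.norm() + txt_emb / txt_emb.norm()  # naive fusion
    return q / q.norm()

def rank(candidate_embs: torch.Tensor, query_emb: torch.Tensor):
    """Rank L2-normalized candidate embeddings by cosine similarity."""
    return torch.argsort(candidate_embs @ query_emb.T, dim=0, descending=True)
```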
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
- Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
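The conditioning scheme can be pictured as prepending a style token and retrieved keywords to the caption decoder's input, so the model separates descriptive style from semantic content. The sketch below is a hypothetical rendering of that interface; token names and the retrieval source are assumptions.

```python
# Hypothetical decoder-input construction; token names are assumptions.
def build_decoder_input(style: str, keywords: list[str], bos: str = "<bos>") -> str:
    """Compose the prefix fed to an autoregressive caption decoder."""
    style_token = f"<style:{style}>"       # e.g. <style:clean> vs <style:web>
    keyword_str = " ".join(keywords)       # keywords from a retrieval component
    return f"{style_token} {keyword_str} {bos}"

prefix = build_decoder_input("clean", ["dog", "frisbee", "park"])
# -> "<style:clean> dog frisbee park <bos>"
```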
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.