LayerFlow: Layer-wise Exploration of LLM Embeddings using Uncertainty-aware Interlinked Projections
- URL: http://arxiv.org/abs/2504.10504v1
- Date: Wed, 09 Apr 2025 12:24:58 GMT
- Title: LayerFlow: Layer-wise Exploration of LLM Embeddings using Uncertainty-aware Interlinked Projections
- Authors: Rita Sevastjanova, Robin Gerling, Thilo Spinner, Mennatallah El-Assady
- Abstract summary: LayerFlow is a visual analytics workspace that displays embeddings in an interlinked projection design. It communicates the transformation, representation, and interpretation uncertainty. We show the usability of the presented workspace through replication and expert case studies.
- Score: 11.252261879736102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) represent words through contextual word embeddings encoding different language properties like semantics and syntax. Understanding these properties is crucial, especially for researchers investigating language model capabilities, employing embeddings for tasks related to text similarity, or evaluating the reasons behind token importance as measured through attribution methods. Applications for embedding exploration frequently involve dimensionality reduction techniques, which reduce high-dimensional vectors to two dimensions used as coordinates in a scatterplot. This data transformation step introduces uncertainty that can be propagated to the visual representation and influence users' interpretation of the data. To communicate such uncertainties, we present LayerFlow - a visual analytics workspace that displays embeddings in an interlinked projection design and communicates the transformation, representation, and interpretation uncertainty. In particular, to hint at potential data distortions and uncertainties, the workspace includes several visual components, such as convex hulls showing 2D and HD clusters, data point pairwise distances, cluster summaries, and projection quality metrics. We show the usability of the presented workspace through replication and expert case studies that highlight the need to communicate uncertainty through multiple visual components and different data perspectives.
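To make the transformation-uncertainty step concrete, here is a minimal sketch (not the LayerFlow implementation itself; the embedding array is a random stand-in) that projects high-dimensional vectors to 2D and scores the projection with a standard quality metric:

```python
# Minimal sketch: project layer-wise embeddings to 2D and quantify
# transformation uncertainty with a standard projection quality metric.
import numpy as np
from sklearn.manifold import TSNE, trustworthiness

rng = np.random.default_rng(0)
# Stand-in for contextual embeddings from one LLM layer: 200 tokens x 768 dims.
embeddings = rng.normal(size=(200, 768))

# Dimensionality reduction: high-dimensional vectors -> 2D scatterplot coordinates.
coords_2d = TSNE(n_components=2, random_state=0).fit_transform(embeddings)

# Projection quality metric: trustworthiness in [0, 1] measures how well
# 2D neighborhoods preserve high-dimensional neighborhoods (1 = no distortion).
quality = trustworthiness(embeddings, coords_2d, n_neighbors=10)
print(f"trustworthiness: {quality:.3f}")
```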
Related papers
- Disentangling Linguistic Features with Dimension-Wise Analysis of Vector Embeddings [0.0]
This paper proposes a framework for uncovering the specific dimensions of vector embeddings that encode distinct linguistic properties (LPs).
We introduce the Linguistically Distinct Sentence Pairs dataset, which isolates ten key linguistic features such as synonymy, negation, tense, and quantity.
Using this dataset, we analyze BERT embeddings with various methods to identify the most influential dimensions for each LP.
Our findings show that certain properties, such as negation and polarity, are robustly encoded in specific dimensions, while others, like synonymy, exhibit more complex patterns.
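A hedged sketch of what such a dimension-wise analysis can look like (the synthetic data, the planted dimension, and the effect-size ranking are illustrative assumptions, not the paper's exact method):

```python
# Given embeddings for sentence pairs that differ only in one linguistic
# property (e.g., negation), rank dimensions by how strongly they separate
# the two sides of the pair.
import numpy as np

rng = np.random.default_rng(1)
dim = 768
pos = rng.normal(size=(100, dim))            # e.g., affirmative sentences
neg = pos + rng.normal(scale=0.05, size=(100, dim))
neg[:, 42] += 2.0                            # planted "negation" dimension

# Per-dimension effect size: mean shift normalized by pooled std.
shift = np.abs(pos.mean(0) - neg.mean(0))
spread = (pos.std(0) + neg.std(0)) / 2
effect = shift / (spread + 1e-8)

top = np.argsort(effect)[::-1][:5]
print("most influential dimensions:", top)   # dimension 42 should rank first
```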
arXiv Detail & Related papers (2025-04-20T23:38:16Z)
- Mitigating Knowledge Discrepancies among Multiple Datasets for Task-agnostic Unified Face Alignment [30.501432077729245]
Despite the similar structures of human faces, existing face alignment methods cannot learn unified knowledge from multiple datasets. This paper presents a strategy to unify knowledge from multiple datasets. The successful mitigation of discrepancies also enhances the efficiency of knowledge transfer to a novel dataset.
arXiv Detail & Related papers (2025-03-28T11:59:27Z)
- Teaching VLMs to Localize Specific Objects from In-context Examples [56.797110842152]
We find that present-day Vision-Language Models (VLMs) lack a fundamental cognitive ability: learning to localize specific objects in a scene by taking into account the context. This work is the first to explore and benchmark personalized few-shot localization for VLMs.
arXiv Detail & Related papers (2024-11-20T13:34:22Z)
- DimVis: Interpreting Visual Clusters in Dimensionality Reduction With Explainable Boosting Machine [3.2748787252933442]
DimVis is a tool that employs supervised Explainable Boosting Machine (EBM) models as an interpretation assistant for DR projections.
Our tool facilitates high-dimensional data analysis by providing an interpretation of feature relevance in visual clusters.
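A sketch of this idea under stated assumptions (synthetic data; cluster labels from KMeans on a PCA projection; requires the `interpret` package; the pipeline here is not the authors' code):

```python
# Explain a visual cluster in a DR projection by training an EBM to
# separate that cluster from the rest using the original features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from interpret.glassbox import ExplainableBoostingClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))                 # stand-in high-dimensional data
coords = PCA(n_components=2).fit_transform(X)  # the DR projection

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(coords)
target = (clusters == 0).astype(int)           # one visual cluster vs. the rest

ebm = ExplainableBoostingClassifier(random_state=0)
ebm.fit(X, target)

# Global term importances: which original features characterize the cluster.
global_exp = ebm.explain_global().data()
for name, score in zip(global_exp["names"], global_exp["scores"]):
    print(f"{name}: {score:.3f}")
```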
arXiv Detail & Related papers (2024-02-10T04:50:36Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- An Integral Projection-based Semantic Autoencoder for Zero-Shot Learning [0.46644955105516456]
Zero-shot Learning (ZSL) classification categorizes or predicts classes (labels) that are not included in the training set (unseen classes).
Recent works proposed different semantic autoencoder (SAE) models where the encoder embeds a visual feature space into the semantic space and the decoder reconstructs the original visual feature space.
We propose an integral projection-based semantic autoencoder (IP-SAE) where an encoder projects the visual feature space, concatenated with the semantic space, into a latent representation space.
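A minimal sketch of that encoder/decoder arrangement (layer sizes, attribute dimensionality, and the single-linear-layer design are assumptions for illustration, not the paper's exact architecture):

```python
# The encoder projects visual features concatenated with semantic attributes
# into a latent space; the decoder reconstructs the visual features.
import torch
import torch.nn as nn

class IPSAE(nn.Module):
    def __init__(self, vis_dim=2048, sem_dim=85, latent_dim=256):
        super().__init__()
        self.encoder = nn.Linear(vis_dim + sem_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, vis_dim)

    def forward(self, visual, semantic):
        latent = self.encoder(torch.cat([visual, semantic], dim=-1))
        reconstruction = self.decoder(latent)
        return latent, reconstruction

model = IPSAE()
visual = torch.randn(32, 2048)     # e.g., CNN features
semantic = torch.randn(32, 85)     # e.g., class attribute vectors
latent, recon = model(visual, semantic)
loss = nn.functional.mse_loss(recon, visual)  # reconstruction objective
```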
arXiv Detail & Related papers (2023-06-26T12:06:20Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem [60.0878532426877]
We propose a novel collaborative learning scheme from the viewpoint of visual perturbation calibration.
Specifically, we devise a visual controller to construct two sorts of curated images with different perturbation extents.
The experimental results on two diagnostic VQA-CP benchmark datasets evidently demonstrate its effectiveness.
arXiv Detail & Related papers (2022-07-24T23:50:52Z)
- Exploring Dimensionality Reduction Techniques in Multilingual Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensionality reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
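One way to sanity-check such a reduction, sketched under assumptions (random stand-in embeddings, PCA as the reduction technique, similarity correlation as a proxy for retained task performance):

```python
# Reduce sentence-embedding dimensionality with PCA and check how well
# pairwise cosine similarities survive the reduction.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(3)
embeddings = rng.normal(size=(500, 768))  # stand-in Siamese-transformer outputs

reduced = PCA(n_components=64).fit_transform(embeddings)  # ~92% fewer dims

full_sim = cosine_similarity(embeddings)
red_sim = cosine_similarity(reduced)
# Correlation of pairwise similarities: a proxy for retained performance.
mask = np.triu_indices_from(full_sim, k=1)
corr = np.corrcoef(full_sim[mask], red_sim[mask])[0, 1]
print(f"similarity correlation after reduction: {corr:.3f}")
```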
arXiv Detail & Related papers (2022-04-18T17:20:55Z)
- High-dimensional distributed semantic spaces for utterances [0.2907403645801429]
This paper describes a model for high-dimensional representation of utterance- and text-level data.
It is based on a mathematically principled and behaviourally plausible approach to representing linguistic information.
The paper shows how the implemented model is able to represent a broad range of linguistic features in a common integral framework of fixed dimensionality.
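A hedged sketch of one fixed-dimensionality encoding in this spirit, random indexing with superposition; the paper's actual operators may differ:

```python
# Every utterance maps to a single vector of fixed dimensionality,
# regardless of its length, by superposing sparse word index vectors.
import numpy as np

DIM = 2000
rng = np.random.default_rng(4)

def random_index_vector():
    # Sparse ternary vector: a few +1/-1 entries, rest zero.
    v = np.zeros(DIM)
    idx = rng.choice(DIM, size=10, replace=False)
    v[idx] = rng.choice([-1.0, 1.0], size=10)
    return v

vocab = {}
def encode_utterance(tokens):
    for t in tokens:
        vocab.setdefault(t, random_index_vector())
    return sum(vocab[t] for t in tokens)

u = encode_utterance("the cat sat on the mat".split())
print(u.shape)  # (2000,) -- fixed dimensionality
```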
arXiv Detail & Related papers (2021-04-01T12:09:47Z)
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
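A minimal sketch of the prototype mechanism (mean-pooled prototypes and nearest-prototype matching are simplifying assumptions; the paper's training objective is richer):

```python
# Each relation's prototype aggregates its instances' contextual embeddings;
# new instances are matched to the nearest prototype.
import numpy as np

rng = np.random.default_rng(5)
n_rel, per_rel, dim = 4, 50, 128
# Stand-in contextual embeddings of distantly-labeled relation mentions.
instances = rng.normal(size=(n_rel, per_rel, dim)) + np.arange(n_rel)[:, None, None]

prototypes = instances.mean(axis=1)          # one prototype per relation

def predict(embedding):
    # Nearest prototype in Euclidean distance.
    dists = np.linalg.norm(prototypes - embedding, axis=1)
    return int(np.argmin(dists))

print(predict(instances[2, 0]))  # -> 2
```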
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
- Contrastive analysis for scatter plot-based representations of dimensionality reduction [0.0]
This paper introduces a methodology to explore multidimensional datasets and interpret clusters' formation.
We also introduce a bipartite graph to visually interpret and explore the relationship between the statistical variables and the clusters, showing how the attributes influenced cluster formation.
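A sketch of how such a bipartite structure can be derived (the deviation-based edge weights and the 0.5 threshold are illustrative assumptions, not the paper's construction):

```python
# Link each cluster to the attributes whose values deviate most inside it,
# so the bipartite graph shows which variables drove cluster formation.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 5))                # rows: items, cols: attributes
X[:100, 0] += 3.0                            # attribute 0 drives cluster 0
labels = np.repeat([0, 1, 2], 100)           # stand-in cluster assignment
attrs = [f"attr_{j}" for j in range(X.shape[1])]

# Edge weight: |cluster mean - global mean| in global-std units.
edges = []
for c in np.unique(labels):
    dev = np.abs(X[labels == c].mean(0) - X.mean(0)) / X.std(0)
    edges += [(f"cluster_{c}", a, w) for a, w in zip(attrs, dev) if w > 0.5]

for cluster, attr, w in edges:
    print(f"{cluster} -- {attr} (weight {w:.2f})")
```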
arXiv Detail & Related papers (2021-01-26T01:16:31Z)
- Understanding Spatial Relations through Multiple Modalities [78.07328342973611]
Spatial relations between objects can either be explicit, expressed as spatial prepositions, or implicit, expressed by spatial verbs such as moving, walking, shifting, etc.
We introduce the task of inferring implicit and explicit spatial relations between two entities in an image.
We design a model that uses both textual and visual information to predict the spatial relations, making use of both positional and size information of objects and image embeddings.
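An illustrative sketch of such a fused input (the layer sizes, embedding dimensions, and relation count are assumptions, not the paper's exact model):

```python
# Fuse positional/size features of two objects with image and text
# embeddings to classify their spatial relation.
import torch
import torch.nn as nn

class SpatialRelationClassifier(nn.Module):
    def __init__(self, img_dim=512, txt_dim=300, n_relations=12):
        super().__init__()
        # 2 objects x (x, y, w, h) normalized box features = 8 dims.
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + txt_dim + 8, 256),
            nn.ReLU(),
            nn.Linear(256, n_relations),
        )

    def forward(self, img_emb, txt_emb, boxes):
        # boxes: (batch, 8) positional and size information for both entities.
        return self.mlp(torch.cat([img_emb, txt_emb, boxes], dim=-1))

model = SpatialRelationClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 300), torch.rand(4, 8))
print(logits.shape)  # torch.Size([4, 12])
```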
arXiv Detail & Related papers (2020-07-19T01:35:08Z)