Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval
- URL: http://arxiv.org/abs/2506.00041v1
- Date: Wed, 28 May 2025 02:50:17 GMT
- Title: Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval
- Authors: Seongwan Park, Taeklim Kim, Youngjoong Ko
- Abstract summary: We propose a novel interpretability framework for Dense Passage Retrieval (DPR) models. We generate natural language descriptions for each latent concept, enabling human interpretations of both the dense embeddings and the query-document similarity scores of DPR models. We show that Concept-Level Sparse Retrieval (CL-SR) achieves high index-space and computational efficiency while maintaining robust performance across vocabulary and semantic mismatches.
- Score: 13.31210969917096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their strong performance, Dense Passage Retrieval (DPR) models suffer from a lack of interpretability. In this work, we propose a novel interpretability framework that leverages Sparse Autoencoders (SAEs) to decompose previously uninterpretable dense embeddings from DPR models into distinct, interpretable latent concepts. We generate natural language descriptions for each latent concept, enabling human interpretations of both the dense embeddings and the query-document similarity scores of DPR models. We further introduce Concept-Level Sparse Retrieval (CL-SR), a retrieval framework that directly utilizes the extracted latent concepts as indexing units. CL-SR effectively combines the semantic expressiveness of dense embeddings with the transparency and efficiency of sparse representations. We show that CL-SR achieves high index-space and computational efficiency while maintaining robust performance across vocabulary and semantic mismatches.
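The abstract describes a two-stage pipeline: an SAE decomposes each dense DPR embedding into a small set of active latent concepts, and CL-SR then uses those concepts as indexing units, so a query-document score decomposes into per-concept contributions. The following is a minimal sketch of that pipeline, not the authors' implementation; the TopK activation, dictionary width, and toy inverted index are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """Sparse autoencoder: dense embedding -> k active latent concepts."""
    def __init__(self, d_model=768, n_concepts=16384, k=32):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_concepts)
        self.decoder = nn.Linear(n_concepts, d_model, bias=False)

    def encode(self, x):
        z = torch.relu(self.encoder(x))
        # Keep only the k strongest concept activations per embedding.
        vals, idx = z.topk(self.k, dim=-1)
        return torch.zeros_like(z).scatter_(-1, idx, vals)

    def forward(self, x):
        z = self.encode(x)
        return self.decoder(z), z  # reconstruction + sparse concept code

# Concept-level sparse retrieval over the SAE codes (toy inverted index).
def build_index(doc_codes):
    index = {}  # concept id -> list of (doc id, activation weight)
    for doc_id, code in enumerate(doc_codes):
        for cid in code.nonzero(as_tuple=True)[0].tolist():
            index.setdefault(cid, []).append((doc_id, code[cid].item()))
    return index

def score(query_code, index):
    # Dot product restricted to shared concepts: each (concept, doc)
    # contribution is individually inspectable.
    scores = {}
    for cid in query_code.nonzero(as_tuple=True)[0].tolist():
        for doc_id, w in index.get(cid, []):
            scores[doc_id] = scores.get(doc_id, 0.0) + query_code[cid].item() * w
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Because each retained concept can be paired with a natural language description, the summed per-concept products make both the embedding and the similarity score directly inspectable.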
Related papers
- Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders [0.0]
We present a novel SAE-based architecture tailored for text classification. We benchmark this architecture against established methods such as ConceptShap, Independent Component Analysis, and other SAE-based concept extraction techniques. Our empirical results show that our architecture improves both the causality and interpretability of the extracted features.
arXiv Detail & Related papers (2025-06-30T15:18:50Z)
- Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit [16.996218963146788]
Sparse autoencoders (SAEs) have recently become central tools for interpretability. This paper evaluates SAEs in a controlled setting using MNIST. We introduce a multi-iteration SAE by unrolling Matching Pursuit (MP-SAE); a matching-pursuit sketch of this idea appears after this list.
arXiv Detail & Related papers (2025-06-05T16:57:58Z)
- I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z)
- Retrieval-Augmented Semantic Parsing: Improving Generalization with Lexical Knowledge [6.948555996661213]
We introduce Retrieval-Augmented Semantic Parsing (RASP), a simple yet effective approach that integrates external symbolic knowledge into the parsing process. Our experiments show that LLMs outperform previous encoder-decoder baselines for semantic parsing. RASP further enhances their ability to predict unseen concepts, nearly doubling the performance of previous models on out-of-distribution concepts.
arXiv Detail & Related papers (2024-12-13T15:30:20Z)
- Disentangling Dense Embeddings with Sparse Autoencoders [0.0]
Sparse autoencoders (SAEs) have shown promise in extracting interpretable features from complex neural networks.
We present one of the first applications of SAEs to dense text embeddings from large language models.
We show that the resulting sparse representations maintain semantic fidelity while offering interpretability.
arXiv Detail & Related papers (2024-08-01T15:46:22Z)
- Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z)
- Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) [22.364723506539974]
We show that the semantic structure of CLIP's latent space can be leveraged to provide interpretability.
We propose a novel method, Sparse Linear Concept Embeddings, for transforming CLIP representations into sparse linear combinations of human-interpretable concepts; a sketch of this sparse decomposition appears after this list.
arXiv Detail & Related papers (2024-02-16T00:04:36Z)
- Prompt-based Logical Semantics Enhancement for Implicit Discourse Relation Recognition [4.7938839332508945]
We propose a Prompt-based Logical Semantics Enhancement (PLSE) method for Implicit Discourse Relation Recognition (IDRR).
Our method seamlessly injects knowledge relevant to discourse relation into pre-trained language models through prompt-based connective prediction.
Experimental results on PDTB 2.0 and CoNLL16 datasets demonstrate that our method achieves outstanding and consistent performance against the current state-of-the-art models.
arXiv Detail & Related papers (2023-11-01T08:38:08Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary [68.77983831618685]
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space.
We show that the resulting projections contain rich semantic information and draw a connection between them and sparse retrieval; a minimal sketch of this vocabulary projection appears after this list.
arXiv Detail & Related papers (2022-12-20T16:03:25Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
With a strong auto-regressive decoder, VAEs tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
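For the matching-pursuit entry above (Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit), a minimal sketch of the unrolled encoder the summary alludes to: each iteration selects the dictionary atom most correlated with the current residual, playing the role of one unrolled encoder step. The unit-norm dictionary and fixed iteration count are assumptions; in an MP-SAE the dictionary would be learned against reconstruction error rather than fixed.

```python
import torch

def mp_encode(x, dictionary, n_iters=32):
    """Greedy matching pursuit: explain x as a sparse mix of dictionary atoms.

    x          : (d,) input embedding
    dictionary : (n_atoms, d), rows assumed unit-norm
    Returns a coefficient vector with at most n_iters nonzeros.
    """
    z = torch.zeros(dictionary.shape[0])
    residual = x.clone()
    for _ in range(n_iters):
        corr = dictionary @ residual          # correlation with the residual
        j = corr.abs().argmax()               # best-matching atom
        z[j] += corr[j]
        residual -= corr[j] * dictionary[j]   # remove the explained part
    return z
```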
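For the SpLiCE entry, a sketch of decomposing a CLIP embedding into a sparse, non-negative combination of concept embeddings. SpLiCE itself solves a constrained sparse-coding problem; the projected-gradient solver, L1 weight, and step count below are illustrative assumptions.

```python
import torch

def splice_decompose(clip_emb, concept_embs, l1=0.05, steps=500, lr=0.1):
    """Approximate clip_emb as a sparse non-negative mix of concepts.

    clip_emb     : (d,) image or text embedding, unit-normalized
    concept_embs : (n_concepts, d) embeddings of concept words, unit-normalized
    """
    w = torch.zeros(concept_embs.shape[0], requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = w @ concept_embs
        loss = (recon - clip_emb).pow(2).sum() + l1 * w.abs().sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            w.clamp_(min=0.0)  # non-negativity keeps concept weights additive
    return w.detach()          # mostly zeros; nonzeros name the concepts
```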
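For the "What Are You Token About?" entry, a sketch of projecting a dual-encoder vector into vocabulary space by reading it through the backbone's masked-language-modeling head. The choice of bert-base-uncased and the helper below are assumptions for illustration, not the paper's exact setup.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "bert-base-uncased"  # stand-in for the dual encoder's MLM backbone
tok = AutoTokenizer.from_pretrained(name)
mlm = AutoModelForMaskedLM.from_pretrained(name).eval()

@torch.no_grad()
def project_to_vocab(dense_vec, top_k=10):
    """dense_vec: (hidden_size,) representation from the dual encoder."""
    # .cls is BERT's MLM head; it maps hidden states to vocabulary logits.
    logits = mlm.cls(dense_vec.view(1, 1, -1)).squeeze()
    probs = logits.softmax(dim=-1)
    vals, idx = probs.topk(top_k)
    return list(zip(tok.convert_ids_to_tokens(idx.tolist()), vals.tolist()))
```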