Interpretable Text Embeddings and Text Similarity Explanation: A Survey
- URL: http://arxiv.org/abs/2502.14862v2
- Date: Thu, 02 Oct 2025 15:24:21 GMT
- Title: Interpretable Text Embeddings and Text Similarity Explanation: A Survey
- Authors: Juri Opitz, Lucas Möller, Andrianos Michail, Sebastian Padó, Simon Clematide,
- Abstract summary: We provide a structured overview of methods specializing in inherently interpretable text embeddings and text similarity explanation. We compare means of evaluation, discuss overarching lessons learned and finally identify opportunities and open challenges for future research.
- Score: 14.332308036519064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text embeddings are a fundamental component in many NLP tasks, including classification, regression, clustering, and semantic search. However, despite their ubiquitous application, challenges persist in interpreting embeddings and explaining similarities between them. In this work, we provide a structured overview of methods specializing in inherently interpretable text embeddings and text similarity explanation, an underexplored research area. We characterize the main ideas, approaches, and trade-offs. We compare means of evaluation, discuss overarching lessons learned and finally identify opportunities and open challenges for future research.
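The abstract's central task, explaining why two embeddings are similar, can be illustrated in its simplest form. The sketch below is a minimal illustration under stated assumptions, not a method taken from the survey. It assumes dense vectors compared with cosine similarity and splits the score into one additive term per embedding dimension. The function names `cosine_contributions` and `top_dimensions` are hypothetical.

```python
import math
from typing import List, Tuple


def cosine_contributions(u: List[float], v: List[float]) -> Tuple[float, List[float]]:
    """Hypothetical helper: return cos(u, v) and one additive term per dimension.

    cos(u, v) = sum_k u[k] * v[k] / (||u|| * ||v||), so the k-th term is
    u[k] * v[k] / (||u|| * ||v||) and the terms sum to the score.
    """
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimensionality")
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    contributions = [x * y / norm for x, y in zip(u, v)]
    return sum(contributions), contributions


def top_dimensions(contributions: List[float], k: int = 3) -> List[Tuple[int, float]]:
    # Hypothetical helper: rank dimensions by how much they add to
    # (or subtract from) the similarity score.
    ranked = sorted(enumerate(contributions), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real models use far more dimensions.
    u = [1.0, 2.0, 0.0]
    v = [2.0, 1.0, 1.0]
    score, contribs = cosine_contributions(u, v)
    print(round(score, 4))
    print(top_dimensions(contribs, k=2))
```

Because the terms sum to the score, this decomposition is exact. It is only informative, however, when individual dimensions carry a human-readable meaning, which is one motivation for the inherently interpretable embeddings the survey covers.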
Related papers
- Integration of Contextual Descriptors in Ontology Alignment for Enrichment of Semantic Correspondence [13.69268253901738]
A formalization was developed that enables the integration of essential and contextual descriptors to create a comprehensive knowledge model. The hierarchical structure of the semantic approach and the mathematical apparatus for analyzing potential conflicts between concepts are demonstrated.
(2024-11-28T12:59:32Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
(2024-05-10T17:11:31Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
(2023-11-14T08:51:00Z)
- Studying Socially Unacceptable Discourse Classification (SUD) through different eyes: "Are we on the same page ?" [4.87717454493713]
We first build and present a novel corpus that contains a large variety of manually annotated texts from different online sources.
This global context allows us to test the generalization ability of SUD classifiers.
From this perspective, we can analyze how (possibly) different annotation modalities influence SUD learning.
(2023-08-08T10:42:33Z)
- Composition-contrastive Learning for Sentence Embeddings [23.85590618900386]
This work is the first to do so without incurring costs in auxiliary training objectives or additional network parameters.
Experimental results on semantic textual similarity tasks show improvements over baselines that are comparable with state-of-the-art approaches.
(2023-07-14T14:39:35Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
(2023-05-23T23:45:20Z)
- PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition [63.51569687229681]
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters.
Our dataset structure resembles the tasks of (1) segmenting sentences within a document into sets of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically aligned document.
(2022-12-21T04:03:33Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
(2022-11-10T14:26:43Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
(2022-10-18T10:03:51Z)
- Interpreting BERT-based Text Similarity via Activation and Saliency Maps [26.279593839644836]
We present an unsupervised technique for explaining paragraph similarities inferred by pre-trained BERT models.
By looking at a pair of paragraphs, our technique identifies important words that dictate each paragraph's semantics, matches between the words in both paragraphs, and retrieves the most important pairs that explain the similarity between the two.
(2022-08-13T10:06:24Z)
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To tackle these issues, we start with three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
(2021-04-18T09:19:44Z)
- TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora [14.844685568451833]
We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings.
TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface.
(2021-03-19T21:26:28Z)
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize the existing work in evaluating models' interpretability using "trustworthy" interpretation algorithms.
(2021-03-19T08:40:30Z)
- "Let's Eat Grandma": When Punctuation Matters in Sentence Representation for Sentiment Analysis [13.873803872380229]
We argue that punctuation could play a significant role in sentiment analysis and propose a novel representation model to improve syntactic and contextual performance.
We conduct experiments on publicly available datasets and verify that our model can identify sentiment more accurately than other state-of-the-art baseline methods.
(2020-12-10T19:07:31Z)
- XTE: Explainable Text Entailment [8.036150169408241]
Textual entailment recognition is the task of determining whether a piece of text logically follows from another piece of text.
XTE (Explainable Text Entailment) is a novel composite approach for recognizing textual entailment.
(2020-09-25T20:49:07Z)
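The entry on explaining text similarity in Transformer models mentions BiLRP, which attributes a similarity score to pairs of input features rather than to single features. The sketch below illustrates only that second-order idea, and only for the special case of a linear encoder with dot-product similarity, where the pairwise decomposition is exact. It is not BiLRP itself, which propagates relevance through deep nonlinear networks. The encoder, toy weights, and function names (`embed`, `interaction_matrix`, `top_interactions`) are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def embed(W: Matrix, x: List[float]) -> List[float]:
    # Hypothetical linear encoder: e(x) = W x, one column of W per input feature.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a: List[float], b: List[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def interaction_matrix(W: Matrix, x: List[float], x2: List[float]) -> Matrix:
    """R[i][j] = x[i] * x2[j] * (W^T W)[i][j].

    Because e(x) . e(x2) = x^T (W^T W) x2, the entries of R sum exactly to the
    dot-product similarity, and R[i][j] scores the interaction between
    feature i of the first input and feature j of the second.
    """
    d = len(W[0])
    # Gram matrix over input features: G[i][j] = sum over rows r of W[r][i] * W[r][j].
    G = [[sum(row[i] * row[j] for row in W) for j in range(d)] for i in range(d)]
    return [[x[i] * x2[j] * G[i][j] for j in range(d)] for i in range(d)]


def top_interactions(R: Matrix, k: int = 3) -> List[Tuple[int, int, float]]:
    # Flatten R and keep the k feature pairs with the largest absolute relevance.
    cells = [(i, j, r) for i, row in enumerate(R) for j, r in enumerate(row)]
    return sorted(cells, key=lambda c: abs(c[2]), reverse=True)[:k]


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0],
         [0.0, 1.0, 1.0]]
    x = [1.0, 0.0, 1.0]   # toy feature activations for the first text
    x2 = [0.0, 1.0, 1.0]  # toy feature activations for the second text
    R = interaction_matrix(W, x, x2)
    print(dot(embed(W, x), embed(W, x2)), sum(map(sum, R)))
    print(top_interactions(R, k=2))
```

Forming the full d-by-d Gram matrix is only practical for toy inputs; the point of the sketch is that the pairwise relevances sum exactly to the similarity in the linear case, which is the kind of pairwise attribution the BiLRP entry describes for nonlinear models.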
This list is automatically generated from the titles and abstracts of the papers on this site.