Evaluating Sparse Interpretable Word Embeddings for Biomedical Domain
- URL: http://arxiv.org/abs/2005.05114v1
- Date: Mon, 11 May 2020 13:56:58 GMT
- Title: Evaluating Sparse Interpretable Word Embeddings for Biomedical Domain
- Authors: Mohammad Amin Samadi, Mohammad Sadegh Akhondzadeh, Sayed Jalal Zahabi,
Mohammad Hossein Manshaei, Zeinab Maleki, Payman Adibi
- Abstract summary: Interpretability is a key means to justification, an integral requirement in biomedical applications.
We present a comprehensive study of the interpretability of word embeddings in the medical domain, focusing on the role of sparse methods.
Our experiments show that sparse word vectors are far more interpretable while preserving the downstream-task performance of the original vectors.
- Score: 1.3526604206343171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word embeddings have found their way into a wide range of natural language
processing tasks including those in the biomedical domain. While these vector
representations successfully capture semantic and syntactic word relations,
hidden patterns and trends in the data, they fail to offer interpretability.
Interpretability is a key means to justification, an integral requirement in
biomedical applications. We present a comprehensive study of the
interpretability of word embeddings in the medical domain, focusing on the role
of sparse methods. Qualitative and quantitative measurements and metrics for
interpretability of word vector representations are provided. For the
quantitative evaluation, we introduce an extensive categorized dataset that can
be used to quantify interpretability based on category theory. Intrinsic and
extrinsic evaluation of the studied methods are also presented. As for the
latter, we propose datasets which can be utilized for effective extrinsic
evaluation of word vectors in the biomedical domain. Our experiments show that
sparse word vectors are far more interpretable while preserving the performance
of the original dense vectors in downstream tasks.
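The categorized-dataset evaluation described above can be illustrated with a simple proxy metric: score each embedding dimension by how concentrated its top-ranked words are in a single category. The toy vocabulary, category labels, and scoring rule below are illustrative assumptions, not the paper's exact dataset or metric.

```python
import numpy as np

def dimension_interpretability(embeddings, categories, top_n=5):
    """For each embedding dimension, rank words by their weight on that
    dimension and return the fraction of the top_n words that share the
    most common category -- a simple proxy for how 'interpretable' the
    dimension is (illustrative, not the paper's exact metric)."""
    scores = []
    for d in range(embeddings.shape[1]):
        top = np.argsort(-embeddings[:, d])[:top_n]
        top_cats = [categories[i] for i in top]
        majority = max(set(top_cats), key=top_cats.count)
        scores.append(top_cats.count(majority) / top_n)
    return scores

# Toy example: 6 words from 2 biomedical categories, 2 dimensions.
cats = ["disease", "disease", "disease", "drug", "drug", "drug"]
emb = np.array([
    [0.9, 0.1], [0.8, 0.0], [0.7, 0.2],   # disease words load on dim 0
    [0.1, 0.9], [0.0, 0.8], [0.2, 0.7],   # drug words load on dim 1
])
print(dimension_interpretability(emb, cats, top_n=3))  # [1.0, 1.0]
```

A sparse, non-negative embedding tends to score high on such a metric because each dimension's mass concentrates on a few semantically related words.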
Related papers
- How well do distributed representations convey contextual lexical semantics: a Thesis Proposal [3.3585951129432323]
In this thesis, we examine the efficacy of distributed representations from modern neural networks in encoding lexical meaning.
We identify four sources of ambiguity based on the relatedness and similarity of meanings influenced by context.
We then aim to evaluate these sources by collecting or constructing multilingual datasets, leveraging various language models, and employing linguistic analysis tools.
arXiv Detail & Related papers (2024-06-02T14:08:51Z)
- Agentività e telicità in GilBERTo: implicazioni cognitive (Agentivity and telicity in GilBERTo: cognitive implications) [77.71680953280436]
The goal of this study is to investigate whether a Transformer-based neural language model infers lexical semantics.
The semantic properties considered are telicity (also combined with definiteness) and agentivity.
arXiv Detail & Related papers (2023-07-06T10:52:22Z)
- Leveraging knowledge graphs to update scientific word embeddings using latent semantic imputation [0.0]
We show how latent semantic imputation (LSI) can impute embeddings for domain-specific words from up-to-date knowledge graphs.
We show that LSI can produce reliable embedding vectors for rare and OOV terms in the biomedical domain.
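As a rough illustration of the imputation idea above: an out-of-vocabulary term's vector is estimated from the embeddings of its knowledge-graph neighbours. The actual LSI method learns reconstruction weights from graph structure; the uniform weights below are a simplifying assumption.

```python
import numpy as np

def impute_oov_embedding(neighbor_vecs, weights=None):
    """Estimate an out-of-vocabulary term's embedding as a convex
    combination of its knowledge-graph neighbours' embeddings.
    Latent semantic imputation derives these weights from the graph;
    uniform weights here are a simplifying assumption."""
    neighbor_vecs = np.asarray(neighbor_vecs, dtype=float)
    if weights is None:
        weights = np.full(len(neighbor_vecs), 1.0 / len(neighbor_vecs))
    return weights @ neighbor_vecs

# e.g. a new drug name whose graph neighbours are two known drugs
known = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]
print(impute_oov_embedding(known))  # [2. 0. 1.]
```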
arXiv Detail & Related papers (2022-10-27T12:15:26Z)
- An Informational Space Based Semantic Analysis for Scientific Texts [62.997667081978825]
This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts.
Scientific-specific meaning is standardised by using situation representations rather than psychological properties.
This work lays the groundwork for a geometric representation of the meaning of texts.
arXiv Detail & Related papers (2022-05-31T11:19:32Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that an informational approach to representing the meaning of a text offers a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z)
- Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings [31.849734024331283]
We propose an intuitive estimator that exploits structure via a group-sparse penalty to efficiently transfer-learn domain-specific word embeddings.
We prove that all local minima identified by our non-convex objective function are statistically indistinguishable from the global minimum under standard regularization conditions.
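The group-sparse structure just mentioned can be sketched via the proximal operator of a row-wise group-lasso penalty: penalising the Euclidean norm of each row of a weight matrix zeroes out whole rows (words) whose domain-specific adjustment is weak. This is a generic group-lasso step illustrating the penalty, not the paper's full transfer-learning estimator; the threshold `lam` is an illustrative parameter.

```python
import numpy as np

def group_sparse_prox(W, lam):
    """Proximal operator for the row-wise group-lasso penalty
    lam * sum_i ||W[i, :]||_2. Rows with norm <= lam are set to zero;
    the rest shrink toward zero. (Generic group-lasso step, not the
    paper's full estimator.)"""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return W * scale

W = np.array([[3.0, 4.0],    # norm 5.0: kept, shrunk to [2.4, 3.2]
              [0.1, 0.0]])   # norm 0.1: zeroed entirely
print(group_sparse_prox(W, lam=1.0))
```

Zeroing entire rows, rather than individual entries, is what makes the penalty "group"-sparse: a word either receives a domain-specific shift or is left at its pretrained embedding.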
arXiv Detail & Related papers (2021-04-18T18:19:03Z)
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
- Knowledge-Base Enriched Word Embeddings for Biomedical Domain [5.086571902225929]
We propose a new word embedding based model for biomedical domain that jointly leverages the information from available corpora and domain knowledge.
Unlike existing approaches, the proposed methodology is simple yet adept at accurately capturing the knowledge available in domain resources.
arXiv Detail & Related papers (2021-02-20T18:18:51Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.