Topological Data Analysis for Word Sense Disambiguation
- URL: http://arxiv.org/abs/2203.00565v1
- Date: Tue, 1 Mar 2022 15:41:54 GMT
- Title: Topological Data Analysis for Word Sense Disambiguation
- Authors: Michael Rawson, Samuel Dooley, Mithun Bharadwaj, and Rishabh Choudhary
- Abstract summary: We develop and test a novel unsupervised algorithm for word sense induction and disambiguation.
Our approach relies on advanced mathematical concepts in the field of topology which provides a richer conceptualization of clusters for the word sense induction tasks.
This shows the promise of topological algorithms for natural language processing and we advocate for future work in this promising area.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop and test a novel unsupervised algorithm for word sense induction
and disambiguation which uses topological data analysis. Typical approaches to
the problem involve clustering, based on simple low level features of distance
in word embeddings. Our approach relies on advanced mathematical concepts in
the field of topology which provides a richer conceptualization of clusters for
the word sense induction tasks. We use a persistent homology barcode algorithm
on the SemCor dataset and demonstrate that our approach gives low relative
error on word sense induction. This shows the promise of topological algorithms
for natural language processing and we advocate for future work in this
promising area.
Related papers
- Topograph: An efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation [78.54656076915565]
Topological correctness plays a critical role in many image segmentation tasks.
Most networks are trained using pixel-wise loss functions, such as Dice, neglecting topological accuracy.
We propose a novel, graph-based framework for topologically accurate image segmentation.
arXiv Detail & Related papers (2024-11-05T16:20:14Z) - Graph-Boosted Attentive Network for Semantic Body Parsing [1.4042211166197214]
This paper proposes a novel approach to decomposing multiple human bodies into semantic part regions in unconstrained environments.
We propose a convolutional neural network architecture which comprises of novel semantic and contour attention mechanisms across feature hierarchy.
Our proposed method achieves the state-of-art results on the challenging Pascal Person-Part dataset.
arXiv Detail & Related papers (2024-07-08T13:32:01Z) - Word Sense Induction with Hierarchical Clustering and Mutual Information
Maximization [14.997937028599255]
Word sense induction is a difficult problem in natural language processing.
We propose a novel unsupervised method based on hierarchical clustering and invariant information clustering.
We empirically demonstrate that, in certain cases, our approach outperforms prior WSI state-of-the-art methods.
arXiv Detail & Related papers (2022-10-11T13:04:06Z) - Fine-Grained Visual Entailment [51.66881737644983]
We propose an extension of this task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image.
Unlike prior work, our method is inherently explainable and makes logical predictions at different levels of granularity.
We evaluate our method on a new dataset of manually annotated knowledge elements and show that our method achieves 68.18% accuracy at this challenging task.
arXiv Detail & Related papers (2022-03-29T16:09:38Z) - Semantic Search for Large Scale Clinical Ontologies [63.71950996116403]
We present a deep learning approach to build a search system for large clinical vocabularies.
We propose a Triplet-BERT model and a method that generates training data based on semantic training data.
The model is evaluated using five real benchmark data sets and the results show that our approach achieves high results on both free text to concept and concept to searching concept vocabularies.
arXiv Detail & Related papers (2022-01-01T05:15:42Z) - Large Scale Substitution-based Word Sense Induction [48.49573297876054]
We present a word-sense induction method based on pre-trained masked language models (MLMs), which can cheaply scale to large vocabularies and large corpora.
The result is a corpus which is sense-tagged according to a corpus-derived sense inventory and where each sense is associated with indicative words.
Evaluation on English Wikipedia that was sense-tagged using our method shows that both the induced senses, and the per-instance sense assignment, are of high quality even compared to WSD methods, such as Babelfy.
arXiv Detail & Related papers (2021-10-14T19:40:37Z) - Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole.
We propose a Semantic Graph Convolutional Networks (SGCN) that explores the implicit semantics by learning latent semantic-paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z) - A Novel Word Sense Disambiguation Approach Using WordNet Knowledge Graph [0.0]
This paper presents a knowledge-based word sense disambiguation algorithm, namely Sequential Contextual Similarity Matrix multiplication (SCSMM)
The SCSMM algorithm combines semantic similarity, knowledge, and document context to respectively exploit the merits of local context.
The proposed algorithm outperformed all other algorithms when disambiguating nouns on the combined gold standard datasets.
arXiv Detail & Related papers (2021-01-08T06:47:32Z) - Argumentative Topology: Finding Loop(holes) in Logic [3.977669302067367]
Topological Word Embeddings uses mathematical techniques in dynamical system analysis and data driven shape extraction.
We show that using a topological delay embedding we are able to capture and extract a different, shape-based notion of logic.
arXiv Detail & Related papers (2020-11-17T21:23:58Z) - Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z) - Ontology-based Interpretable Machine Learning for Textual Data [35.01650633374998]
We introduce a novel interpreting framework that learns an interpretable model based on sampling technique to explain prediction models.
To narrow down the search space for explanations, we design a learnable anchor algorithm.
A set of regulations is further introduced, regarding combining learned interpretable representations with anchors to generate comprehensible explanations.
arXiv Detail & Related papers (2020-04-01T02:51:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.