A Novel Method of Extracting Topological Features from Word Embeddings
- URL: http://arxiv.org/abs/2003.13074v2
- Date: Sun, 19 Apr 2020 21:56:57 GMT
- Title: A Novel Method of Extracting Topological Features from Word Embeddings
- Authors: Shafie Gholizadeh, Armin Seyeditabari and Wlodek Zadrozny
- Abstract summary: We introduce a novel algorithm to extract topological features from word embedding representation of text.
We will show our defined topological features may outperform conventional text mining features.
- Score: 2.4063592468412267
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, topological data analysis has been utilized for a wide range
of problems to deal with high dimensional noisy data. While text
representations are often high dimensional and noisy, there are only a few works
on the application of topological data analysis in natural language processing.
In this paper, we introduce a novel algorithm to extract topological features
from word embedding representation of text that can be used for text
classification. Working on word embeddings, topological data analysis can
interpret the high-dimensional embedding space and discover the relations among
different embedding dimensions. We will use persistent homology, the most
commonly used tool from topological data analysis, for our experiment. Examining our
topological algorithm on long textual documents, we will show our defined
topological features may outperform conventional text mining features.
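To make the persistent homology mentioned in the abstract concrete, here is a minimal sketch of its simplest case: 0-dimensional persistence of a point cloud under a Vietoris-Rips filtration. It is not the authors' feature-extraction algorithm, which the abstract does not spell out. The function name `h0_persistence` and the toy 2-D vectors are hypothetical, chosen only for illustration; real word embeddings would have hundreds of dimensions.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the sorted death scales of 0-dimensional homology classes.

    Every point is born as its own component at scale 0. A component dies
    when an edge first merges it into another one, and those merge scales
    are exactly the edge lengths of a minimum spanning tree. Kruskal's
    algorithm with union-find is therefore enough for dimension 0.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise Euclidean distances, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one bar ends
            deaths.append(dist)
    return deaths  # n - 1 finite bars; the last component never dies


# Made-up 2-D "embeddings": two tight clusters that are far apart.
toy_vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
bars = h0_persistence(toy_vectors)
print(bars)
```

The two short bars correspond to points merging within each cluster, and the single long bar to the two clusters finally joining. Long-lived bars of this kind are the sort of signal that topological features summarize. Higher-dimensional features such as loops require a full persistent homology library rather than this shortcut.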
Related papers
- Topograph: An efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation [78.54656076915565]
Topological correctness plays a critical role in many image segmentation tasks.
Most networks are trained using pixel-wise loss functions, such as Dice, neglecting topological accuracy.
We propose a novel, graph-based framework for topologically accurate image segmentation.
arXiv Detail & Related papers (2024-11-05T16:20:14Z)
- Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and metadata.
One straightforward solution is to integrate statistical analysis and machine learning.
Numerous papers have been published on embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field.
arXiv Detail & Related papers (2024-06-16T14:49:19Z)
- Linguistics from a topological viewpoint [2.4238741865874363]
In this paper, we describe a workflow to analyze the topological shapes of South American languages.
As a result, it is difficult to have a clear visualization of the data.
arXiv Detail & Related papers (2024-03-16T23:10:42Z)
- Topological Learning in Multi-Class Data Sets [0.3050152425444477]
We study the impact of topological complexity on learning in feedforward deep neural networks (DNNs).
We evaluate our topological classification algorithm on multiple constructed and open source data sets.
arXiv Detail & Related papers (2023-01-23T21:54:25Z)
- Rethinking Persistent Homology for Visual Recognition [27.625893409863295]
This paper performs a detailed analysis of the effectiveness of topological properties for image classification in various training scenarios.
We identify the scenarios that benefit the most from topological features, e.g., training simple networks on small datasets.
arXiv Detail & Related papers (2022-07-09T08:01:11Z)
- Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification [60.233529926965836]
We propose a new method called SHINE, which is based on graph neural networks (GNNs), for short text classification.
First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs.
Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
arXiv Detail & Related papers (2021-10-30T05:33:05Z)
- Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA).
We empirically show that the features derived from the BERT model outperform count- and neural-based baselines by up to 10% on three common datasets.
The probing analysis of the features reveals their sensitivity to the surface and syntactic properties.
arXiv Detail & Related papers (2021-09-10T12:13:45Z)
- Argumentative Topology: Finding Loop(holes) in Logic [3.977669302067367]
Topological Word Embeddings uses mathematical techniques in dynamical system analysis and data driven shape extraction.
We show that using a topological delay embedding we are able to capture and extract a different, shape-based notion of logic.
arXiv Detail & Related papers (2020-11-17T21:23:58Z)
- A Survey of Embedding Space Alignment Methods for Language and Knowledge Graphs [77.34726150561087]
We survey the current research landscape on word, sentence and knowledge graph embedding algorithms.
We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research.
arXiv Detail & Related papers (2020-10-26T16:08:13Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Topological Data Analysis in Text Classification: Extracting Features with Additive Information [2.1410799064827226]
Topological Data Analysis is challenging to apply to high dimensional numeric data.
Topological features carry some exclusive information not captured by conventional text mining methods.
Adding topological features to the conventional features in ensemble models improves the classification results.
arXiv Detail & Related papers (2020-03-29T21:02:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.