Related papers: A Novel Method of Extracting Topological Features from Word Embeddings

URL: http://arxiv.org/abs/2003.13074v2
Date: Sun, 19 Apr 2020 21:56:57 GMT
Title: A Novel Method of Extracting Topological Features from Word Embeddings
Authors: Shafie Gholizadeh, Armin Seyeditabari and Wlodek Zadrozny
Abstract summary: We introduce a novel algorithm to extract topological features from word embedding representation of text. We will show our defined topological features may outperform conventional text mining features.
Score: 2.4063592468412267
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, topological data analysis has been utilized for a wide range of problems to deal with high dimensional noisy data. While text representations are often high dimensional and noisy, there are only a few works on the application of topological data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representation of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the high-dimensional embedding space and discover the relations among different embedding dimensions. We will use persistent homology, the most commonly used tool from topological data analysis, for our experiment. Examining our topological algorithm on long textual documents, we will show our defined topological features may outperform conventional text mining features.
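
Illustration: the abstract names persistent homology as its main tool but gives no algorithmic detail, so the following is only a minimal sketch of the simplest case and not the authors' method. It computes 0-dimensional persistence (when connected components merge) for a small point cloud, assuming a Vietoris-Rips filtration with Euclidean distance. The function name `h0_persistence` and the toy vectors are hypothetical; real word embeddings would come from a trained model.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death times of 0-dimensional classes in a Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies at the length of the
    first edge that merges it into another component. The one component
    that never dies is omitted. (Hypothetical helper, for illustration.)
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge at this scale
            deaths.append(length)
    return deaths


# Toy "embedding": two tight pairs of vectors that are far from each other.
toy_vectors = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
deaths = h0_persistence(toy_vectors)
print(deaths)       # [1.0, 1.0, 5.0]
print(sum(deaths))  # total persistence, one possible scalar feature
```

The two short lifetimes correspond to points merging within each pair, and the long one to the two pairs merging. Summaries of such lifetimes are one common way to turn a persistence diagram into fixed-length features for a classifier; whether the paper uses this particular summary is not stated in the abstract.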

Related papers

Topograph: An efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation [78.54656076915565]
Topological correctness plays a critical role in many image segmentation tasks. Most networks are trained using pixel-wise loss functions, such as Dice, neglecting topological accuracy. We propose a novel, graph-based framework for topologically accurate image segmentation.
arXiv Detail & Related papers (2024-11-05T16:20:14Z)
Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and metadata. One straightforward solution is to integrate statistical analysis and machine learning. Numerous papers have been published on embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field.
arXiv Detail & Related papers (2024-06-16T14:49:19Z)
Linguistics from a topological viewpoint [2.4238741865874363]
In this paper, we describe a workflow to analyze the topological shapes of South American languages. As a result, it is difficult to have a clear visualization of the data.
arXiv Detail & Related papers (2024-03-16T23:10:42Z)
Topological Learning in Multi-Class Data Sets [0.3050152425444477]
We study the impact of topological complexity on learning in feedforward deep neural networks (DNNs). We evaluate our topological classification algorithm on multiple constructed and open source data sets.
arXiv Detail & Related papers (2023-01-23T21:54:25Z)
Rethinking Persistent Homology for Visual Recognition [27.625893409863295]
This paper performs a detailed analysis of the effectiveness of topological properties for image classification in various training scenarios. We identify the scenarios that benefit the most from topological features, e.g., training simple networks on small datasets.
arXiv Detail & Related papers (2022-07-09T08:01:11Z)
Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification [60.233529926965836]
We propose a new method called SHINE, which is based on graph neural network (GNN) for short text classification. First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs. Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
arXiv Detail & Related papers (2021-10-30T05:33:05Z)
Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA). We empirically show that the features derived from the BERT model outperform count- and neural-based baselines by up to 10% on three common datasets. The probing analysis of the features reveals their sensitivity to the surface and syntactic properties.
arXiv Detail & Related papers (2021-09-10T12:13:45Z)
Argumentative Topology: Finding Loop(holes) in Logic [3.977669302067367]
Topological Word Embeddings uses mathematical techniques from dynamical system analysis and data-driven shape extraction. We show that using a topological delay embedding we are able to capture and extract a different, shape-based notion of logic.
arXiv Detail & Related papers (2020-11-17T21:23:58Z)
A Survey of Embedding Space Alignment Methods for Language and Knowledge Graphs [77.34726150561087]
We survey the current research landscape on word, sentence and knowledge graph embedding algorithms. We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research.
arXiv Detail & Related papers (2020-10-26T16:08:13Z)
Neural Deepfake Detection with Factual Structure of Text [78.30080218908849]
We propose a graph-based model for deepfake detection of text. Our approach represents the factual structure of a given document as an entity graph. Our model can distinguish the difference in the factual structure between machine-generated text and human-written text.
arXiv Detail & Related papers (2020-10-15T02:35:31Z)
A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction. We show that different embedding spaces have different degrees of strength for the structural and semantic properties. These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
Topological Data Analysis in Text Classification: Extracting Features with Additive Information [2.1410799064827226]
Topological Data Analysis is challenging to apply to high dimensional numeric data. Topological features carry some exclusive information not captured by conventional text mining methods. Adding topological features to the conventional features in ensemble models improves the classification results.
arXiv Detail & Related papers (2020-03-29T21:02:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.