Using Artificial Neural Networks to Determine Ontologies Most Relevant to Scientific Texts
- URL: http://arxiv.org/abs/2309.09203v1
- Date: Sun, 17 Sep 2023 08:08:50 GMT
- Title: Using Artificial Neural Networks to Determine Ontologies Most Relevant to Scientific Texts
- Authors: Lukáš Korel, Alexander S. Behr, Norbert Kockmann and Martin Holeňa
- Abstract summary: This paper explores how to find the ontologies most relevant to scientific texts using artificial neural networks.
The basic idea of the presented approach is to select a representative paragraph from a source text file and embed it into a vector space.
We have considered different classifiers to categorize the embedded output from the transformer, among them a random forest.
- Score: 44.99833362998488
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper explores how to find the ontologies most relevant to
scientific texts using artificial neural networks. The basic
idea of the presented approach is to select a representative paragraph from a
source text file, embed it into a vector space with a pre-trained, fine-tuned
transformer, and classify the embedded vector according to its relevance to a
target ontology. We have considered different classifiers to categorize the
output from the transformer, in particular random forest, support vector
machine, multilayer perceptron, k-nearest neighbors, and Gaussian process
classifiers. Their suitability has been evaluated in a use case with ontologies
and scientific texts concerning catalysis research. The results show that the
random forest performed worst, while the support vector machine classifier
achieved the best results in this task.
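As an illustration of the pipeline described in the abstract, here is a minimal sketch in Python: a pre-trained sentence transformer embeds representative paragraphs, and an SVM classifies the embeddings by target ontology. The model name ("all-MiniLM-L6-v2"), the toy paragraphs, and the ontology labels are illustrative assumptions, not the paper's actual fine-tuned model or data.
```python
# Minimal sketch of the embed-then-classify pipeline from the abstract.
# Assumptions: the model name, the toy data, and the label scheme are
# illustrative; the paper's own fine-tuned transformer is not public here.
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Representative paragraphs and the ontology each is most relevant to
# (hypothetical examples; the paper uses catalysis-research texts).
paragraphs = [
    "The catalyst surface promotes hydrogenation of the substrate.",
    "Gene expression was profiled across tissue samples.",
    "Zeolite frameworks enable shape-selective catalysis.",
    "The protein folding pathway was simulated in silico.",
]
labels = ["catalysis", "biology", "catalysis", "biology"]

# Embed each representative paragraph into a vector space with a
# pre-trained transformer (a stand-in for the paper's fine-tuned model).
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(paragraphs)

# Classify embedded vectors by target ontology; the SVM performed best
# in the paper's evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=0, stratify=labels
)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print(clf.predict(X_test))
```
Swapping `SVC` for `RandomForestClassifier`, `MLPClassifier`, `KNeighborsClassifier`, or `GaussianProcessClassifier` would reproduce the comparison the abstract describes.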
Related papers
- Novel Deep Neural Network Classifier Characterization Metrics with Applications to Dataless Evaluation [1.6574413179773757]
In this work, we evaluate a Deep Neural Network (DNN) classifier's training quality without any example dataset.
Our empirical study of the proposed method for ResNet18, trained on the CIFAR-10 and CIFAR-100 datasets, confirms that dataless evaluation of DNN classifiers is indeed possible.
arXiv Detail & Related papers (2024-07-17T20:40:46Z)
- Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation [52.72682366640554]
Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author or by someone else.
It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style, or by imitating the style of another author.
arXiv Detail & Related papers (2024-03-17T16:36:26Z)
- Training toward significance with the decorrelated event classifier transformer neural network [0.0]
In natural language processing, one of the leading neural network architectures is the transformer.
It is found that the trained event classifier transformer can perform better than boosted decision trees and feed-forward networks.
arXiv Detail & Related papers (2023-12-31T08:57:29Z)
- Knowledge Trees: Gradient Boosting Decision Trees on Knowledge Neurons as Probing Classifier [0.0]
Logistic regression on the output representation of the transformer neural network layer is most often used to probe the syntactic properties of the language model.
We show that using gradient boosting decision trees at the Knowledge Neuron layer is more advantageous than using logistic regression on the output representations of the transformer layer, as sketched below.
arXiv Detail & Related papers (2023-12-17T15:37:03Z)
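A minimal sketch of the probing comparison above, assuming synthetic features in place of real transformer hidden states or Knowledge Neuron activations, which are not reproduced here:
```python
# Hedged sketch: a logistic-regression probe versus gradient boosting,
# both fit on stand-in features for layer representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, d = 200, 64
X = rng.normal(size=(n, d))                  # stand-in for per-token representations
y = (X[:, :2].sum(axis=1) > 0).astype(int)   # stand-in syntactic property to probe

for name, probe in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("gradient boosting", GradientBoostingClassifier()),
]:
    score = cross_val_score(probe, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.3f}")
```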
- Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively to transformers on small to medium sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z)
- Khmer Text Classification Using Word Embedding and Neural Networks [0.0]
We discuss various classification approaches for Khmer text.
A Khmer word embedding model is trained on a 30-million-Khmer-word corpus to construct word vector representations.
We evaluate the performance of different approaches on a news article dataset for both multi-class and multi-label text classification tasks, as sketched below.
arXiv Detail & Related papers (2021-12-13T15:57:32Z)
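A minimal sketch of the word-embedding-plus-classifier setup, assuming averaged word vectors as document features; the toy tokens and labels stand in for the paper's 30-million-word Khmer corpus and news dataset (real Khmer text would also need a word segmenter):
```python
# Hedged sketch: train word vectors, average them per document, then
# fit a one-vs-rest classifier for multi-label classification.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy pre-tokenized "documents" (placeholders for segmented Khmer text).
docs = [["economy", "market", "trade"],
        ["football", "match", "goal"],
        ["election", "market", "policy"],
        ["goal", "league", "trade"]]
y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])  # multi-label: [business, sport]

# Train word vectors on the corpus, then average them per document.
w2v = Word2Vec(sentences=docs, vector_size=32, min_count=1, seed=0)
X = np.array([np.mean([w2v.wv[t] for t in d], axis=0) for d in docs])

# One-vs-rest turns a binary classifier into a multi-label one.
clf = OneVsRestClassifier(LogisticRegression()).fit(X, y)
print(clf.predict(X))
```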
- On the rate of convergence of a classifier based on a Transformer encoder [55.41148606254641]
The rate of convergence of the misclassification probability of the classifier towards the optimal misclassification probability is analyzed.
It is shown that this classifier is able to circumvent the curse of dimensionality provided the a posteriori probability satisfies a suitable hierarchical composition model.
arXiv Detail & Related papers (2021-11-29T14:58:29Z)
- Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance [55.10864476206503]
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent spaces during training, we are able to obtain a latent linguistic embedding.
Our experiments show that the voice cloning system built with vector quantization has only a small degradation in terms of perceptual evaluations (the quantization step is sketched below).
arXiv Detail & Related papers (2021-06-25T07:51:35Z)
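A minimal sketch of the vector quantization step, assuming a fixed random codebook; real TTS/VC systems learn the codebook jointly with the rest of the model, which is omitted here:
```python
# Hedged sketch: quantize continuous latents by replacing each one with
# its nearest codebook entry, yielding discrete linguistic codes.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))   # 16 codewords, 8-dim latents (toy sizes)
latents = rng.normal(size=(5, 8))     # stand-in encoder outputs for 5 frames

# Nearest-codeword lookup by Euclidean distance.
dists = np.linalg.norm(latents[:, None, :] - codebook[None, :, :], axis=-1)
codes = dists.argmin(axis=1)          # discrete codes per frame
quantized = codebook[codes]           # quantized latent embedding

print("codes:", codes)
print("quantization error:", np.linalg.norm(latents - quantized))
```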
- Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated promising results on the canonical task of text classification.
Despite this success, their performance can be largely jeopardized in practice since they are unable to capture high-order interactions between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) which can obtain more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z)
- An Intelligent CNN-VAE Text Representation Technology Based on Text Semantics for Comprehensive Big Data [15.680918844684454]
A text feature representation model based on convolutional neural network (CNN) and variational autoencoder (VAE) is proposed.
The proposed model outperforms alternatives when its representations are fed to k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM) classification algorithms, as sketched below.
arXiv Detail & Related papers (2020-08-28T07:39:45Z)
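A minimal sketch of that downstream comparison, assuming random features in place of the CNN-VAE representations, which are not reproduced here:
```python
# Hedged sketch: evaluate KNN, RF, and SVM on fixed document feature
# vectors; stand-in features replace the learned CNN-VAE representations.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))                 # stand-in text representations
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in document labels

for name, clf in [
    ("KNN", KNeighborsClassifier()),
    ("RF", RandomForestClassifier(random_state=0)),
    ("SVM", SVC()),
]:
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.3f}")
```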
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.