Evolutionary Algorithms Approach For Search Based On Semantic Document Similarity
- URL: http://arxiv.org/abs/2502.19437v1
- Date: Thu, 20 Feb 2025 18:56:52 GMT
- Title: Evolutionary Algorithms Approach For Search Based On Semantic Document Similarity
- Authors: Chandrashekar Muniyappa, Eujin Kim
- Abstract summary: We develop clustering, recommendation, and question-and-answering systems using various text representation techniques. The Universal Sentence Encoder (USE) is used to capture the semantic similarity of text, and transfer learning is used to apply Genetic Algorithm (GA) and Differential Evolution (DE) algorithms to search and retrieve the top N relevant documents.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advancements in cloud computing and distributed computing have fostered research activities in computer science. As a result, researchers have made significant progress in neural networks and evolutionary computing algorithms such as Genetic Algorithms and Differential Evolution. These algorithms are used to develop clustering, recommendation, and question-and-answering systems using various text representation and similarity measurement techniques. In this research paper, the Universal Sentence Encoder (USE) is used to capture the semantic similarity of text, and transfer learning is used to apply Genetic Algorithm (GA) and Differential Evolution (DE) algorithms to search and retrieve the top N documents relevant to a user query. The proposed approach is applied to the Stanford Question Answering Dataset (SQuAD) to retrieve documents relevant to a user query. Finally, through experiments, we show that text documents can be efficiently represented as sentence embedding vectors using USE to capture semantic similarity, and by comparing the results of the Manhattan distance, GA, and DE algorithms we show that the evolutionary algorithms are better at finding the top N results than the traditional ranking approach.
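As a rough illustration of the retrieval pipeline described in the abstract, the sketch below (not the authors' implementation) embeds a few documents with the publicly available USE module from TF Hub, ranks them against a query by Manhattan (L1) distance, and then runs a toy genetic algorithm over candidate top-N index sets. The document texts, population size, generation count, and GA operators are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): USE embeddings, a Manhattan-distance
# baseline ranking, and a toy GA over top-N document index sets.
# All GA settings (population size, generations, mutation rate) are assumptions.
import numpy as np
import tensorflow_hub as hub

use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

documents = [
    "The cat sat on the mat.",
    "Genetic algorithms evolve populations of candidate solutions.",
    "SQuAD is a reading-comprehension dataset built from Wikipedia.",
    "Differential evolution optimizes real-valued parameter vectors.",
]
query = "Which dataset is used for question answering?"
top_n = 2

doc_vecs = np.asarray(use(documents))    # (num_docs, 512) sentence embeddings
query_vec = np.asarray(use([query]))[0]  # (512,)

# Baseline: Manhattan (L1) distance ranking; smaller distance = more similar.
l1 = np.abs(doc_vecs - query_vec).sum(axis=1)
baseline_top_n = np.argsort(l1)[:top_n]

# Toy GA: an individual is a set of top_n document indices; fitness is the
# negative total L1 distance of the selected documents to the query.
rng = np.random.default_rng(0)

def fitness(ind):
    return -l1[ind].sum()

def random_individual():
    return rng.choice(len(documents), size=top_n, replace=False)

def crossover(a, b):
    pool = np.unique(np.concatenate([a, b]))          # union of parent indices
    return rng.choice(pool, size=top_n, replace=False)

def mutate(ind, rate=0.2):
    if rng.random() < rate:
        ind = random_individual()                     # simple restart mutation
    return ind

population = [random_individual() for _ in range(20)]
for _ in range(30):                                   # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                         # truncation selection
    children = []
    for _ in range(10):
        i, j = rng.choice(len(parents), size=2, replace=False)
        children.append(mutate(crossover(parents[i], parents[j])))
    population = parents + children

ga_top_n = max(population, key=fitness)
print("Manhattan top-N:", sorted(baseline_top_n), "GA top-N:", sorted(ga_top_n))
```

A DE variant could be sketched along the same lines (for example with scipy.optimize.differential_evolution over a relaxed continuous encoding of the index set), but is omitted here for brevity.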
Related papers
- From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models [63.188607839223046]
This survey focuses on the benefits of scaling compute during inference.
We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation.
arXiv Detail & Related papers (2024-06-24T17:45:59Z) - Relation-aware Ensemble Learning for Knowledge Graph Embedding [68.94900786314666]
We propose to learn an ensemble by leveraging existing methods in a relation-aware manner.
However, exploring these semantics with a relation-aware ensemble leads to a much larger search space than general ensemble methods.
We propose a divide-search-combine algorithm RelEns-DSC that searches the relation-wise ensemble weights independently.
arXiv Detail & Related papers (2023-10-13T07:40:12Z) - A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z) - Learning with Differentiable Algorithms [6.47243430672461]
This thesis explores combining classic algorithms and machine learning systems like neural networks.
The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm.
In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable sorting gates, and differentiable logic gate networks.
arXiv Detail & Related papers (2022-09-01T17:30:00Z) - The CLRS Algorithmic Reasoning Benchmark [28.789225199559834]
Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms.
We propose the CLRS Algorithmic Reasoning Benchmark, covering classical algorithms from the Introduction to Algorithms textbook.
Our benchmark spans a variety of algorithmic reasoning procedures, including sorting, searching, dynamic programming, graph algorithms, string algorithms and geometric algorithms.
arXiv Detail & Related papers (2022-05-31T09:56:44Z) - Word Embeddings and Validity Indexes in Fuzzy Clustering [5.063728016437489]
This paper presents a fuzzy-based analysis of various vector representations of words, i.e., word embeddings.
We use two popular fuzzy clustering algorithms on count-based word embeddings, with different methods and dimensionality.
We evaluate the experimental results with various clustering validity indexes to compare different algorithm variants with different embeddings.
arXiv Detail & Related papers (2022-04-26T18:08:19Z) - AsySQN: Faster Vertical Federated Learning Algorithms with Better
Computation Resource Utilization [159.75564904944707]
We propose an asynchronous quasi-Newton (AsySQN) framework for vertical federated learning (VFL)
The proposed algorithms take descent steps scaled by approximate second-order (quasi-Newton) information without calculating the inverse Hessian matrix explicitly.
We show that the adopted asynchronous computation can make better use of the computation resource.
arXiv Detail & Related papers (2021-09-26T07:56:10Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z) - A Novel Word Sense Disambiguation Approach Using WordNet Knowledge Graph [0.0]
This paper presents a knowledge-based word sense disambiguation algorithm, namely Sequential Contextual Similarity Matrix Multiplication (SCSMM).
The SCSMM algorithm combines semantic similarity, knowledge, and document context to respectively exploit the merits of local context, background knowledge, and the document's main topic.
The proposed algorithm outperformed all other algorithms when disambiguating nouns on the combined gold standard datasets.
arXiv Detail & Related papers (2021-01-08T06:47:32Z) - Using the Full-text Content of Academic Articles to Identify and
Evaluate Algorithm Entities in the Domain of Natural Language Processing [7.163189900803623]
This article takes the field of natural language processing (NLP) as an example and identifies algorithms from academic papers in the field.
A dictionary of algorithms is constructed by manually annotating the contents of papers, and sentences containing algorithms in the dictionary are extracted through dictionary-based matching.
The number of articles mentioning an algorithm is used as an indicator to analyze the influence of that algorithm.
arXiv Detail & Related papers (2020-10-21T08:24:18Z) - A Systematic Characterization of Sampling Algorithms for Open-ended
Language Generation [71.31905141672529]
We study the widely adopted ancestral sampling algorithms for auto-regressive language models.
We identify three key properties that are shared among them: entropy reduction, order preservation, and slope preservation.
We find that the set of sampling algorithms that satisfies these properties performs on par with the existing sampling algorithms.
arXiv Detail & Related papers (2020-09-15T17:28:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.