EBIC.JL -- an Efficient Implementation of Evolutionary Biclustering
Algorithm in Julia
- URL: http://arxiv.org/abs/2105.01196v1
- Date: Mon, 3 May 2021 22:30:38 GMT
- Title: EBIC.JL -- an Efficient Implementation of Evolutionary Biclustering
Algorithm in Julia
- Authors: Pawe{\l} Renc, Patryk Orzechowski, Aleksander Byrski, Jaros{\l}aw
W\k{a}s, and Jason H. Moore
- Abstract summary: We introduce EBIC.JL - an implementation of one of the most accurate biclustering algorithms in Julia.
We show that the new version maintains comparable accuracy to its predecessor EBIC while converging faster for the majority of the problems.
- Score: 59.422301529692454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biclustering is a data mining technique which searches for local patterns in
numeric tabular data with main application in bioinformatics. This technique
has shown promise in multiple areas, including development of biomarkers for
cancer, disease subtype identification, or gene-drug interactions among others.
In this paper we introduce EBIC.JL - an implementation of one of the most
accurate biclustering algorithms in Julia, a modern highly parallelizable
programming language for data science. We show that the new version maintains
comparable accuracy to its predecessor EBIC while converging faster for the
majority of the problems. We hope that this open source software in a
high-level programming language will foster research in this promising field of
bioinformatics and expedite development of new biclustering methods for big
data.
Related papers
- MEC-IP: Efficient Discovery of Markov Equivalent Classes via Integer Programming [3.2513035377783717]
This paper presents a novel Programming (IP) approach for discovering the Markov Equivalent Class (MEC) of Bayesian Networks (BNs) through observational data.
Our numerical results show that not only a remarkable reduction in computational time is achieved by our algorithm but also an improvement in causal discovery accuracy is seen across diverse datasets.
arXiv Detail & Related papers (2024-10-22T22:56:33Z) - RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation [54.707460684650584]
Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention.
Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval Augmented Generation (RAG)
RAGLAB is a modular and research-oriented open-source library that reproduces 6 existing algorithms and provides a comprehensive ecosystem for investigating RAG algorithms.
arXiv Detail & Related papers (2024-08-21T07:20:48Z) - From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models [63.188607839223046]
This survey focuses on the benefits of scaling compute during inference.
We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation.
arXiv Detail & Related papers (2024-06-24T17:45:59Z) - An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z) - Single-Cell Deep Clustering Method Assisted by Exogenous Gene
Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z) - ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate
Nearest Neighbor Search Algorithms [5.478671305092084]
We introduce ParlayANN, a library of deterministic and parallel graph-based approximate nearest neighbor search algorithms.
We develop novel parallel implementations for four state-of-the-art graph-based ANNS algorithms that scale to billion-scale datasets.
arXiv Detail & Related papers (2023-05-07T19:28:23Z) - 2021 BEETL Competition: Advancing Transfer Learning for Subject
Independence & Heterogenous EEG Data Sets [89.84774119537087]
We design two transfer learning challenges around diagnostics and Brain-Computer-Interfacing (BCI)
Task 1 is centred on medical diagnostics, addressing automatic sleep stage annotation across subjects.
Task 2 is centred on Brain-Computer Interfacing (BCI), addressing motor imagery decoding across both subjects and data sets.
arXiv Detail & Related papers (2022-02-14T12:12:20Z) - Bioinspired Cortex-based Fast Codebook Generation [0.09449650062296822]
We introduce a feature extraction method inspired by sensory cortical networks in the brain.
Dubbed as bioinspired cortex, the algorithm provides convergence to features from streaming signals with superior computational efficiency.
We show herein the superior performance of the cortex model in clustering and vector quantization.
arXiv Detail & Related papers (2022-01-28T18:37:43Z) - Data-Driven Logistic Regression Ensembles With Applications in Genomics [0.0]
We propose a new approach for dealing with high-dimensional binary classification problems that combines ideas from regularization and ensembling.
We demonstrate the good performance of our method in terms of prediction accuracy and identification of key biomarkers using several medical datasets involving common diseases such as cancer, multiple sclerosis and psoriasis.
arXiv Detail & Related papers (2021-02-17T05:57:26Z) - EBIC: an open source software for high-dimensional and big data biclustering analyses [2.863279092948239]
We present the latest release of EBIC, a next-generation biclustering algorithm for mining genetic data.
The major contribution of this paper is adding support for big data, making it possible to efficiently run large genomic data mining analyses.
EBIC was applied to datasets of different sizes, including a large DNA methylation dataset with 436,444 rows.
arXiv Detail & Related papers (2018-07-26T02:57:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.