Related papers: EBIC.JL -- an Efficient Implementation of Evolutionary Biclustering Algorithm in Julia

EBIC.JL -- an Efficient Implementation of Evolutionary Biclustering Algorithm in Julia

URL: http://arxiv.org/abs/2105.01196v1
Date: Mon, 3 May 2021 22:30:38 GMT
Title: EBIC.JL -- an Efficient Implementation of Evolutionary Biclustering Algorithm in Julia
Authors: Pawe{\l} Renc, Patryk Orzechowski, Aleksander Byrski, Jaros{\l}aw W\k{a}s, and Jason H. Moore
Abstract summary: We introduce EBIC.JL - an implementation of one of the most accurate biclustering algorithms in Julia. We show that the new version maintains comparable accuracy to its predecessor EBIC while converging faster for the majority of the problems.
Score: 59.422301529692454
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Biclustering is a data mining technique which searches for local patterns in numeric tabular data with main application in bioinformatics. This technique has shown promise in multiple areas, including development of biomarkers for cancer, disease subtype identification, or gene-drug interactions among others. In this paper we introduce EBIC.JL - an implementation of one of the most accurate biclustering algorithms in Julia, a modern highly parallelizable programming language for data science. We show that the new version maintains comparable accuracy to its predecessor EBIC while converging faster for the majority of the problems. We hope that this open source software in a high-level programming language will foster research in this promising field of bioinformatics and expedite development of new biclustering methods for big data.

Related papers

An Enhanced Model-based Approach for Short Text Clustering [58.60681789677676]
Short text clustering has become increasingly important with the popularity of social media like Twitter, Google+, and Facebook.<n>Existing methods can be broadly categorized into two paradigms: topic model-based approaches and deep representation learning-based approaches.<n>We propose a collapsed Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture model (GSDMM), which effectively handles the sparsity and high dimensionality of short texts.<n>Based on several aspects of GSDMM that warrant further refinement, we propose an improved approach, GSDMM+, designed to further optimize its performance.
arXiv Detail & Related papers (2025-07-18T10:07:42Z)
MEC-IP: Efficient Discovery of Markov Equivalent Classes via Integer Programming [3.2513035377783717]
This paper presents a novel Programming (IP) approach for discovering the Markov Equivalent Class (MEC) of Bayesian Networks (BNs) through observational data. Our numerical results show that not only a remarkable reduction in computational time is achieved by our algorithm but also an improvement in causal discovery accuracy is seen across diverse datasets.
arXiv Detail & Related papers (2024-10-22T22:56:33Z)
Language Model Powered Digital Biology with BRAD [5.309032614374711]
Large Language Models (LLMs) are well-suited for unstructured integration. We present a prototype Bioinformatics Retrieval Augmented Digital assistant (BRAD)
arXiv Detail & Related papers (2024-09-04T16:43:14Z)
RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation [54.707460684650584]
Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention. Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval Augmented Generation (RAG) RAGLAB is a modular and research-oriented open-source library that reproduces 6 existing algorithms and provides a comprehensive ecosystem for investigating RAG algorithms.
arXiv Detail & Related papers (2024-08-21T07:20:48Z)
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models [63.188607839223046]
This survey focuses on the benefits of scaling compute during inference. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation.
arXiv Detail & Related papers (2024-06-24T17:45:59Z)
An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks. These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems. Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells. During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation. This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search Algorithms [5.478671305092084]
We introduce ParlayANN, a library of deterministic and parallel graph-based approximate nearest neighbor search algorithms. We develop novel parallel implementations for four state-of-the-art graph-based ANNS algorithms that scale to billion-scale datasets.
arXiv Detail & Related papers (2023-05-07T19:28:23Z)
2021 BEETL Competition: Advancing Transfer Learning for Subject Independence & Heterogenous EEG Data Sets [89.84774119537087]
We design two transfer learning challenges around diagnostics and Brain-Computer-Interfacing (BCI) Task 1 is centred on medical diagnostics, addressing automatic sleep stage annotation across subjects. Task 2 is centred on Brain-Computer Interfacing (BCI), addressing motor imagery decoding across both subjects and data sets.
arXiv Detail & Related papers (2022-02-14T12:12:20Z)
Bioinspired Cortex-based Fast Codebook Generation [0.09449650062296822]
We introduce a feature extraction method inspired by sensory cortical networks in the brain. Dubbed as bioinspired cortex, the algorithm provides convergence to features from streaming signals with superior computational efficiency. We show herein the superior performance of the cortex model in clustering and vector quantization.
arXiv Detail & Related papers (2022-01-28T18:37:43Z)
Data-Driven Logistic Regression Ensembles With Applications in Genomics [0.0]
We propose a new approach for dealing with high-dimensional binary classification problems that combines ideas from regularization and ensembling. We demonstrate the good performance of our method in terms of prediction accuracy and identification of key biomarkers using several medical datasets involving common diseases such as cancer, multiple sclerosis and psoriasis.
arXiv Detail & Related papers (2021-02-17T05:57:26Z)
EBIC: an open source software for high-dimensional and big data biclustering analyses [2.863279092948239]
We present the latest release of EBIC, a next-generation biclustering algorithm for mining genetic data. The major contribution of this paper is adding support for big data, making it possible to efficiently run large genomic data mining analyses. EBIC was applied to datasets of different sizes, including a large DNA methylation dataset with 436,444 rows.
arXiv Detail & Related papers (2018-07-26T02:57:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.