Motif Diversity in Human Liver ChIP-seq Data Using MAP-Elites
- URL: http://arxiv.org/abs/2601.17808v1
- Date: Sun, 25 Jan 2026 11:57:54 GMT
- Title: Motif Diversity in Human Liver ChIP-seq Data Using MAP-Elites
- Authors: Alejandro Medina, Mary Lauren Benton
- Abstract summary: We apply the MAP-Elites algorithm to evolve position weight matrix motifs under a likelihood-based fitness objective. Results show that MAP-Elites recovers multiple high-quality motif variants with fitness comparable to MEME's strongest solutions.
- Score: 45.88028371034407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motif discovery is a core problem in computational biology, traditionally formulated as a likelihood optimization task that returns a single dominant motif from a DNA sequence dataset. However, regulatory sequence data admit multiple plausible motif explanations, reflecting underlying biological heterogeneity. In this work, we frame motif discovery as a quality-diversity problem and apply the MAP-Elites algorithm to evolve position weight matrix motifs under a likelihood-based fitness objective while explicitly preserving diversity across biologically meaningful dimensions. We evaluate MAP-Elites using three complementary behavioral characterizations that capture trade-offs between motif specificity, compositional structure, coverage, and robustness. Experiments on human CTCF liver ChIP-seq data aligned to the human reference genome compare MAP-Elites against a standard motif discovery tool, MEME, under matched evaluation criteria across stratified dataset subsets. Results show that MAP-Elites recovers multiple high-quality motif variants with fitness comparable to MEME's strongest solutions while revealing structured diversity obscured by single-solution approaches.
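The quality-diversity idea in the abstract can be illustrated with a minimal MAP-Elites sketch. This is not the authors' implementation: the uniform background model, the best-window log-likelihood-ratio fitness, the two behavioral descriptors (mean information content and GC fraction), the single-column mutation operator, and every function and parameter name below are assumptions chosen for illustration only.

```python
import math
import random

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumption: uniform background nucleotide frequencies


def random_pwm(width, rng):
    """Return a width x 4 matrix whose columns are probability vectors."""
    pwm = []
    for _ in range(width):
        col = [rng.random() + 0.1 for _ in ALPHABET]
        total = sum(col)
        pwm.append([p / total for p in col])
    return pwm


def mutate(pwm, rng, scale=0.2):
    """Perturb one randomly chosen column and renormalise it (hypothetical operator)."""
    child = [col[:] for col in pwm]
    j = rng.randrange(len(child))
    col = [max(1e-3, p + rng.uniform(-scale, scale)) for p in child[j]]
    total = sum(col)
    child[j] = [p / total for p in col]
    return child


def best_window_score(pwm, seq):
    """Best log-likelihood ratio (motif vs. background) over all windows of seq."""
    width = len(pwm)
    best = -math.inf
    for start in range(len(seq) - width + 1):
        score = sum(
            math.log(pwm[k][ALPHABET.index(seq[start + k])] / BACKGROUND)
            for k in range(width)
        )
        best = max(best, score)
    return best


def fitness(pwm, seqs):
    """Mean best-window score across sequences (a stand-in likelihood objective)."""
    return sum(best_window_score(pwm, s) for s in seqs) / len(seqs)


def descriptors(pwm):
    """Two illustrative descriptors in [0, 1]: scaled information content, GC fraction."""
    ic = sum(2.0 + sum(p * math.log2(p) for p in col) for col in pwm)
    gc = sum(col[1] + col[2] for col in pwm)
    return ic / (2.0 * len(pwm)), gc / len(pwm)


def cell_of(desc, bins):
    """Map a descriptor vector to a discrete archive cell."""
    return tuple(min(bins - 1, max(0, int(d * bins))) for d in desc)


def map_elites(seqs, width=6, bins=5, n_init=50, n_iter=500, seed=0):
    """Keep the fittest PWM found so far in each descriptor cell."""
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, pwm)

    def try_insert(pwm):
        cell = cell_of(descriptors(pwm), bins)
        fit = fitness(pwm, seqs)
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, pwm)

    for _ in range(n_init):  # random initialisation
        try_insert(random_pwm(width, rng))
    for _ in range(n_iter):  # select a random elite, mutate, try to insert
        _, parent = archive[rng.choice(sorted(archive))]
        try_insert(mutate(parent, rng))
    return archive


if __name__ == "__main__":
    # Toy data: random sequences with an arbitrary planted site (not real ChIP-seq data).
    rng = random.Random(1)
    flank = lambda: "".join(rng.choice(ALPHABET) for _ in range(10))
    seqs = [flank() + "GATTAC" + flank() for _ in range(8)]
    archive = map_elites(seqs, n_init=20, n_iter=200)
    for cell, (fit, _) in sorted(archive.items()):
        print(cell, round(fit, 3))
```

Each archive cell holds one elite, so the result is a grid of motif variants that differ in specificity and composition rather than a single best motif, which is the contrast with single-solution tools that the abstract draws.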
Related papers
- DOGMA: Weaving Structural Information into Data-centric Single-cell Transcriptomics Analysis [43.565183518761984]
We propose DOGMA, a data-centric framework designed for the structural reshaping and semantic enhancement of raw data. In complex multi-species and multi-organ benchmarks, DOGMA achieves SOTA performance, exhibiting superior zero-shot robustness and sample efficiency.
arXiv Detail & Related papers (2026-02-02T09:10:09Z)
- MS-ISSM: Objective Quality Assessment of Point Clouds Using Multi-scale Implicit Structural Similarity [65.85858856481131]
The unstructured and irregular nature of point clouds poses a significant challenge for objective quality assessment (PCQA). We propose the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM).
arXiv Detail & Related papers (2026-01-03T14:58:52Z)
- An Interpretable Ensemble Framework for Multi-Omics Dementia Biomarker Discovery Under HDLSS Conditions [0.0]
We propose a novel ensemble approach combining Graph Attention Networks (GAT), MultiOmics Variational AutoEncoder (MOVE), Elastic-net sparse regression, and Storey's False Discovery Rate (FDR). We evaluate performance using both simulated multi-omics data and the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Our method demonstrates superior predictive accuracy, feature selection precision, and biological relevance.
arXiv Detail & Related papers (2025-09-04T15:20:13Z)
- Consistency of Feature Attribution in Deep Learning Architectures for Multi-Omics [0.36646002427839136]
We investigate the use of Shapley Additive Explanations (SHAP) on a multi-view deep learning model applied to multi-omics data. Rankings of features via SHAP are compared across various architectures to evaluate consistency of the method. We present an alternative, simple method to assess the robustness of identification of important biomolecules.
arXiv Detail & Related papers (2025-07-30T17:53:42Z)
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
- MuST: Maximizing Latent Capacity of Spatial Transcriptomics Data [41.70354088000952]
This paper introduces Multiple-modality Structure Transformation, named MuST, a novel methodology to tackle the challenge.
It integrates the multi-modality information contained in the ST data effectively into a uniform latent space to provide a foundation for all the downstream tasks.
The results show that it outperforms existing state-of-the-art methods with clear advantages in the precision of identifying and preserving structures of tissues and biomarkers.
arXiv Detail & Related papers (2024-01-15T09:07:28Z)
- Genetic InfoMax: Exploring Mutual Information Maximization in High-Dimensional Imaging Genetics Studies [50.11449968854487]
Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits.
Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS.
We introduce a trans-modal learning framework Genetic InfoMax (GIM) to address the specific challenges of GWAS.
arXiv Detail & Related papers (2023-09-26T03:59:21Z)
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data [2.0236506875465863]
Missing genotypes can affect the efficacy of machine learning approaches to identify the risk genetic variants of common diseases and traits.
The problem occurs when genotypic data are collected from different experiments with different DNA microarrays, each being characterised by its pattern of uncalled (missing) genotypes.
We use well-developed notions of object-attribute biclusters and formal concepts that correspond to dense subrelations in the binary relation.
arXiv Detail & Related papers (2020-10-22T12:27:43Z)
- Mycorrhiza: Genotype Assignment using Phylogenetic Networks [2.286041284499166]
We introduce Mycorrhiza, a machine learning approach for the genotype assignment problem.
Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples.
Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium.
arXiv Detail & Related papers (2020-10-14T02:36:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.