scInterpreter: Training Large Language Models to Interpret scRNA-seq
Data for Cell Type Annotation
- URL: http://arxiv.org/abs/2402.12405v1
- Date: Sun, 18 Feb 2024 05:39:00 GMT
- Title: scInterpreter: Training Large Language Models to Interpret scRNA-seq
Data for Cell Type Annotation
- Authors: Cong Li, Meng Xiao, Pengfei Wang, Guihai Feng, Xin Li, Yuanchun Zhou
- Abstract summary: This research focuses on how to train and adapt the Large Language Model with the capability to interpret and distinguish cell types in single-cell RNA sequencing data.
- Score: 15.718901418627366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the inherent limitations of existing Large Language Models in
directly reading and interpreting single-cell omics data, they demonstrate
significant potential and flexibility as foundation models. This research
focuses on how to train and adapt a large language model so that it can
interpret and distinguish cell types in single-cell RNA sequencing data. Our
preliminary research results indicate that these foundational models excel in
accurately categorizing known cell types, demonstrating the potential of the
Large Language Models as effective tools for uncovering new biological
insights.
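To make the recipe concrete, here is a minimal sketch of one common way to hand an expression profile to a language model: rank genes by expression and serialize the top of the ranking as text for the model to classify. The `cell_to_gene_prompt` helper, the ranking heuristic, and the prompt wording are illustrative assumptions, not scInterpreter's actual encoding.

```python
import numpy as np

def cell_to_gene_prompt(expression: np.ndarray, gene_names: list[str], top_k: int = 50) -> str:
    """Rank genes by expression and serialize the top-k as a text prompt.

    This mirrors the common 'cell sentence' trick for feeding scRNA-seq
    profiles to a language model; the exact encoding used by scInterpreter
    may differ (assumption).
    """
    order = np.argsort(expression)[::-1][:top_k]  # indices of most-expressed genes
    ranked = " ".join(gene_names[i] for i in order)
    return (
        "The following genes are ordered from highest to lowest expression "
        f"in a single cell: {ranked}. What is the most likely cell type?"
    )

# Hypothetical usage with a toy profile of five genes.
genes = ["CD3D", "CD8A", "MS4A1", "NKG7", "LYZ"]
profile = np.array([5.2, 4.8, 0.1, 0.3, 0.0])
print(cell_to_gene_prompt(profile, genes, top_k=3))
```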
Related papers
- Transformer-based Single-Cell Language Model: A Survey [5.228439173541588]
We provide a detailed introduction to the structure and principles of transformers.
We review the single-cell language models and large language models for single-cell data analysis.
We discuss the challenges of single-cell language models and provide promising research directions.
arXiv Detail & Related papers (2024-07-18T06:43:12Z)
- Scalable Amortized GPLVMs for Single Cell Transcriptomics Data [9.010523724015398]
Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data.
We introduce an improved model, an amortized variational Bayesian Gaussian process latent variable model (BGPLVM).
BGPLVM is tailored for single-cell RNA-seq with specialized encoder, kernel, and likelihood designs.
arXiv Detail & Related papers (2024-05-06T21:54:38Z)
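As a companion to the BGPLVM entry above, a minimal PyTorch sketch of the amortization idea: a shared encoder maps each cell directly to its variational latent parameters, so inference cost does not grow with the number of cells. The layer sizes are assumptions, and the GP decoder that maps latents back to expression is omitted for brevity.

```python
import torch
import torch.nn as nn

class AmortizedEncoder(nn.Module):
    """Map a cell's expression vector to the mean and log-variance of its
    latent coordinate. One shared network serves every cell, which is the
    'amortized' part; the BGPLVM decoder is omitted here."""

    def __init__(self, n_genes: int, latent_dim: int = 10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_genes, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.log_var = nn.Linear(128, latent_dim)

    def forward(self, x: torch.Tensor):
        h = self.net(x)
        return self.mu(h), self.log_var(h)

# Toy batch: 32 cells x 2000 genes (illustrative sizes).
enc = AmortizedEncoder(n_genes=2000)
mu, log_var = enc(torch.randn(32, 2000))
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterization trick
```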
- Critical Data Size of Language Models from a Grokking Perspective [35.029074833552656]
We formalize the phase transition under the grokking configuration into the Data Efficiency Hypothesis.
We show that generalization occurs only when language models reach a critical data size.
Our results deepen the understanding of language model training, offering a novel perspective on the role of data in the learning mechanism of language models.
arXiv Detail & Related papers (2024-01-19T03:24:36Z)
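Operationally, the grokking entry above is a sweep over training-set sizes: train identical models on growing fractions of the data and find the fraction at which held-out accuracy jumps from chance to near-perfect. The sketch below illustrates that protocol with a toy stand-in for the train/eval loop; the logistic curve is a placeholder, not the paper's data.

```python
import math

def train_and_eval(train_fraction: float) -> float:
    """Toy stand-in for a full training run: a sharp logistic in the data
    fraction, mimicking the phase transition the paper formalizes. Replace
    with a real train/eval loop in practice (assumption)."""
    return 1.0 / (1.0 + math.exp(-40 * (train_fraction - 0.35)))

def find_critical_fraction(fractions, threshold=0.9):
    """Smallest data fraction at which held-out accuracy clears `threshold`."""
    for frac in sorted(fractions):
        if train_and_eval(frac) >= threshold:
            return frac  # first size where generalization (grokking) appears
    return None

print(find_critical_fraction([i / 20 for i in range(1, 21)]))
```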
- Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
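To illustrate the TEMED-LLM entry above, a minimal extract-then-parse sketch: prompt a text-completion model for a fixed JSON schema and parse the reply into a table row. The field set, prompt wording, and `call_llm` callable are hypothetical, not the paper's actual template.

```python
import json

EXTRACTION_PROMPT = """Extract the following fields from the medical report
as JSON with keys "age", "sex", and "diagnosis". Report:
{report}
JSON:"""

def extract_table_row(report: str, call_llm) -> dict:
    """Prompt an LLM to emit one structured row per report, then parse it.
    `call_llm` is any text-completion callable (assumption)."""
    raw = call_llm(EXTRACTION_PROMPT.format(report=report))
    return json.loads(raw)

# Hypothetical usage with a canned model response.
fake_llm = lambda prompt: '{"age": 54, "sex": "F", "diagnosis": "T2 diabetes"}'
print(extract_table_row("54-year-old woman with type 2 diabetes.", fake_llm))
```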
- Revolutionizing Single Cell Analysis: The Power of Large Language Models for Cell Type Annotation [0.0]
Large language models such as ChatGPT and New Bing provide accurate annotations of cell types.
By using ChatGPT to annotate single cell data, we can relate rare cell types to their functions.
This can have important applications in understanding cancer progression, mammalian development, and stem cell differentiation.
arXiv Detail & Related papers (2023-04-05T18:45:54Z)
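The annotation workflow in the entry above largely reduces to formatting per-cluster marker genes into a question for the chat model. The sketch below shows that formatting step; the prompt wording is an illustrative assumption.

```python
def cluster_annotation_prompt(markers: dict[str, list[str]], tissue: str) -> str:
    """Format per-cluster marker genes into a single annotation query for a
    chat-style LLM; the exact wording is an illustrative assumption."""
    lines = [f"Cluster {cid}: {', '.join(genes)}" for cid, genes in markers.items()]
    return (
        f"Identify the cell type of each {tissue} cluster "
        "given its top marker genes.\n" + "\n".join(lines)
    )

# Toy marker sets for three PBMC clusters.
markers = {"0": ["CD3D", "IL7R"], "1": ["MS4A1", "CD79A"], "2": ["LYZ", "CD14"]}
print(cluster_annotation_prompt(markers, tissue="PBMC"))
```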
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
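A toy PyTorch reading of the Sparse Latent Typing entry above: give each token a gated distribution over latent types and penalize the gate so that only a few keyword tokens carry a type. The paper's actual objective is more involved; this is only the skeleton, under assumed layer sizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseTyper(nn.Module):
    """Each token gets a distribution over K latent types plus a keep-token
    gate; an L1 penalty on the gate pushes most tokens to carry no type,
    so only keywords end up typed (sketch, not the paper's exact loss)."""

    def __init__(self, hidden: int, n_types: int = 16):
        super().__init__()
        self.type_logits = nn.Linear(hidden, n_types)
        self.gate = nn.Linear(hidden, 1)

    def forward(self, token_states: torch.Tensor):
        types = F.softmax(self.type_logits(token_states), dim=-1)
        gate = torch.sigmoid(self.gate(token_states))  # keep-token probability
        sparsity_loss = gate.mean()                    # L1 penalty (gate >= 0)
        return gate * types, sparsity_loss

typer = SparseTyper(hidden=64)
typed, penalty = typer(torch.randn(2, 12, 64))  # batch x tokens x hidden
```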
- CCRL: Contrastive Cell Representation Learning [0.0]
We propose the Contrastive Cell Representation Learning (CCRL) model for cell identification in H&E slides.
We show that this model can outperform all currently available cell clustering models by a large margin across two datasets from different tissue types.
arXiv Detail & Related papers (2022-08-12T18:12:03Z)
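The CCRL entry above builds on the standard self-supervised contrastive core. Below is a minimal SimCLR-style NT-Xent loss over two augmented views of a batch of cell embeddings; it approximates that generic core rather than CCRL's full framework.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """SimCLR-style contrastive loss: each view's positive is the other view
    of the same cell; all remaining batch entries act as negatives."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # 2N x d, unit-norm
    sim = z @ z.t() / tau                         # pairwise similarities
    n = z1.size(0)
    sim.fill_diagonal_(float("-inf"))             # exclude self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy batch of 8 cells, 128-dim embeddings for two augmented views.
loss = nt_xent(torch.randn(8, 128), torch.randn(8, 128))
```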
- Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training [86.91380874390778]
We present Generation-Augmented Pre-training (GAP), which jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-training data.
Based on experimental results, neural semantic parsers that leverage GAP obtain new state-of-the-art results on both the Spider and Criteria-to-SQL benchmarks.
arXiv Detail & Related papers (2020-12-18T15:53:50Z)
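The data side of the GAP entry above can be sketched as: serialize each table schema to text, let a generation model invent an utterance for it, and keep the (utterance, schema) pair as pre-training data. `generate_utterance` is a hypothetical stand-in for the finetuned generator, and the serialization format is an assumption.

```python
def serialize_schema(tables: dict[str, list[str]]) -> str:
    """Flatten a schema to the text form a seq2seq generator can consume."""
    return " | ".join(f"{t}: {', '.join(cols)}" for t, cols in tables.items())

def make_pretrain_pairs(schemas, generate_utterance):
    """Pair each schema with a synthesized utterance for joint pre-training."""
    pairs = []
    for tables in schemas:
        schema_text = serialize_schema(tables)
        pairs.append((generate_utterance(schema_text), schema_text))
    return pairs

# Hypothetical usage with a canned generator output.
schema = {"singer": ["name", "age", "country"]}
fake_gen = lambda s: "How many singers are from each country?"
print(make_pretrain_pairs([schema], fake_gen))
```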
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
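For the decomposition entry above, here is only the architectural skeleton: two projection heads split a pretrained cross-lingual vector into candidate domain-invariant and domain-specific parts. The mutual-information training signal the paper uses to enforce the split is omitted.

```python
import torch
import torch.nn as nn

class FeatureDecomposer(nn.Module):
    """Split a pretrained cross-lingual representation into domain-invariant
    and domain-specific parts with two projection heads. The paper trains
    these heads with a mutual-information estimator, not shown here."""

    def __init__(self, dim: int):
        super().__init__()
        self.invariant = nn.Linear(dim, dim)
        self.specific = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor):
        return self.invariant(h), self.specific(h)

# Toy batch of 4 sentence vectors at an assumed 768-dim encoder width.
dec = FeatureDecomposer(dim=768)
h_inv, h_spec = dec(torch.randn(4, 768))
```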
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
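The gating theory in the LSTM entry above has a compact quantitative core: a forget gate held at value f decays stored information as f^t, giving a characteristic timescale of roughly -1/ln(f). The sketch below computes this, showing how a spread of gate values yields the spread of timescales needed to approximate power-law decay.

```python
import math

def forget_gate_timescale(f: float) -> float:
    """A unit whose forget gate sits at f decays memory as f**t, so its
    characteristic timescale is T = -1 / ln(f): gates near 1 give long
    memories, gates near 0 give short ones."""
    return -1.0 / math.log(f)

# A spread of gate values produces a spread of timescales.
for f in (0.5, 0.9, 0.99):
    print(f"f = {f:.2f} -> timescale ~ {forget_gate_timescale(f):.1f} steps")
```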
- Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
arXiv Detail & Related papers (2020-04-29T04:07:12Z)
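The augmentation entry above boils down to expanding each labeled utterance with generated surface variants while carrying the label over. The sketch below shows that expansion loop; `paraphrase` is a hypothetical stand-in for the pretrained generation model the paper finetunes.

```python
def augment(dataset, paraphrase, n_variants: int = 3):
    """Expand (utterance, intent) pairs with generated surface variants,
    keeping the original label for each variant."""
    augmented = list(dataset)
    for utterance, intent in dataset:
        augmented.extend((paraphrase(utterance, k), intent) for k in range(n_variants))
    return augmented

# Hypothetical usage with a canned paraphraser.
seed = [("book a table for two at 7pm", "make_reservation")]
fake_paraphrase = lambda u, k: f"{u} please ({k})"  # stand-in generator output
print(augment(seed, fake_paraphrase))
```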