Data-Driven Information Extraction and Enrichment of Molecular Profiling
Data for Cancer Cell Lines
- URL: http://arxiv.org/abs/2307.00933v2
- Date: Mon, 12 Feb 2024 11:43:15 GMT
- Title: Data-Driven Information Extraction and Enrichment of Molecular Profiling
Data for Cancer Cell Lines
- Authors: Ellery Smith, Rahel Paloots, Dimitris Giagkos, Michael Baudis, Kurt
Stockinger
- Abstract summary: This work presents the design, implementation and application of a novel data extraction and exploration system.
We introduce a new public data exploration portal, which enables automatic linking of genomic copy number variants plots with ranked, related entities.
Our system is publicly available on the web at https://cancercelllines.org.
- Score: 1.1999555634662633
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the proliferation of research means and computational methodologies,
published biomedical literature is growing exponentially in numbers and volume.
Cancer cell lines are frequently used models in biological and medical research
that are currently applied for a wide range of purposes, from studies of
cellular mechanisms to drug development, which has led to a wealth of related
data and publications. Sifting through large quantities of text to gather
relevant information on the cell lines of interest is tedious and extremely
slow when performed by humans. Hence, novel computational information
extraction and correlation mechanisms are required to boost meaningful
knowledge extraction. In this work, we present the design, implementation and
application of a novel data extraction and exploration system. This system
extracts deep semantic relations between textual entities from scientific
literature to enrich existing structured clinical data in the domain of cancer
cell lines. We introduce a new public data exploration portal, which enables
automatic linking of genomic copy number variants plots with ranked, related
entities such as affected genes. Each relation is accompanied by
literature-derived evidences, allowing for deep, yet rapid, literature search,
using existing structured data as a springboard. Our system is publicly
available on the web at https://cancercelllines.org.
Related papers
- UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell).
It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains.
In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets.
arXiv Detail & Related papers (2024-02-20T11:50:27Z)
- Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge [2.2814097119704058]
Large language models (LLMs) are transforming the way information is retrieved with vast amounts of knowledge being summarized and presented.
LLMs are prone to highlight the most frequently seen pieces of information from the training set and to neglect the rare ones.
We introduce a novel information-retrieval method that leverages a knowledge graph to downsample these clusters and mitigate the information overload problem.
arXiv Detail & Related papers (2024-02-19T18:31:11Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Integrating curation into scientific publishing to train AI models [1.6982459897303823]
We have embedded multimodal data curation into the academic publishing process to annotate segmented figure panels and captions.
The dataset, SourceData-NLP, contains more than 620,000 annotated biomedical entities.
We evaluate the utility of the dataset to train AI models using named-entity recognition, segmentation of figure captions into their constituent panels, and a novel context-dependent semantic task.
arXiv Detail & Related papers (2023-10-31T13:22:38Z)
- Descriptive Knowledge Graph in Biomedical Domain [26.91431888505873]
We present a novel system that automatically extracts and generates informative and descriptive sentences from the biomedical corpus.
Unlike previous search engines or exploration systems that retrieve unconnected passages, our system organizes descriptive sentences as a graph.
We spotlight the application of our system in COVID-19 research, illustrating its utility in areas such as drug repurposing and literature curation.
arXiv Detail & Related papers (2023-10-18T03:10:25Z)
- Machine Learning Approach for Cancer Entities Association and Classification [0.0]
The study uses two of the most non-trivial natural language processing (NLP) functions, entity recognition and text classification, to discover knowledge from biomedical literature.
Named Entity Recognition (NER) recognizes and extracts the predefined entities related to cancer from unstructured text with the support of a user-friendly interface and built-in dictionaries.
Text classification helps to explore the insights into the text and simplifies data categorization, querying, and article screening.
arXiv Detail & Related papers (2023-05-30T07:36:12Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- DeepShovel: An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance [48.55345030503826]
Geoscientists need to read a huge amount of literature to locate, extract, and aggregate relevant results and data.
DeepShovel is a publicly-available AI-assisted data extraction system to support their needs.
A follow-up user evaluation with 14 researchers suggested DeepShovel improved users' efficiency of data extraction for building scientific databases.
arXiv Detail & Related papers (2022-02-21T12:18:08Z)
- BioIE: Biomedical Information Extraction with Multi-head Attention Enhanced Graph Convolutional Network [9.227487525657901]
We propose Biomedical Information Extraction, a hybrid neural network to extract relations from biomedical text and unstructured medical reports.
We evaluate our model on two major biomedical relationship extraction tasks, chemical-disease relation and chemical-protein interaction, and a cross-hospital pan-cancer pathology report corpus.
arXiv Detail & Related papers (2021-10-26T13:19:28Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq features to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.