PGB: A PubMed Graph Benchmark for Heterogeneous Network Representation Learning
- URL: http://arxiv.org/abs/2305.02691v3
- Date: Fri, 25 Aug 2023 05:24:59 GMT
- Title: PGB: A PubMed Graph Benchmark for Heterogeneous Network Representation Learning
- Authors: Eric W Lee, Joyce C Ho
- Abstract summary: We introduce PubMed Graph Benchmark (PGB), a new benchmark for evaluating heterogeneous graph embeddings for biomedical literature.
The benchmark contains rich metadata including abstracts, authors, citations, MeSH terms, the MeSH hierarchy, and other information.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been rapid growth in biomedical literature, yet capturing the
heterogeneity of the bibliographic information of these articles remains
relatively understudied. Although graph mining research via heterogeneous graph
neural networks has taken center stage, it remains unclear whether these
approaches capture the heterogeneity of the PubMed database, a vast digital
repository containing over 33 million articles. We introduce PubMed Graph
Benchmark (PGB), a new benchmark dataset for evaluating heterogeneous graph
embeddings for biomedical literature. The benchmark contains rich metadata
including abstracts, authors, citations, MeSH terms, the MeSH hierarchy, and
other information. The benchmark contains three evaluation tasks
encompassing systematic reviews, node classification, and node clustering. In
PGB, we aggregate the metadata associated with the biomedical articles from
PubMed into a unified source and make the benchmark publicly available for
future work.
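To make the benchmark's heterogeneous structure concrete, here is a minimal sketch that turns PubMed-style article records into typed edge sets keyed by (source type, relation, target type), the same triple-style edge-type keys used by heterogeneous graph libraries such as PyTorch Geometric's `HeteroData`. The record fields (`pmid`, `authors`, `cites`, `mesh`), the relation names, and the `mesh_parent` mapping are hypothetical assumptions for illustration; PGB's actual file format and schema may differ.

```python
from collections import defaultdict

# Hypothetical record schema (assumption): PGB's real field names may differ.
records = [
    {"pmid": "1", "authors": ["A. Smith", "B. Chen"], "cites": ["2"], "mesh": ["D001"]},
    {"pmid": "2", "authors": ["B. Chen"], "cites": [], "mesh": ["D001", "D002"]},
]
# Hypothetical MeSH hierarchy (assumption): child descriptor -> parent descriptor.
mesh_parent = {"D002": "D001"}


def build_hetero_graph(records, mesh_parent):
    """Collect edges into one set per (source type, relation, target type)."""
    edges = defaultdict(set)
    for rec in records:
        paper = rec["pmid"]
        for author in rec["authors"]:
            edges[("author", "writes", "paper")].add((author, paper))
        for cited in rec["cites"]:
            edges[("paper", "cites", "paper")].add((paper, cited))
        for term in rec["mesh"]:
            edges[("paper", "annotated_with", "mesh")].add((paper, term))
    # MeSH hierarchy edges link descriptors to their parents.
    for child, parent in mesh_parent.items():
        edges[("mesh", "child_of", "mesh")].add((child, parent))
    return edges


graph = build_hetero_graph(records, mesh_parent)
for etype, pairs in sorted(graph.items()):
    print(etype, sorted(pairs))
```

Keeping one edge set per relation lets a heterogeneous GNN learn separate parameters for authorship, citation, MeSH annotation, and MeSH hierarchy edges instead of collapsing them into a single homogeneous graph.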
Related papers
- The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges
The homophily principle states that nodes with the same labels or similar attributes are more likely to be connected.
Recent work has identified a non-trivial set of datasets on which GNNs perform unsatisfactorily compared with neural networks (NNs).
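The homophily principle is commonly quantified by the edge homophily ratio: the fraction of edges whose two endpoints share a label. A minimal sketch, assuming an edge list and a dict of node labels; the function name `edge_homophily` is illustrative and not taken from the handbook.

```python
def edge_homophily(edges, labels):
    """Return the fraction of edges whose endpoints have the same label.

    Values near 1 indicate a homophilic graph; values near 0 indicate heterophily.
    """
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


labels = {0: "a", 1: "a", 2: "b", 3: "b"}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(edge_homophily(edges, labels))  # 0.5: two of the four edges join same-label nodes
```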
Date: 2024-07-12
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
Date: 2024-03-19
- Integrating curation into scientific publishing to train AI models
We have embedded multimodal data curation into the academic publishing process to annotate segmented figure panels and captions.
The dataset, SourceData-NLP, contains more than 620,000 annotated biomedical entities.
We evaluate the utility of the dataset to train AI models using named-entity recognition, segmentation of figure captions into their constituent panels, and a novel context-dependent semantic task.
Date: 2023-10-31
- Applying BioBERT to Extract Germline Gene-Disease Associations for Building a Knowledge Graph from the Biomedical Literature
This paper presents SimpleGermKG, an automatic knowledge graph construction approach that connects germline genes and diseases.
To extract genes and diseases, we employ BioBERT, a BERT model pre-trained on biomedical corpora.
For semantic relationships between articles, genes, and diseases, we implement a part-whole relation approach.
Our knowledge graph contains 297 genes, 130 diseases, and 46,747 triples.
Date: 2023-09-11
- EBOCA: Evidences for BiOmedical Concepts Association Ontology
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data from a subset of DISNET, together with associations automatically extracted from texts, have been transformed into a knowledge graph that can be used in real scenarios.
Date: 2022-08-01
- SHGNN: Structure-Aware Heterogeneous Graph Neural Network
This paper proposes a novel Structure-Aware Heterogeneous Graph Neural Network (SHGNN) to address limitations of existing heterogeneous GNNs in capturing structural information along meta-paths.
We first utilize a feature propagation module to capture the local structure information of intermediate nodes in the meta-path.
Next, we use a tree-attention aggregator to incorporate the graph structure information into the aggregation module on the meta-path.
Finally, we leverage a meta-path aggregator to fuse the information aggregated from different meta-paths.
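Fusing information aggregated along different meta-paths is often done with semantic-level attention in the style of HAN: each meta-path receives an importance score averaged over nodes, the scores are softmax-normalized, and the per-meta-path node embeddings are combined with those weights. The sketch below assumes that HAN-style formulation with hypothetical parameters `W`, `b`, and `q`; it is not SHGNN's exact meta-path aggregator.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def semantic_attention_fuse(per_path, W, b, q):
    """Fuse per-meta-path node embeddings with softmax attention over meta-paths.

    per_path[p][i] is node i's embedding under meta-path p (same node order for every p).
    Returns the fused embeddings and the attention weight of each meta-path.
    """
    scores = []
    for embs in per_path:
        # Meta-path importance: mean over nodes of q . tanh(W z + b).
        total = 0.0
        for z in embs:
            h = matvec(W, z)
            total += sum(qk * math.tanh(hk + bk) for qk, hk, bk in zip(q, h, b))
        scores.append(total / len(embs))
    # Softmax over meta-paths; subtracting the max keeps exp() numerically stable.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    betas = [e / denom for e in exps]
    n_nodes, dim = len(per_path[0]), len(per_path[0][0])
    fused = [
        [sum(beta * per_path[p][i][d] for p, beta in enumerate(betas)) for d in range(dim)]
        for i in range(n_nodes)
    ]
    return fused, betas


# Toy example (hypothetical): two meta-paths, two nodes, 2-dimensional embeddings.
per_path = [
    [[1.0, 0.0], [0.0, 1.0]],  # e.g. a paper-author-paper meta-path
    [[0.0, 0.0], [0.0, 0.0]],  # e.g. a paper-MeSH-paper meta-path
]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
q = [1.0, 1.0]
fused, betas = semantic_attention_fuse(per_path, W, b, q)
print(betas)
```

Because the weights are shared across all nodes, each meta-path gets a single global importance; node-level structure (which SHGNN handles with its propagation and tree-attention modules) must be captured before this fusion step.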
Date: 2021-12-12
- Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
Date: 2021-06-17
- Weakly-supervised Graph Meta-learning for Few-shot Node Classification
We propose a new graph meta-learning framework, Graph Hallucination Networks (Meta-GHN).
Based on a new robustness-enhanced episodic training, Meta-GHN is meta-learned to hallucinate clean node representations from weakly-labeled data.
Extensive experiments demonstrate the superiority of Meta-GHN over existing graph meta-learning studies.
Date: 2021-06-12
- A Literature Review of Recent Graph Embedding Techniques for Biomedical Data
Many graph-based learning methods have been proposed to analyze biomedical graph data.
The main difficulty is handling the high dimensionality and sparsity of biomedical graphs.
Graph embedding methods provide an effective and efficient way to address these issues.
Date: 2021-01-17
- Biomedical Knowledge Graph Refinement and Completion using Graph Representation Learning and Top-K Similarity Measure
This work demonstrates learning discrete representations of the integrated biomedical knowledge graph Chem2Bio2RDF.
We perform a knowledge graph completion and refinement task using a simple top-K cosine similarity measure between the learned embedding vectors.
Date: 2020-12-18
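The top-K cosine similarity step in the knowledge graph completion entry can be sketched as follows: given learned embedding vectors, the candidate links for an entity are the K other entities whose vectors are most similar. This is a minimal sketch assuming toy, made-up vectors and hypothetical entity names; it does not reproduce the paper's embedding model, and the optional `exclude` argument (for skipping already-known neighbors) is an illustrative assumption.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k_candidates(query, embeddings, k, exclude=()):
    """Return the k (name, score) pairs most cosine-similar to `query`."""
    q = embeddings[query]
    scored = (
        (name, cosine(q, vec))
        for name, vec in embeddings.items()
        if name != query and name not in exclude
    )
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy embeddings (made up for illustration, not learned from Chem2Bio2RDF).
embeddings = {
    "compound_A": [0.9, 0.1, 0.0],
    "compound_B": [0.8, 0.2, 0.1],
    "gene_X": [0.7, 0.3, 0.0],
    "protein_Y": [0.0, 0.1, 0.9],
}
print(top_k_candidates("compound_A", embeddings, k=2))
```

`heapq.nlargest` avoids sorting every entity, which matters when scoring one query against a large knowledge graph.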