SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples
- URL: http://arxiv.org/abs/2308.03671v1
- Date: Mon, 7 Aug 2023 15:46:39 GMT
- Title: SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples
- Authors: Michael Färber, David Lamprecht, Johan Krause, Linn Aung, Peter Haase
- Abstract summary: SemOpenAlex is an extensive RDF knowledge graph that contains over 26 billion triples about scientific publications and their associated entities.
We offer the data through multiple channels, including RDF dump files, a SPARQL endpoint, and as a data source in the Linked Open Data cloud.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present SemOpenAlex, an extensive RDF knowledge graph that contains over
26 billion triples about scientific publications and their associated entities,
such as authors, institutions, journals, and concepts. SemOpenAlex is licensed
under CC0, providing free and open access to the data. We offer the data
through multiple channels, including RDF dump files, a SPARQL endpoint, and as
a data source in the Linked Open Data cloud, complete with resolvable URIs and
links to other data sources. Moreover, we provide embeddings for knowledge
graph entities using high-performance computing. SemOpenAlex enables a broad
range of use-case scenarios, such as exploratory semantic search via our
website, large-scale scientific impact quantification, and other forms of
scholarly big data analytics within and across scientific disciplines.
Additionally, it enables academic recommender systems, such as recommending
collaborators, publications, and venues, including explainability capabilities.
Finally, SemOpenAlex can serve for RDF query optimization benchmarks, creating
scholarly knowledge-guided language models, and as a hub for semantic
scientific publishing.
Related papers
- Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval [49.42043077545341]
We propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from a knowledge graph (KG).
We leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR).
arXiv Detail & Related papers (2024-10-17T17:03:23Z)
- Contri(e)ve: Context + Retrieve for Scholarly Question Answering [0.0]
We present a two-step solution using an open-source large language model (LLM), Llama3.1, for the Scholarly-QALD dataset.
Firstly, we extract the context pertaining to the question from different structured and unstructured data sources.
Secondly, we implement prompt engineering to improve the information retrieval performance of the LLM.
arXiv Detail & Related papers (2024-09-13T17:38:47Z)
- Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora [104.16648246740543]
We propose an efficient data collection method based on large language models.
The method bootstraps seed information through a large language model and retrieves related data from public corpora.
It not only collects knowledge-related data for specific domains but also unearths data with potential reasoning procedures.
arXiv Detail & Related papers (2024-01-26T03:38:23Z)
- Linked Papers With Code: The Latest in Machine Learning as an RDF Knowledge Graph [1.450405446885067]
We introduce Linked Papers With Code, an RDF knowledge graph that provides comprehensive, current information about almost 400,000 machine learning publications.
Compared to its non-RDF-based counterpart Papers With Code, LPWC translates the latest advancements in machine learning into RDF format.
We offer LPWC, a knowledge graph in the Linked Open Data cloud, in multiple formats, from RDF dump files to a SPARQL endpoint for direct web queries.
arXiv Detail & Related papers (2023-10-31T14:09:15Z)
- The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z)
- Lessons from Deep Learning applied to Scholarly Information Extraction: What Works, What Doesn't, and Future Directions [12.62863659147376]
We show how EneRex can extract key insights from a large-scale dataset in the domain of computer science.
We highlight how the existing datasets are limited in their capacity and how EneRex may fit into an existing knowledge graph.
arXiv Detail & Related papers (2022-07-08T17:37:56Z)
- Open Domain Question Answering over Virtual Documents: A Unified Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means for encoding structured knowledge for knowledge-intensive applications, i.e., open-domain question answering (QA).
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
arXiv Detail & Related papers (2021-10-16T00:11:21Z)
- Graph integration of structured, semistructured and unstructured data for data journalism [0.0]
We describe a complete approach for integrating dynamic sets of heterogeneous data sources.
Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments.
arXiv Detail & Related papers (2020-07-23T08:55:09Z)
- SciREX: A Challenge Dataset for Document-Level Information Extraction [56.83748634747753]
It is challenging to create a large-scale information extraction dataset at the document level.
We introduce SciREX, a document-level IE dataset that encompasses multiple IE tasks.
We develop a neural model as a strong baseline that extends previous state-of-the-art IE models to document-level IE.
arXiv Detail & Related papers (2020-05-01T17:30:10Z)
- ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge could be more than enough, since the output description may only cover the most significant knowledge.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv Detail & Related papers (2020-04-30T14:16:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.