Related papers: Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation

Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation

URL: http://arxiv.org/abs/2011.02887v1
Date: Thu, 5 Nov 2020 14:57:41 GMT
Title: Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation
Authors: Diego Kozlowski, Jennifer Dusdal, Jun Pang and Andreas Zilian
Abstract summary: We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs) Our results show that using NLP we can encode a semantic space of articles, while with GNN we are able to build a relational space where the social practices of a research community are also encoded.
Score: 4.178929174617172
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Over the last century, we observe a steady and exponentially growth of scientific publications globally. The overwhelming amount of available literature makes a holistic analysis of the research within a field and between fields based on manual inspection impossible. Automatic techniques to support the process of literature review are required to find the epistemic and social patterns that are embedded in scientific publications. In computer sciences, new tools have been developed to deal with large volumes of data. In particular, deep learning techniques open the possibility of automated end-to-end models to project observations to a new, low-dimensional space where the most relevant information of each observation is highlighted. Using deep learning to build new representations of scientific publications is a growing but still emerging field of research. The aim of this paper is to discuss the potential and limits of deep learning for gathering insights about scientific research articles. We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs). We explore the different outcomes generated by those techniques. Our results show that using NLP we can encode a semantic space of articles, while with GNN we are able to build a relational space where the social practices of a research community are also encoded.

Related papers

Accelerating Scientific Research with Gemini: Case Studies and Common Techniques [105.15622072347811]
Large language models (LLMs) have opened new avenues for accelerating scientific research.<n>We present a collection of case studies demonstrating how researchers have successfully collaborated with advanced AI models.
arXiv Detail & Related papers (2026-02-03T18:56:17Z)
Large-Scale Multidimensional Knowledge Profiling of Scientific Literature [46.15403461273178]
We compile a unified corpus of more than 100,000 papers from 22 major conferences between 2020 and 2025.<n>Our analysis highlights several notable shifts, including the growth of safety, multimodal reasoning, and agent-oriented studies.<n>These findings provide an evidence-based view of how AI research is evolving and offer a resource for understanding broader trends and identifying emerging directions.
arXiv Detail & Related papers (2026-01-21T16:47:05Z)
Small Language Models Offer Significant Potential for Science Community [5.061460306853287]
MiniLM is a framework to evaluate the feasibility of precise, rapid, and cost-effective information retrieval from extensive geoscience literature.<n>A curated corpus of approximately 77 million high-quality sentences was extracted from 95 leading peer-reviewed geoscience journals.<n>MiniLM holds significant potential within the geoscience community for applications such as fact and image retrievals, trend analyses, contradiction analyses, and educational purposes.
arXiv Detail & Related papers (2025-10-18T07:25:05Z)
Predicting New Research Directions in Materials Science using Large Language Models and Concept Graphs [30.813288388998256]
We show that large language models (LLMs) can extract concepts more efficiently than automated keyword extraction methods.<n>A machine learning model is trained to predict emerging combinations of concepts, based on historical data.<n>We show that the model can inspire materials scientists in their creative thinking process by predicting innovative combinations of topics that have not yet been investigated.
arXiv Detail & Related papers (2025-06-20T08:26:12Z)
Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation [58.064940977804596]
A plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently. Ethical concerns regarding shortcomings of these tools and potential for misuse take a particularly prominent place in our discussion.
arXiv Detail & Related papers (2025-02-07T18:26:45Z)
pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy [2.6952253149772996]
Pathfinder is a machine learning framework designed to enable literature review and knowledge discovery in astronomy. Our framework couples advanced retrieval techniques with LLM-based synthesis to search astronomical literature by semantic context. It addresses complexities of jargon, named entities, and temporal aspects through time-based and citation-based weighting schemes.
arXiv Detail & Related papers (2024-08-02T20:05:24Z)
Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML) This work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature in various domains in ML with consistent notations which is missing from the current literature. The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z)
Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and meta data. One straightforward solution is to integrate statistical analysis and machine learning. Numerous papers have been published on embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field.
arXiv Detail & Related papers (2024-06-16T14:49:19Z)
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
Large language models (LLMs) have revolutionized the way text and other modalities of data are handled. We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
arXiv Detail & Related papers (2024-06-16T08:03:24Z)
MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of ScientificAspects. MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years. We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction. The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z)
Mapping Research Trajectories [0.0]
We propose a principled approach for emphmapping research trajectories, which is applicable to all kinds of scientific entities. Our visualizations depict the research topics of entities over time in a straightforward interpr. manner. In a practical demonstrator application, we exemplify the proposed approach on a publication corpus from machine learning.
arXiv Detail & Related papers (2022-04-25T13:32:39Z)
CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers. We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
Evaluating the state-of-the-art in mapping research spaces: a Brazilian case study [0.0]
Two recent works propose methods for creating research maps from scientists' publication records. We evaluate these models' ability to predict whether a given entity will enter a new field. We conduct a case study to showcase how these models can be used to characterize science dynamics in the context of Brazil.
arXiv Detail & Related papers (2021-04-07T18:14:41Z)
Generating Knowledge Graphs by Employing Natural Language Processing and Machine Learning Techniques within the Scholarly Domain [1.9004296236396943]
We present a new architecture that takes advantage of Natural Language Processing and Machine Learning methods for extracting entities and relationships from research publications. Within this research work, we i) tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools. We generated a scientific knowledge graph including 109,105 triples, extracted from 26,827 abstracts of papers within the Semantic Web domain.
arXiv Detail & Related papers (2020-10-28T08:31:40Z)
Topic Space Trajectories: A case study on machine learning literature [0.0]
We present topic space trajectories, a structure that allows for the comprehensible tracking of research topics. We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues. Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work.
arXiv Detail & Related papers (2020-10-23T10:53:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.