Evaluating the state-of-the-art in mapping research spaces: a Brazilian
case study
- URL: http://arxiv.org/abs/2104.03338v1
- Date: Wed, 7 Apr 2021 18:14:41 GMT
- Title: Evaluating the state-of-the-art in mapping research spaces: a Brazilian
case study
- Authors: Francisco Galuppo Azevedo and Fabricio Murai
- Abstract summary: Two recent works propose methods for creating research maps from scientists' publication records.
We evaluate these models' ability to predict whether a given entity will enter a new field.
We conduct a case study to showcase how these models can be used to characterize science dynamics in the context of Brazil.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific knowledge should be seen not as a set of isolated fields, but as a
highly connected network. Understanding how research areas are connected is of
paramount importance for adequately allocating funding and human resources
(e.g., assembling teams to tackle multidisciplinary problems). The relationship
between disciplines can be drawn from data on the trajectory of individual
scientists, as researchers often make contributions in a small set of
interrelated areas. Two recent works propose methods for creating research maps
from scientists' publication records: by using a frequentist approach to create
a transition probability matrix; and by learning embeddings (vector
representations). Surprisingly, these models were evaluated on different
datasets and have never been compared in the literature. In this work, we
compare both models in a systematic way, using a large dataset of publication
records from Brazilian researchers. We evaluate these models' ability to
predict whether a given entity (scientist, institution or region) will enter a
new field, as measured by the area under the ROC curve. Moreover, we analyze how
sensitive each method is to the number of publications and the number of fields
associated with one entity. Finally, we conduct a case study to showcase how these
models can be used to characterize science dynamics in the context of Brazil.
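The two ideas named in the abstract, a frequentist transition probability matrix and evaluation by area under the ROC curve, can be illustrated with a minimal sketch. This is not the method of either compared work: the toy `records` data, the co-occurrence-based definition of a transition, and the helper names `transition_matrix` and `auc` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical toy data: each entity (here, a scientist) mapped to the
# set of fields in which it has published.
records = {
    "s1": {"physics", "math"},
    "s2": {"physics", "math", "cs"},
    "s3": {"biology", "chemistry"},
    "s4": {"cs", "math"},
}

def transition_matrix(records):
    """Estimate P(field b | field a) from co-occurrence counts.

    Assumption: a "transition" a -> b is counted once for every entity
    that is active in both fields; each row is then normalized to sum to 1.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for fields in records.values():
        for a, b in permutations(sorted(fields), 2):
            counts[a][b] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: c / total for b, c in row.items()}
    return probs

def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example receives
    the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

P = transition_matrix(records)

# Score candidate new fields for an entity currently active only in physics.
candidate_scores = P["physics"]
print(candidate_scores)

# Evaluate hypothetical predictions: label 1 means the entity did enter the field.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

In this sketch, a higher entry in a row of `P` is read as a higher chance that an entity active in the row's field later enters the column's field, and `auc` summarizes how well such scores separate fields that were entered from those that were not.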
Related papers
- Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML).
This work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature in various domains in ML with consistent notation, which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z)
- A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
Large language models (LLMs) have revolutionized the way text and other modalities of data are handled.
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
arXiv Detail & Related papers (2024-06-16T08:03:24Z)
- A Survey of Decomposition-Based Evolutionary Multi-Objective Optimization: Part II -- A Data Science Perspective [4.322038460697958]
We build a knowledge graph that encapsulates more than 5,400 papers, 10,000 authors, 400 venues, and 1,600 institutions for MOEA/D research.
We also explore the collaboration and citation networks of MOEA/D, uncovering hidden patterns in the growth of literature.
arXiv Detail & Related papers (2024-04-22T14:38:58Z)
- Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook [95.32949323258251]
Temporal data, notably time series and spatio-temporal data, are prevalent in real-world applications.
Recent advances in large language and other foundational models have spurred increased use in time series and spatio-temporal data mining.
arXiv Detail & Related papers (2023-10-16T09:06:00Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Mapping Research Trajectories [0.0]
We propose a principled approach for mapping research trajectories, which is applicable to all kinds of scientific entities.
Our visualizations depict the research topics of entities over time in a straightforward, interpretable manner.
In a practical demonstrator application, we exemplify the proposed approach on a publication corpus from machine learning.
arXiv Detail & Related papers (2022-04-25T13:32:39Z)
- Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation [4.178929174617172]
We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs).
Our results show that using NLP we can encode a semantic space of articles, while with GNN we are able to build a relational space where the social practices of a research community are also encoded.
arXiv Detail & Related papers (2020-11-05T14:57:41Z)
- A Survey of Embedding Space Alignment Methods for Language and Knowledge Graphs [77.34726150561087]
We survey the current research landscape on word, sentence and knowledge graph embedding algorithms.
We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research.
arXiv Detail & Related papers (2020-10-26T16:08:13Z)
- Method and Dataset Entity Mining in Scientific Literature: A CNN + Bi-LSTM Model with Self-attention [21.93889297841459]
We propose a novel entity recognition model, called MDER, which is able to effectively extract the method and dataset entities from scientific papers.
We evaluate the proposed model on datasets constructed from the published papers of four research areas in computer science, i.e., NLP, CV, Data Mining and AI.
arXiv Detail & Related papers (2020-10-26T13:38:43Z)
- Topic Space Trajectories: A case study on machine learning literature [0.0]
We present topic space trajectories, a structure that allows for the comprehensible tracking of research topics.
We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues.
Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work.
arXiv Detail & Related papers (2020-10-23T10:53:42Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)