Unsupervised embedding of trajectories captures the latent structure of
scientific migration
- URL: http://arxiv.org/abs/2012.02785v3
- Date: Fri, 17 Nov 2023 20:32:42 GMT
- Title: Unsupervised embedding of trajectories captures the latent structure of
scientific migration
- Authors: Dakota Murray, Jisung Yoon, Sadamori Kojaku, Rodrigo Costas, Woo-Sung
Jung, Staša Milojević, Yong-Yeol Ahn
- Abstract summary: We show the ability of the model word2vec to encode nuanced relationships between discrete locations from migration trajectories.
We show that the power of word2vec to encode migration patterns stems from its mathematical equivalence with the gravity model of mobility.
Using techniques that leverage its semantic structure, we demonstrate that embeddings can learn the rich structure that underpins scientific migration.
- Score: 4.028844692958469
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human migration and mobility drives major societal phenomena including
epidemics, economies, innovation, and the diffusion of ideas. Although human
mobility and migration have been heavily constrained by geographic distance
throughout history, technological advances and globalization are making other
factors such as language and culture increasingly important. Advances in neural
embedding models, originally designed for natural language, provide an
opportunity to tame this complexity and open new avenues for the study of
migration. Here, we demonstrate the ability of the model word2vec to encode
nuanced relationships between discrete locations from migration trajectories,
producing an accurate, dense, continuous, and meaningful vector-space
representation. The resulting representation provides a functional distance
between locations, as well as a digital double that can be distributed,
re-used, and itself interrogated to understand the many dimensions of
migration. We show that the unique power of word2vec to encode migration
patterns stems from its mathematical equivalence with the gravity model of
mobility. Focusing on the case of scientific migration, we apply word2vec to a
database of three million migration trajectories of scientists derived from the
affiliations listed on their publication records. Using techniques that
leverage its semantic structure, we demonstrate that embeddings can learn the
rich structure that underpins scientific migration, such as cultural,
linguistic, and prestige relationships at multiple levels of granularity. Our
results provide a theoretical foundation and methodological framework for using
neural embeddings to represent and understand migration both within and beyond
science.
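As a minimal sketch of the core idea in the abstract, each scientist's sequence of affiliations can be treated as a "sentence" of location tokens, so that word2vec's skip-gram objective learns from (center, context) co-occurrence pairs. The city names and window size below are illustrative, not drawn from the paper's data; training actual embeddings from such sequences would typically use a library such as gensim.

```python
def skipgram_pairs(trajectory, window=2):
    """Yield (center, context) pairs from a location trajectory,
    the same training signal word2vec extracts from text."""
    pairs = []
    for i, center in enumerate(trajectory):
        lo = max(0, i - window)
        hi = min(len(trajectory), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, trajectory[j]))
    return pairs

# Hypothetical trajectory of affiliations from publication records.
traj = ["Bloomington", "Boston", "Leiden", "Pohang"]
print(skipgram_pairs(traj, window=1))
```

Locations that frequently co-occur in trajectories end up with similar vectors, which is what yields the "functional distance" between locations described above.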
Related papers
- Combining Twitter and Mobile Phone Data to Observe Border-Rush: The Turkish-European Border Opening [2.5693085674985117]
Following Turkey's 2020 decision to revoke border controls, many individuals journeyed towards the Greek, Bulgarian, and Turkish borders.
However, the lack of verifiable statistics on irregular migration and discrepancies between media reports and actual migration patterns require further exploration.
This study aims to bridge this knowledge gap by harnessing novel data sources, specifically mobile phone and Twitter data.
arXiv Detail & Related papers (2024-05-21T09:51:15Z)
- Named Entity Recognition Under Domain Shift via Metric Learning for Life Sciences [55.185456382328674]
We investigate the applicability of transfer learning for enhancing a named entity recognition model.
Our model consists of two stages: 1) entity grouping in the source domain, which incorporates knowledge from annotated events to establish relations between entities, and 2) entity discrimination in the target domain, which relies on pseudo labeling and contrastive learning to enhance discrimination between the entities in the two domains.
arXiv Detail & Related papers (2024-01-19T03:49:28Z)
- SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning [60.14510984576027]
SciGLM is a suite of scientific language models able to conduct college-level scientific reasoning.
We apply a self-reflective instruction annotation framework to generate step-by-step reasoning for unlabelled scientific questions.
We fine-tuned the ChatGLM family of language models with SciInstruct, enhancing their scientific and mathematical reasoning capabilities.
arXiv Detail & Related papers (2024-01-15T20:22:21Z)
- Unveiling A Core Linguistic Region in Large Language Models [49.860260050718516]
This paper conducts an analogical research using brain localization as a prototype.
We have discovered a core region in large language models that corresponds to linguistic competence.
We observe that an improvement in linguistic competence does not necessarily accompany an elevation in the model's knowledge level.
arXiv Detail & Related papers (2023-10-23T13:31:32Z)
- The diaspora model for human migration [0.07852714805965527]
Existing models primarily rely on population size and travel distance to explain flow fluctuations.
We propose the diaspora model of migration, incorporating intensity (the number of people moving to a country) and assortativity (the destination within the country).
Our model considers only the existing diaspora sizes in the destination country, influencing the probability of migrants selecting a specific residence.
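The destination-choice rule this summary describes can be sketched as weighted sampling, where the probability of selecting a destination is proportional to the diaspora already living there. The city names and counts below are hypothetical, and this is an illustration of the stated assumption, not the paper's actual implementation.

```python
import random

def pick_destination(diaspora_sizes, rng=None):
    """Sample a destination with probability proportional to the
    existing diaspora size there (the model's core assumption)."""
    rng = rng or random
    places = list(diaspora_sizes)
    weights = [diaspora_sizes[p] for p in places]
    return rng.choices(places, weights=weights, k=1)[0]

# Hypothetical diaspora counts in three destinations.
sizes = {"Berlin": 800, "Paris": 150, "Madrid": 50}
print(pick_destination(sizes))
```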
arXiv Detail & Related papers (2023-09-06T15:17:53Z)
- Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z)
- Recognition of Implicit Geographic Movement in Text [3.3241479835797123]
Analyzing the geographic movement of humans, animals, and other phenomena is a growing field of research.
We created a corpus of sentences labeled as describing geographic movement or not.
We developed an iterative process employing hand labeling, crowd voting for confirmation, and machine learning to predict more labels.
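The iterative hand-label, crowd-confirm, model-predict loop described above can be sketched schematically. The trainer, predictor, confidence threshold, and toy sentences here are placeholders, not the authors' actual pipeline.

```python
def grow_labels(seed_labels, unlabeled, train, predict, threshold=0.9):
    """Iteratively expand a labeled set: train on current labels, then
    accept model predictions whose confidence clears a threshold
    (standing in for crowd confirmation of proposed labels)."""
    labeled = dict(seed_labels)
    pool = set(unlabeled)
    while pool:
        model = train(labeled)
        accepted = {}
        for x in pool:
            label, confidence = predict(model, x)
            if confidence >= threshold:
                accepted[x] = label
        if not accepted:
            break  # nothing confident enough; stop iterating
        labeled.update(accepted)
        pool -= set(accepted)
    return labeled

# Toy stand-ins: the "model" is just a keyword rule with a confidence.
train = lambda labeled: labeled
predict = lambda model, x: ("movement" if "went" in x else "static",
                            0.95 if ("went" in x or "stayed" in x) else 0.5)
seed = {"she went to Ankara": "movement"}
print(grow_labels(seed, ["he went home", "they stayed", "maybe"], train, predict))
```

Sentences the model cannot label confidently ("maybe" above) remain unlabeled, mirroring how uncertain items would go back to human annotators.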
arXiv Detail & Related papers (2022-01-30T12:22:55Z)
- Investigating internal migration with network analysis and latent space representations: An application to Turkey [0.0]
We provide an in-depth investigation into the structure and dynamics of the internal migration in Turkey from 2008 to 2020.
We identify a set of classical migration laws and examine them via various methods for signed network analysis, ego network analysis, representation learning, temporal stability analysis, and network visualization.
The findings show that, in line with the classical migration laws, most migration links are geographically bounded with several exceptions involving cities with large economic activity.
arXiv Detail & Related papers (2022-01-10T18:58:02Z)
- Capturing the diversity of multilingual societies [0.0]
We consider the processes at work in language shift through a conjunction of theoretical and data-driven perspectives.
A large-scale empirical study of spatial patterns of languages in multilingual societies using Twitter and census data yields a wide diversity.
We propose a model in which coexistence of languages may be reached when learning the other language is facilitated and when bilinguals favor the use of the endangered language.
arXiv Detail & Related papers (2021-05-06T10:27:43Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.