Related papers: 'Moving On' -- Investigating Inventors' Ethnic Origins Using Supervised Learning

URL: http://arxiv.org/abs/2201.00578v1
Date: Mon, 3 Jan 2022 10:47:47 GMT
Title: 'Moving On' -- Investigating Inventors' Ethnic Origins Using Supervised Learning
Authors: Matthias Niggli
Abstract summary: Patent data provides rich information about technical inventions, but does not disclose the ethnic origin of inventors. I construct a dataset of 95'202 labeled names and train an artificial recurrent neural network with long short-term memory (LSTM) to predict ethnic origins. I use this model to classify and investigate the ethnic origins of 2.68 million inventors and provide novel descriptive evidence regarding their ethnic origin composition.
Score: 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Patent data provides rich information about technical inventions, but does not disclose the ethnic origin of inventors. In this paper, I use supervised learning techniques to infer this information. To do so, I construct a dataset of 95'202 labeled names and train an artificial recurrent neural network with long short-term memory (LSTM) to predict ethnic origins based on names. The trained network achieves an overall performance of 91% across 17 ethnic origins. I use this model to classify and investigate the ethnic origins of 2.68 million inventors and provide novel descriptive evidence regarding their ethnic origin composition over time and across countries and technological fields. The global ethnic origin composition has become more diverse over the last decades, which was mostly due to a relative increase of Asian-origin inventors. Furthermore, the prevalence of foreign-origin inventors is especially high in the USA, but has also increased in other high-income economies. This increase was mainly driven by an inflow of non-western inventors into emerging high-technology fields for the USA, but not for other high-income countries.

Related papers

Differentiating Emigration from Return Migration of Scholars Using Name-Based Nationality Detection Models [0.0]
Most web and digital trace data do not include information about an individual's nationality due to privacy concerns. We propose methods to detect the nationality with the least available data, i.e., full names. Our results show that using the country of first publication as a proxy for nationality underestimates the size of return flows.
arXiv Detail & Related papers (2025-05-09T15:03:39Z)
Brief analysis of DeepSeek R1 and its implications for Generative AI [0.0]
DeepSeek released their new reasoning model (DeepSeek R1) in January 2025. This report discusses the model, and what its release means for the field of Generative AI more widely.
arXiv Detail & Related papers (2025-02-04T17:45:32Z)
Neural-Symbolic Reasoning over Knowledge Graphs: A Survey from a Query Perspective [55.79507207292647]
Knowledge graph reasoning is pivotal in various domains such as data mining, artificial intelligence, the Web, and social sciences. The rise of Neural AI marks a significant advancement, merging the robustness of deep learning with the precision of symbolic reasoning. The advent of large language models (LLMs) has opened new frontiers in knowledge graph reasoning.
arXiv Detail & Related papers (2024-11-30T18:54:08Z)
Deepfake Media Forensics: State of the Art and Challenges Ahead [51.33414186878676]
AI-generated synthetic media, also called Deepfakes, have influenced many domains, from entertainment to cybersecurity. Deepfake detection has become a vital area of research, focusing on identifying subtle inconsistencies and artifacts with machine learning techniques. This paper reviews the primary algorithms that address these challenges, examining their advantages, limitations, and future prospects.
arXiv Detail & Related papers (2024-08-01T08:57:47Z)
Graph Representation Learning Towards Patents Network Analysis [2.202803272456695]
This research employed a graph representation learning approach to create, analyze, and find similarities in the patent data registered in the Iranian Official Gazette. Key entities were extracted from the scraped patents dataset to create the Iranian patents graph from scratch. Using novel graph algorithms and text mining methods, we identified new areas of industry and research from Iranian patent data.
arXiv Detail & Related papers (2023-09-25T05:49:40Z)
Has China caught up to the US in AI research? An exploration of mimetic isomorphism as a model for late industrializers [9.03136346887569]
We examine China's AI development process, demonstrating that it is characterized by rapid learning and differentiation. By 2018, the time lag between China and the USA in addressing AI research topics had evaporated. This finding suggests that China has effectively bridged a significant knowledge gap and could potentially be setting out on an independent research trajectory.
arXiv Detail & Related papers (2023-07-11T19:59:54Z)
The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence [67.70415658080121]
Recent advances in machine learning and AI are disrupting technological innovation, product development, and society as a whole. AI has contributed less to fundamental science in part because large data sets of high-quality data for scientific practice and model discovery are more difficult to access. Here we explore and investigate aspects of an AI-driven, automated, closed-loop approach to scientific discovery.
arXiv Detail & Related papers (2023-07-09T21:16:56Z)
Neural Machine Translation for the Indigenous Languages of the Americas: An Introduction [102.13536517783837]
Most languages from the Americas are low-resource, having a limited amount of parallel and monolingual data, if any. We discuss recent advances, findings, and open questions arising from the increased interest of the NLP community in these languages.
arXiv Detail & Related papers (2023-06-11T23:27:47Z)
NeuroComparatives: Neuro-Symbolic Distillation of Comparative Knowledge [48.17483161013775]
We introduce NeuroComparatives, a novel framework for comparative knowledge distillation. Our framework produces a corpus of up to 8.8M comparisons over 1.74M entity pairs. Human evaluations show that NeuroComparatives outperform existing resources in terms of validity.
arXiv Detail & Related papers (2023-05-08T18:20:36Z)
Informed Machine Learning, Centrality, CNN, Relevant Document Detection, Repatriation of Indigenous Human Remains [1.3299507495084417]
This article reports on collaborative research by data scientists and social science researchers in the Research, Reconcile, Renew Network (RRR) to develop and apply text mining techniques. We describe our work to date on developing a machine learning-based solution to automate the process of finding and semantically analysing relevant texts. To improve the accuracy of our detection model, we explore the use of an Informed Neural Network (INN) model that describes documentary content using expert-informed contextual knowledge.
arXiv Detail & Related papers (2023-03-25T14:08:21Z)
Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing [73.0977635031713]
Neural-symbolic computing (NeSy) has been an active research area of Artificial Intelligence (AI) for many years. NeSy shows promise of reconciling the advantages of reasoning and interpretability of symbolic representation and robust learning in neural networks.
arXiv Detail & Related papers (2022-10-28T04:38:10Z)
Relying on recent and temporally dispersed science predicts breakthrough inventions [1.2930336259963562]
We use a large corpus of patents and derive features characterizing how patents temporally search in the scientific space. We find that patents that cite scientific papers have more citations and are substantially more likely to become breakthroughs.
arXiv Detail & Related papers (2021-07-19T22:08:33Z)
Influence of cognitive, geographical, and collaborative proximity on knowledge production of Canadian nanotechnology [0.1529342790344802]
Knowledge production through research and invention is the key to scientific and technological development. Canada is reported as one of the major players in producing nanotechnology research.
arXiv Detail & Related papers (2021-06-03T20:07:08Z)
Alpha Discovery Neural Network based on Prior Knowledge [55.65102700986668]
Genetic programming (GP) is the state of the art in the automated financial feature construction task. This paper proposes Alpha Discovery Neural Network (ADNN), a tailored neural network structure which can automatically construct diversified financial technical indicators.
arXiv Detail & Related papers (2019-12-26T03:10:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.