Estimating a new panel MSK dataset for comparative analyses of national
absorptive capacity systems, economic growth, and development in low and
middle income economies
- URL: http://arxiv.org/abs/2109.05529v1
- Date: Sun, 12 Sep 2021 14:48:07 GMT
- Authors: Muhammad Salar Khan
- Abstract summary: Low- and middle-income countries (LMICs) are rarely part of any empirical discourse on growth, development, and innovation.
This work offers a new complete panel dataset with no missing values for LMICs eligible for IDA's support.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Within the national innovation system literature, empirical analyses are
severely lacking for developing economies. In particular, the low- and
middle-income countries (LMICs) eligible for the World Bank's International
Development Association (IDA) support are rarely part of any empirical
discourse on growth, development, and innovation. One major issue hindering
panel analyses of LMICs, and thus their inclusion in any empirical
discussion, is the lack of complete data. This work offers a new
complete panel dataset with no missing values for LMICs eligible for IDA's
support. I use a standard, widely respected multiple imputation technique
(specifically, Predictive Mean Matching) developed by Rubin (1987). This
technique respects the structure of multivariate continuous panel data at the
country level. I employ this technique to create a large dataset consisting of
many variables drawn from publicly available established sources. These
variables, in turn, capture six crucial country-level capacities: technological
capacity, financial capacity, human capital capacity, infrastructural capacity,
public policy capacity, and social capacity. Such capacities are part and
parcel of the National Absorptive Capacity Systems (NACS). The dataset (MSK
dataset) thus produced contains data on 47 variables for 82 LMICs between 2005
and 2019. The dataset has passed a quality and reliability check and can thus
be used for comparative analyses of national absorptive capacities and
development, transition, and convergence analyses among LMICs.
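
The abstract names Predictive Mean Matching (PMM) as the imputation technique. As a rough illustration of the idea only, and not the author's actual pipeline, the sketch below fills missing values of one variable by borrowing observed values from "donor" rows whose regression prediction is closest. The function name `pmm_impute`, the parameter `k`, and the synthetic data are all assumptions made for this example. It is also simplified: a full multiple-imputation procedure would additionally draw the regression coefficients from their posterior before matching.

```python
import numpy as np

def pmm_impute(X, y, k=5, rng=None):
    """Simplified predictive mean matching for one variable (illustrative only).

    X: (n, p) fully observed predictors; y: (n,) target with NaN where missing.
    Returns a copy of y with each NaN replaced by an observed donor value.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)

    # Ordinary least squares fit on the observed rows (with an intercept).
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A[obs], y[obs], rcond=None)
    pred = A @ beta  # predicted mean for every row

    donors_pred, donors_val = pred[obs], y[obs]
    k = min(k, donors_val.size)
    out = y.copy()
    for i in np.flatnonzero(~obs):
        # The k observed rows whose predicted mean is closest to row i's.
        nearest = np.argsort(np.abs(donors_pred - pred[i]))[:k]
        # Borrow the actually observed value of one randomly chosen donor.
        out[i] = donors_val[rng.choice(nearest)]
    return out

# Synthetic toy data (NOT the MSK dataset): two predictors, one target.
demo_rng = np.random.default_rng(0)
X = demo_rng.normal(size=(50, 2))
y = X @ np.array([1.0, -2.0]) + demo_rng.normal(scale=0.1, size=50)
y_missing = y.copy()
y_missing[::7] = np.nan  # knock out every 7th value

# Several completed copies, one per seed, in the spirit of multiple imputation.
imputations = [pmm_impute(X, y_missing, k=5, rng=seed) for seed in range(5)]
```

Because every imputed entry is copied from an observed case, the filled-in values always stay within the range of real data, which is one reason PMM is often preferred for continuous indicators.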
Related papers
- Global Ease of Living Index: a machine learning framework for longitudinal analysis of major economies [0.196629787330046]
The drastic changes in the global economy, geopolitical conditions, and disruptions such as the COVID-19 pandemic have impacted the cost of living and quality of life.
We present an approach to quantifying the quality of life through the Global Ease of Living Index that combines various socio-economic and infrastructural factors into a single composite score.
arXiv Detail & Related papers (2025-02-08T02:37:17Z)
- Bridging the Data Provenance Gap Across Text, Speech and Video [67.72097952282262]
We conduct the largest and first-of-its-kind longitudinal audit across modalities of popular text, speech, and video datasets.
Our manual analysis covers nearly 4000 public datasets between 1990 and 2024, spanning 608 languages, 798 sources, 659 organizations, and 67 countries.
We find that multimodal machine learning applications have overwhelmingly turned to web-crawled, synthetic, and social media platforms, such as YouTube, for their training sets.
arXiv Detail & Related papers (2024-12-19T01:30:19Z)
- Evaluating Language Models as Synthetic Data Generators [74.80905172696366]
AgoraBench is a benchmark that provides standardized settings and metrics to evaluate LMs' data generation abilities.
Through synthesizing 1.26 million training instances using 6 LMs and training 99 student models, we uncover key insights about LMs' data generation capabilities.
arXiv Detail & Related papers (2024-12-04T19:20:32Z)
- A Novel Framework for Analyzing Structural Transformation in Data-Constrained Economies Using Bayesian Modeling and Machine Learning [0.0]
The shift from agrarian economies to more diversified industrial and service-based systems is a key driver of economic development.
In low- and middle-income countries (LMICs), data scarcity and unreliability hinder accurate assessments of this process.
This paper presents a novel statistical framework designed to address these challenges by integrating Bayesian hierarchical modeling, machine learning-based data imputation, and factor analysis.
arXiv Detail & Related papers (2024-09-25T08:39:41Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- GeoSEE: Regional Socio-Economic Estimation With a Large Language Model [17.31652821477571]
We present GeoSEE, a method that can estimate various socio-economic indicators using a unified pipeline powered by a large language model (LLM).
The system then computes target indicators via in-context learning after aggregating results from selected modules in the format of natural language-based texts.
Our method outperforms other predictive models in both unsupervised and low-shot contexts.
arXiv Detail & Related papers (2024-06-14T07:50:22Z)
- A Big Data Approach to Understand Sub-national Determinants of FDI in Africa [0.0]
This paper proposes a novel methodology, based on text mining and social network analysis, to quantify regional-level (sub-national) attributes affecting FDI ownership in African companies.
Findings suggest that regional (sub-national) structural and institutional characteristics can play an important role in determining foreign ownership.
arXiv Detail & Related papers (2024-03-15T12:12:54Z)
- Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future [59.78608958395464]
We build a Social AI Data Infrastructure, which consists of a comprehensive social AI taxonomy and a data library of 480 NLP datasets.
Our infrastructure allows us to analyze existing dataset efforts, and also evaluate language models' performance in different social intelligence aspects.
We show there is a need for multifaceted datasets, increased diversity in language and culture, more long-tailed social situations, and more interactive data in future social intelligence data efforts.
arXiv Detail & Related papers (2024-02-28T00:22:42Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Training Machine Learning Models to Characterize Temporal Evolution of Disadvantaged Communities [2.1242970730855126]
The Justice40 initiative of the Department of Energy (DOE), USA, identifies census tracts across the USA to determine where climate and energy investments are or are not accruing.
The DAC status not only helps in determining the eligibility for future Justice40-related investments but is also critical for exploring ways to achieve equitable distribution of resources.
In this paper, machine learning (ML) models are trained on publicly available census data from recent years to classify DAC status at the census-tract level, and the trained model is then used to classify DAC status for historical years.
arXiv Detail & Related papers (2023-03-07T06:33:40Z)
- Jalisco's multiclass land cover analysis and classification using a novel lightweight convnet with real-world multispectral and relief data [51.715517570634994]
We present our novel lightweight (only 89k parameters) Convolutional Neural Network (ConvNet) for land cover (LC) classification and analysis.
In this work, we combine three real-world open data sources to obtain 13 channels.
Our embedded analysis anticipates the limited performance in some classes and gives us the opportunity to group the most similar.
arXiv Detail & Related papers (2022-01-26T14:58:51Z)