A Pipeline for Analysing Grant Applications
- URL: http://arxiv.org/abs/2210.16843v1
- Date: Sun, 30 Oct 2022 13:43:53 GMT
- Title: A Pipeline for Analysing Grant Applications
- Authors: Shuaiqun Pan, Sergio J. Rodríguez Méndez, Kerry Taylor
- Abstract summary: This paper investigates whether grant schemes successfully identify innovative project proposals, as intended.
Grant applications are peer-reviewed research proposals that include specific "innovation and creativity" (IC) scores assigned by reviewers.
We propose the best-performing model, a Random Forest (RF) classifier over documents encoded with features denoting the presence or absence of unigrams.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data mining techniques can transform massive amounts of unstructured data
into quantitative data that quickly reveal insights, trends, and patterns
behind the original data. In this paper, a data mining model is applied to
analyse the 2019 grant applications submitted to an Australian Government
research funding agency to investigate whether grant schemes successfully
identify innovative project proposals, as intended. The grant applications
are peer-reviewed research proposals that include specific "innovation and
creativity" (IC) scores assigned by reviewers. In addition to predicting the
IC score for each research proposal, we are particularly interested in
understanding the vocabulary of innovative proposals. In order to solve this
problem, various data mining models and feature encoding algorithms are studied
and explored. As a result, we propose a model with the best performance, a
Random Forest (RF) classifier over documents encoded with features denoting the
presence or absence of unigrams. Specifically, the unigram terms are encoded by
a modified Term Frequency - Inverse Document Frequency (TF-IDF) algorithm,
which only implements the IDF part of TF-IDF. Besides the proposed model, this
paper also presents a rigorous experimental pipeline for analysing grant
applications, and the experimental results demonstrate its feasibility.
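The IDF-only unigram encoding described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tokenizer, the smoothed IDF formula, the function names (`tokenize`, `fit_idf`, `encode`) and the toy corpus are all assumptions made for the example.

```python
import math
import re


def tokenize(text):
    """Lowercase and split into alphabetic unigrams (an assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(documents):
    """Return {term: idf} using a smoothed IDF (assumed formula, not from the paper)."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        # Count each term once per document, regardless of repetitions.
        for term in set(tokenize(doc)):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def encode(doc, idf, vocabulary):
    """IDF-only encoding: the IDF weight if the unigram is present, else 0.

    Term frequency within the document is deliberately ignored.
    """
    present = set(tokenize(doc))
    return [idf[term] if term in present else 0.0 for term in vocabulary]


# Toy corpus, invented for illustration only.
corpus = [
    "novel method for quantum sensing",
    "novel approach to coral reef monitoring",
    "incremental survey of existing sensing methods",
]
idf = fit_idf(corpus)
vocabulary = sorted(idf)
features = [encode(doc, idf, vocabulary) for doc in corpus]
```

Each row of `features` is a fixed-length vector that a Random Forest implementation could take as input, with the IC scores as labels. The classifier itself is omitted here to keep the sketch dependency-free.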
Related papers
- A Survey on Data Selection for Language Models [151.6210632830082]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z)
- A Unified Framework for Generative Data Augmentation: A Comprehensive Survey [0.0]
Generative data augmentation (GDA) has emerged as a promising technique to alleviate data scarcity in machine learning applications.
This thesis presents a comprehensive survey and unified framework of the GDA landscape.
arXiv Detail & Related papers (2023-09-30T07:01:08Z)
- WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z)
- Validation Diagnostics for SBI algorithms based on Normalizing Flows [55.41644538483948]
This work proposes easy-to-interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on NF.
It also offers theoretical guarantees based on results of local consistency.
This work should help the design of better specified models or drive the development of novel SBI-algorithms.
arXiv Detail & Related papers (2022-11-17T15:48:06Z)
- A Data-Centric AI Paradigm Based on Application-Driven Fine-grained Dataset Design [2.2223262422197907]
We propose a novel paradigm for fine-grained design of datasets, driven by industrial applications.
We flexibly select positive and negative sample sets according to the essential features of the data and application requirements.
Compared with traditional data design methods, our method achieves better results and effectively reduces false alarms.
arXiv Detail & Related papers (2022-09-20T03:56:53Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Novel Applications for VAE-based Anomaly Detection Systems [5.065947993017157]
Deep generative modeling (DGM) can create novel and unseen data, starting from a given data set.
As the technology shows promising applications, many ethical issues also arise.
Research indicates different biases affect deep learning models, leading to social issues such as misrepresentation.
arXiv Detail & Related papers (2022-04-26T20:30:37Z)
- Deep Learning Schema-based Event Extraction: Literature Review and Current Trends [60.29289298349322]
Event extraction technology based on deep learning has become a research hotspot.
This paper fills the gap by reviewing the state-of-the-art approaches, focusing on deep learning-based models.
arXiv Detail & Related papers (2021-07-05T16:32:45Z)
- A survey on Variational Autoencoders from a GreenAI perspective [0.0]
Variational AutoEncoders (VAEs) are powerful generative models that merge elements from statistics and information theory with the flexibility offered by deep neural networks.
This article provides a comparative evaluation of some of the most successful, recent variations of VAEs.
arXiv Detail & Related papers (2021-03-01T15:26:39Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- PermuteAttack: Counterfactual Explanation of Machine Learning Credit Scorecards [0.0]
This paper is a note on new directions and methodologies for validation and explanation of Machine Learning (ML) models employed for retail credit scoring in finance.
Our proposed framework draws motivation from the field of Artificial Intelligence (AI) security and adversarial ML.
arXiv Detail & Related papers (2020-08-24T00:05:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.