A Pipeline for Analysing Grant Applications
- URL: http://arxiv.org/abs/2210.16843v1
- Date: Sun, 30 Oct 2022 13:43:53 GMT
- Title: A Pipeline for Analysing Grant Applications
- Authors: Shuaiqun Pan, Sergio J. Rodríguez Méndez, Kerry Taylor
- Abstract summary: This paper investigates whether grant schemes successfully identify innovative project proposals, as intended.
Grant applications are peer-reviewed research proposals that include specific "innovation and creativity" (IC) scores assigned by reviewers.
The best-performing model we propose is a Random Forest (RF) classifier over documents encoded with features denoting the presence or absence of unigrams.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data mining techniques can transform massive amounts of unstructured data
into quantitative data that quickly reveal insights, trends, and patterns
behind the original data. In this paper, a data mining model is applied to
analyse the 2019 grant applications submitted to an Australian Government
research funding agency to investigate whether grant schemes successfully
identify innovative project proposals, as intended. The grant applications
are peer-reviewed research proposals that include specific "innovation and
creativity" (IC) scores assigned by reviewers. In addition to predicting the
IC score for each research proposal, we are particularly interested in
understanding the vocabulary of innovative proposals. To address this
problem, various data mining models and feature encoding algorithms are studied
and explored. As a result, we propose the best-performing model, a
Random Forest (RF) classifier over documents encoded with features denoting the
presence or absence of unigrams. Specifically, the unigram terms are encoded by
a modified Term Frequency - Inverse Document Frequency (TF-IDF) algorithm,
which only implements the IDF part of TF-IDF. Besides the proposed model, this
paper also presents a rigorous experimental pipeline for analysing grant
applications, and the experimental results demonstrate its feasibility.
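The IDF-only unigram encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `idf_only_encode`, the whitespace tokenisation, and the smoothed IDF formula are all assumptions made for illustration. Each document becomes a vector in which a unigram that is present contributes its IDF weight and an absent unigram contributes zero, so repeated occurrences of a term within a document do not change the encoding.

```python
import math
from collections import Counter


def idf_only_encode(docs):
    """Encode documents as unigram presence weighted by IDF (no TF factor).

    Hypothetical helper for illustration; tokenisation and smoothing
    are assumptions, not details taken from the paper.
    """
    # A set per document keeps only presence/absence of each unigram.
    tokenized = [set(doc.lower().split()) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each unigram appears.
    df = Counter(term for terms in tokenized for term in terms)
    vocab = sorted(df)

    # Smoothed IDF (assumed variant): log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    # Present unigram -> its IDF weight; absent unigram -> 0.0.
    vectors = [
        [idf[t] if t in terms else 0.0 for t in vocab]
        for terms in tokenized
    ]
    return vocab, vectors


if __name__ == "__main__":
    vocab, vectors = idf_only_encode(["novel novel method", "standard method"])
    for term, weight in zip(vocab, vectors[0]):
        print(term, round(weight, 3))
```

Vectors of this form could then be passed to any Random Forest implementation as the feature matrix, with the IC scores as the target labels. The abstract does not say which library the authors used.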
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations [63.52709761339949]
We first contribute a dedicated dataset called the Fair Forgery Detection (FairFD) dataset, where we prove the racial bias of public state-of-the-art (SOTA) methods.
We design novel metrics including Approach Averaged Metric and Utility Regularized Metric, which can avoid deceptive results.
We also present an effective and robust post-processing technique, Bias Pruning with Fair Activations (BPFA), which improves fairness without requiring retraining or weight updates.
arXiv Detail & Related papers (2024-07-19T14:53:18Z)
- A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z)
- A Unified Framework for Generative Data Augmentation: A Comprehensive Survey [0.0]
Generative data augmentation (GDA) has emerged as a promising technique to alleviate data scarcity in machine learning applications.
This thesis presents a comprehensive survey and unified framework of the GDA landscape.
arXiv Detail & Related papers (2023-09-30T07:01:08Z)
- Validation Diagnostics for SBI algorithms based on Normalizing Flows [55.41644538483948]
This work proposes easy to interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on NF.
It also offers theoretical guarantees based on results of local consistency.
This work should help the design of better specified models or drive the development of novel SBI-algorithms.
arXiv Detail & Related papers (2022-11-17T15:48:06Z)
- A Data-Centric AI Paradigm Based on Application-Driven Fine-grained Dataset Design [2.2223262422197907]
We propose a novel paradigm for fine-grained design of datasets, driven by industrial applications.
We flexibly select positive and negative sample sets according to the essential features of the data and application requirements.
Compared with the traditional data design methods, our method achieves better results and effectively reduces false alarm.
arXiv Detail & Related papers (2022-09-20T03:56:53Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Novel Applications for VAE-based Anomaly Detection Systems [5.065947993017157]
Deep generative modeling (DGM) can create novel and unseen data, starting from a given data set.
As the technology shows promising applications, many ethical issues also arise.
Research indicates different biases affect deep learning models, leading to social issues such as misrepresentation.
arXiv Detail & Related papers (2022-04-26T20:30:37Z)
- A survey on Variational Autoencoders from a GreenAI perspective [0.0]
Variational AutoEncoders (VAEs) are powerful generative models that merge elements from statistics and information theory with the flexibility offered by deep neural networks.
This article provides a comparative evaluation of some of the most successful, recent variations of VAEs.
arXiv Detail & Related papers (2021-03-01T15:26:39Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)