Related papers: A Pipeline for Analysing Grant Applications

A Pipeline for Analysing Grant Applications

URL: http://arxiv.org/abs/2210.16843v1
Date: Sun, 30 Oct 2022 13:43:53 GMT
Title: A Pipeline for Analysing Grant Applications
Authors: Shuaiqun Pan, Sergio J. Rodr\'iguez M\'endez, Kerry Taylor
Abstract summary: This paper investigates whether grant schemes successfully identifies innovative project proposals, as intended. Grant applications are peer-reviewed research proposals that include specific innovation and creativity'' (IC) scores assigned by reviewers. We propose a model with the best performance, a Random Forest (RF) classifier over documents encoded with features.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Data mining techniques can transform massive amounts of unstructured data into quantitative data that quickly reveal insights, trends, and patterns behind the original data. In this paper, a data mining model is applied to analyse the 2019 grant applications submitted to an Australian Government research funding agency to investigate whether grant schemes successfully identifies innovative project proposals, as intended. The grant applications are peer-reviewed research proposals that include specific ``innovation and creativity'' (IC) scores assigned by reviewers. In addition to predicting the IC score for each research proposal, we are particularly interested in understanding the vocabulary of innovative proposals. In order to solve this problem, various data mining models and feature encoding algorithms are studied and explored. As a result, we propose a model with the best performance, a Random Forest (RF) classifier over documents encoded with features denoting the presence or absence of unigrams. In specific, the unigram terms are encoded by a modified Term Frequency - Inverse Document Frequency (TF-IDF) algorithm, which only implements the IDF part of TF-IDF. Besides the proposed model, this paper also presents a rigorous experimental pipeline for analysing grant applications, and the experimental results prove its feasibility.

Related papers

Making Interpretable Discoveries from Unstructured Data: A High-Dimensional Multiple Hypothesis Testing Approach [2.162862183067095]
Social scientists are increasingly turning to unstructured datasets to unlock new empirical insights.<n>This paper proposes a general and flexible framework for pursuing discovery from unstructured data in a statistically principled way.<n>An open source Jupyter notebook is provided for researchers to implement the framework in their own projects.
arXiv Detail & Related papers (2025-11-03T15:42:32Z)
Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing.<n>Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest.<n>This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
Harnessing Large Language Models for Scientific Novelty Detection [49.10608128661251]
We propose to harness large language models (LLMs) for scientific novelty detection (ND)<n>To capture idea conception, we propose to train a lightweight retriever by distilling the idea-level knowledge from LLMs.<n> Experiments show our method consistently outperforms others on the proposed benchmark datasets for idea retrieval and ND tasks.
arXiv Detail & Related papers (2025-05-30T14:08:13Z)
Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored. We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches. We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations [63.52709761339949]
We first contribute a dedicated dataset called the Fair Forgery Detection (FairFD) dataset, where we prove the racial bias of public state-of-the-art (SOTA) methods. We design novel metrics including Approach Averaged Metric and Utility Regularized Metric, which can avoid deceptive results. We also present an effective and robust post-processing technique, Bias Pruning with Fair Activations (BPFA), which improves fairness without requiring retraining or weight updates.
arXiv Detail & Related papers (2024-07-19T14:53:18Z)
A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset. Deep learning is mostly driven by empirical evidence and experimentation on large-scale data is expensive. Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z)
A Unified Framework for Generative Data Augmentation: A Comprehensive Survey [0.0]
Generative data augmentation (GDA) has emerged as a promising technique to alleviate data scarcity in machine learning applications. This thesis presents a comprehensive survey and unified framework of the GDA landscape.
arXiv Detail & Related papers (2023-09-30T07:01:08Z)
Validation Diagnostics for SBI algorithms based on Normalizing Flows [55.41644538483948]
This work proposes easy to interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on NF. It also offers theoretical guarantees based on results of local consistency. This work should help the design of better specified models or drive the development of novel SBI-algorithms.
arXiv Detail & Related papers (2022-11-17T15:48:06Z)
A Data-Centric AI Paradigm Based on Application-Driven Fine-grained Dataset Design [2.2223262422197907]
We propose a novel paradigm for fine-grained design of datasets, driven by industrial applications. We flexibly select positive and negative sample sets according to the essential features of the data and application requirements. Compared with the traditional data design methods, our method achieves better results and effectively reduces false alarm.
arXiv Detail & Related papers (2022-09-20T03:56:53Z)
Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature. We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
Novel Applications for VAE-based Anomaly Detection Systems [5.065947993017157]
Deep generative modeling (DGM) can create novel and unseen data, starting from a given data set. As the technology shows promising applications, many ethical issues also arise. Research indicates different biases affect deep learning models, leading to social issues such as misrepresentation.
arXiv Detail & Related papers (2022-04-26T20:30:37Z)
A survey on Variational Autoencoders from a GreenAI perspective [0.0]
Variational AutoEncoders (VAEs) are powerful generative models that merge elements from statistics and information theory with the flexibility offered by deep neural networks. This article provides a comparative evaluation of some of the most successful, recent variations of VAEs.
arXiv Detail & Related papers (2021-03-01T15:26:39Z)
Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings. We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data. We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.