Generative AI for Research Data Processing: Lessons Learnt From Three Use Cases
- URL: http://arxiv.org/abs/2504.15829v1
- Date: Tue, 22 Apr 2025 12:21:07 GMT
- Title: Generative AI for Research Data Processing: Lessons Learnt From Three Use Cases
- Authors: Modhurita Mitra, Martine G. de Vos, Nicola Cortinovis, Dawa Ometto,
- Abstract summary: There are concerns about the accuracy and consistency of the outputs of generative AI.<n>We identify tasks for which rule-based or traditional machine learning approaches were difficult to apply, and then performed these tasks using generative AI.<n>We demonstrate the feasibility of using the generative AI model Claude 3 Opus in three research projects involving complex data processing tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: There has been enormous interest in generative AI since ChatGPT was launched in 2022. However, there are concerns about the accuracy and consistency of the outputs of generative AI. We have carried out an exploratory study on the application of this new technology in research data processing. We identified tasks for which rule-based or traditional machine learning approaches were difficult to apply, and then performed these tasks using generative AI. We demonstrate the feasibility of using the generative AI model Claude 3 Opus in three research projects involving complex data processing tasks: 1) Information extraction: We extract plant species names from historical seedlists (catalogues of seeds) published by botanical gardens. 2) Natural language understanding: We extract certain data points (name of drug, name of health indication, relative effectiveness, cost-effectiveness, etc.) from documents published by Health Technology Assessment organisations in the EU. 3) Text classification: We assign industry codes to projects on the crowdfunding website Kickstarter. We share the lessons we learnt from these use cases: How to determine if generative AI is an appropriate tool for a given data processing task, and if so, how to maximise the accuracy and consistency of the results obtained.
Related papers
- O1 Replication Journey: A Strategic Progress Report -- Part 1 [52.062216849476776]
This paper introduces a pioneering approach to artificial intelligence research, embodied in our O1 Replication Journey.
Our methodology addresses critical challenges in modern AI research, including the insularity of prolonged team-based projects.
We propose the journey learning paradigm, which encourages models to learn not just shortcuts, but the complete exploration process.
arXiv Detail & Related papers (2024-10-08T15:13:01Z) - Procedural Content Generation via Generative Artificial Intelligence [1.437446768735628]
generative artificial intelligence (AI) saw a significant increase in interest in the mid-2010s.
generative AI is effective for PCG, but building high-performance AI requires vast amounts of training data.
For PCG research to advance further, issues related to limited training data must be overcome.
arXiv Detail & Related papers (2024-07-12T06:03:38Z) - Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models [54.58108387797138]
We investigate the effectiveness of prompt learning in code intelligence tasks.
Existing automatic prompt design methods are very limited to code intelligence tasks.
We propose Genetic Auto Prompt (GenAP) which utilizes an elaborate genetic algorithm to automatically design prompts.
arXiv Detail & Related papers (2024-03-20T13:37:00Z) - Psittacines of Innovation? Assessing the True Novelty of AI Creations [0.26107298043931204]
We task an AI with generating project titles for hypothetical crowdfunding campaigns.
We compare within AI-generated project titles, measuring repetition and complexity.
Results suggest that the AI generates unique content even under increasing task complexity.
arXiv Detail & Related papers (2024-03-17T13:08:11Z) - Artificial intelligence to automate the systematic review of scientific
literature [0.0]
We present a survey of AI techniques proposed in the last 15 years to help researchers conduct systematic analyses of scientific literature.
We describe the tasks currently supported, the types of algorithms applied, and available tools proposed in 34 primary studies.
arXiv Detail & Related papers (2024-01-13T19:12:49Z) - How Generative AI models such as ChatGPT can be (Mis)Used in SPC
Practice, Education, and Research? An Exploratory Study [2.0841728192954663]
Generative Artificial Intelligence (AI) models have the potential to revolutionize Statistical Process Control (SPC) practice, learning, and research.
These tools are in the early stages of development and can be easily misused or misunderstood.
We explore ChatGPT's ability to provide code, explain basic concepts, and create knowledge related to SPC practice, learning, and research.
arXiv Detail & Related papers (2023-02-17T15:48:37Z) - Deep Active Learning for Computer Vision: Past and Future [50.19394935978135]
Despite its indispensable role for developing AI models, research on active learning is not as intensive as other research directions.
By addressing data automation challenges and coping with automated machine learning systems, active learning will facilitate democratization of AI technologies.
arXiv Detail & Related papers (2022-11-27T13:07:14Z) - TegTok: Augmenting Text Generation via Task-specific and Open-world
Knowledge [83.55215993730326]
We propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework.
Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively.
arXiv Detail & Related papers (2022-03-16T10:37:59Z) - Artificial Intelligence in Drug Discovery: Applications and Techniques [33.59138543942538]
Various AI techniques have been used in a wide range of applications, such as virtual screening and drug design.
We first give an overview on drug discovery and discuss related applications, which can be reduced to two major tasks.
We then discuss common data resources, molecule representations and benchmark platforms.
arXiv Detail & Related papers (2021-06-09T20:46:44Z) - Scaling Systematic Literature Reviews with Machine Learning Pipelines [57.82662094602138]
Systematic reviews entail the extraction of data from scientific documents.
We construct a pipeline that automates each of these aspects, and experiment with many human-time vs. system quality trade-offs.
We find that we can get surprising accuracy and generalisability of the whole pipeline system with only 2 weeks of human-expert annotation.
arXiv Detail & Related papers (2020-10-09T16:19:42Z) - Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.