Using Artificial Intuition in Distinct, Minimalist Classification of Scientific Abstracts for Management of Technology Portfolios
- URL: http://arxiv.org/abs/2508.13182v2
- Date: Fri, 05 Sep 2025 23:50:47 GMT
- Title: Using Artificial Intuition in Distinct, Minimalist Classification of Scientific Abstracts for Management of Technology Portfolios
- Authors: Prateek Ranka, Fred Morstatter, Alexandra Graddy-Reed, Andrea Belz
- Abstract summary: We describe an application of a process we call artificial intuition to replicate the expert's approach to generate metadata. We use publicly available abstracts from the United States National Science Foundation to create a set of labels. We demonstrate the feasibility of this method for research portfolio management, technology scouting, and other strategic activities.
- Score: 47.637146980518445
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Classification of scientific abstracts is useful for strategic activities but challenging to automate because the sparse text provides few contextual clues. Metadata associated with the scientific publication can be used to improve performance but still often requires a semi-supervised setting. Moreover, such schemes may generate labels that lack distinction -- namely, they overlap and thus do not uniquely define the abstract. In contrast, experts label and sort these texts with ease. Here we describe an application of a process we call artificial intuition to replicate the expert's approach, using a Large Language Model (LLM) to generate metadata. We use publicly available abstracts from the United States National Science Foundation to create a set of labels, and then we test this on a set of abstracts from the Chinese National Natural Science Foundation to examine funding trends. We demonstrate the feasibility of this method for research portfolio management, technology scouting, and other strategic activities.
Related papers
- In-Context Watermarks for Large Language Models [71.29952527565749]
In-Context Watermarking (ICW) embeds watermarks into generated text solely through prompt engineering. We investigate four ICW strategies at different levels of granularity, each paired with a tailored detection method. Our experiments validate the feasibility of ICW as a model-agnostic, practical watermarking approach.
arXiv Detail & Related papers (2025-05-22T17:24:51Z)
- Science Hierarchography: Hierarchical Organization of Science Literature [20.182213614072836]
We motivate SCIENCE HIERARCHOGRAPHY, the goal of organizing scientific literature into a high-quality hierarchical structure. We develop a hybrid approach that combines efficient embedding-based clustering with LLM-based prompting. Results show that our method improves interpretability and offers an alternative pathway for exploring scientific literature.
arXiv Detail & Related papers (2025-04-18T17:59:29Z)
- Can AI Extract Antecedent Factors of Human Trust in AI? An Application of Information Extraction for Scientific Literature in Behavioural and Computer Sciences [9.563656421424728]
Trust in AI is a field that studies the factors contributing to human trust in AI applications. With the input of domain experts, we create the first annotated English dataset in this domain. We benchmark it with state-of-the-art methods using large language models in named entity and relation extraction. Our results indicate that this problem requires supervised learning, which may not be currently feasible with prompt-based LLMs.
arXiv Detail & Related papers (2024-12-16T00:02:38Z)
- Artificial Intuition: Efficient Classification of Scientific Abstracts [42.299140272218274]
Short scientific texts efficiently transmit dense information to experts possessing a rich body of knowledge to aid interpretation.
To address this gap, we have developed a novel approach to generate and appropriately assign coarse domain-specific labels.
We show that a Large Language Model (LLM) can provide metadata essential to the task, in a process akin to the augmentation of supplemental knowledge.
arXiv Detail & Related papers (2024-07-08T16:34:47Z)
- Learning with Language-Guided State Abstractions [58.199148890064826]
Generalizable policy learning in high-dimensional observation spaces is facilitated by well-designed state representations.
Our method, LGA, uses a combination of natural language supervision and background knowledge from language models to automatically build state representations tailored to unseen tasks.
Experiments on simulated robotic tasks show that LGA yields state abstractions similar to those designed by humans, but in a fraction of the time.
arXiv Detail & Related papers (2024-02-28T23:57:04Z)
- AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph [62.685920585838616]
Abstraction ability is essential in human intelligence, yet it remains under-explored in language models.
We present AbsPyramid, a unified entailment graph of 221K textual descriptions of abstraction knowledge.
arXiv Detail & Related papers (2023-11-15T18:11:23Z)
- Automated Annotation of Scientific Texts for ML-based Keyphrase Extraction and Validation [0.0]
We present two novel automated text labeling approaches for the validation of ML-generated metadata for unlabeled texts.
Our techniques show the potential of two new ways to leverage existing information about the unlabeled texts and the scientific domain.
arXiv Detail & Related papers (2023-11-08T22:09:31Z)
- Generating Knowledge Graphs by Employing Natural Language Processing and Machine Learning Techniques within the Scholarly Domain [1.9004296236396943]
We present a new architecture that takes advantage of Natural Language Processing and Machine Learning methods for extracting entities and relationships from research publications.
Within this research work, we i) tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools.
We generated a scientific knowledge graph including 109,105 triples, extracted from 26,827 abstracts of papers within the Semantic Web domain.
arXiv Detail & Related papers (2020-10-28T08:31:40Z)
- Text Recognition in Real Scenarios with a Few Labeled Samples [55.07859517380136]
Scene text recognition (STR) is still a hot research topic in the computer vision field.
This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation.
Our approach can maximize the character-level confusion between the source domain and the target domain.
arXiv Detail & Related papers (2020-06-22T13:03:01Z)
- Knowledge Elicitation using Deep Metric Learning and Psychometric Testing [15.989397781243225]
We provide a method for efficient hierarchical knowledge elicitation from experts working with high-dimensional data such as images or videos.
The developed models embed the high-dimensional data in a metric space where distances are semantically meaningful, and the data can be organized in a hierarchical structure.
arXiv Detail & Related papers (2020-04-14T08:33:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.