Predicting Startup Success Using Large Language Models: A Novel In-Context Learning Approach
- URL: http://arxiv.org/abs/2601.16568v2
- Date: Tue, 27 Jan 2026 17:16:47 GMT
- Title: Predicting Startup Success Using Large Language Models: A Novel In-Context Learning Approach
- Authors: Abdurahman Maarouf, Alket Bakiaj, Stefan Feuerriegel
- Abstract summary: In this paper, we propose an in-context learning framework for startup success prediction using large language models (LLMs). Specifically, we propose a novel k-nearest-neighbor-based in-context learning framework, called kNN-ICL, which selects the most relevant past startups as examples based on similarity. Using real-world profiles from Crunchbase, we find that the kNN-ICL approach achieves higher prediction accuracy than supervised machine learning baselines and vanilla in-context learning.
- Score: 32.510120225056944
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Venture capital (VC) investments in early-stage startups that end up being successful can yield high returns. However, predicting early-stage startup success remains challenging due to data scarcity (e.g., many VC firms have information about only a few dozen early-stage startups and whether they were successful). This limits the effectiveness of traditional machine learning methods that rely on large labeled datasets for model training. To address this challenge, we propose an in-context learning framework for startup success prediction using large language models (LLMs) that requires no model training and leverages only a small set of labeled startups as demonstration examples. Specifically, we propose a novel k-nearest-neighbor-based in-context learning framework, called kNN-ICL, which selects the most relevant past startups as examples based on similarity. Using real-world profiles from Crunchbase, we find that the kNN-ICL approach achieves higher prediction accuracy than supervised machine learning baselines and vanilla in-context learning. Further, we study how performance varies with the number of in-context examples and find that a high balanced accuracy can be achieved with as few as 50 examples. Together, we demonstrate that in-context learning can serve as a decision-making tool for VC firms operating in data-scarce environments.
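To make the mechanism concrete, below is a minimal sketch of the kNN-ICL selection step as described in the abstract: embed startup profiles, retrieve the k labeled startups most similar to the query startup, and assemble them into a prompt. The embedding model, similarity measure, prompt template, and the placeholder LLM call are illustrative assumptions, not the paper's exact choices.

```python
# Minimal sketch of kNN-based in-context example selection (kNN-ICL).
# Assumptions: startup profiles are free-text descriptions; similarity is
# cosine similarity over sentence embeddings; the prompt format is a
# hypothetical placeholder.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def select_demonstrations(query_profile, labeled_profiles, labels, k=5):
    """Return the k labeled startups most similar to the query startup."""
    query_emb = encoder.encode([query_profile])   # shape (1, d)
    pool_embs = encoder.encode(labeled_profiles)  # shape (n, d)
    # Cosine similarity between the query and every labeled startup.
    sims = (pool_embs @ query_emb.T).ravel() / (
        np.linalg.norm(pool_embs, axis=1) * np.linalg.norm(query_emb) + 1e-12
    )
    top_k = np.argsort(-sims)[:k]
    return [(labeled_profiles[i], labels[i]) for i in top_k]

def build_prompt(query_profile, demonstrations):
    """Assemble an in-context prompt from the retrieved demonstrations."""
    parts = [
        f"Startup: {profile}\nSuccessful: {'yes' if label else 'no'}"
        for profile, label in demonstrations
    ]
    parts.append(f"Startup: {query_profile}\nSuccessful:")
    return "\n\n".join(parts)
```

Selecting demonstrations by similarity rather than at random is what distinguishes kNN-ICL from vanilla in-context learning, and it is consistent with the abstract's finding that roughly 50 well-chosen examples already yield high balanced accuracy.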
Related papers
- From Limited Data to Rare-event Prediction: LLM-powered Feature Engineering and Multi-model Learning in Venture Capital [0.0]
This paper presents a framework for predicting rare, high-impact outcomes by integrating large language models (LLMs) with a multi-model machine learning (ML) architecture. We use LLM-powered feature engineering to extract and synthesize complex signals from unstructured data. We apply this framework to the domain of Venture Capital (VC), where investors must evaluate startups with limited and noisy early-stage data.
arXiv Detail & Related papers (2025-09-09T20:46:54Z)
- Cold-Start Active Preference Learning in Socio-Economic Domains [0.0]
The cold-start problem in active preference learning remains largely unexplored. The proposed method initiates learning with a self-supervised phase that employs Principal Component Analysis (PCA) to generate initial pseudo-labels. Experiments conducted on various socio-economic datasets, including those related to financial credibility, career success rate, and socio-economic status, consistently show that the PCA-driven approach outperforms standard active learning strategies.
arXiv Detail & Related papers (2025-08-07T07:18:50Z)
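A minimal sketch of the PCA-driven cold-start idea from the entry above: project unlabeled feature vectors onto the first principal component and threshold the scores to obtain initial pseudo-labels. The median threshold and the assumption that the first component tracks the latent preference are illustrative simplifications, not the paper's exact procedure.

```python
# Sketch of PCA-based pseudo-labeling for a cold start, assuming the
# first principal component orders items roughly by the latent preference.
# The feature matrix X and the median thresholding rule are illustrative.
import numpy as np
from sklearn.decomposition import PCA

def pca_pseudo_labels(X):
    """Project unlabeled feature vectors onto the first principal
    component and threshold at the median score to form pseudo-labels."""
    scores = PCA(n_components=1).fit_transform(X).ravel()
    return (scores > np.median(scores)).astype(int)
```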
- Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection [38.35524024887503]
We propose PRioritized cOncept learninG via Relative Error-driven Sample Selection (PROGRESS). PROGRESS is a data- and compute-efficient framework that enables vision-language models to dynamically select what to learn next. We show that PROGRESS consistently outperforms state-of-the-art baselines with much less data and supervision.
arXiv Detail & Related papers (2025-06-01T17:05:35Z)
- Policy Induction: Predicting Startup Success via Explainable Memory-Augmented In-Context Learning [0.0]
We propose a transparent and data-efficient investment decision framework powered by memory-augmented large language models. We introduce a lightweight training process that combines few-shot learning with an in-context learning loop. Our system predicts startup success far more accurately than existing benchmarks.
arXiv Detail & Related papers (2025-05-27T16:57:07Z)
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.<n>We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.<n>We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- Startup success prediction and VC portfolio simulation using CrunchBase data [1.7897779505837144]
This paper focuses on startups at their Series B and Series C investment stages, aiming to predict key success milestones.
We introduce a novel deep learning model for predicting startup success, integrating a variety of factors such as funding metrics, founder features, and industry category.
Our work demonstrates the considerable promise of deep learning models and alternative unstructured data in predicting startup success.
arXiv Detail & Related papers (2023-09-27T10:22:37Z)
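As an illustration of how the heterogeneous inputs named in the entry above (funding metrics, founder features, industry category) could be integrated in a deep learning model, here is a hedged sketch; the layer sizes, the embedding for the categorical industry field, and the binary success head are assumptions, not the paper's architecture.

```python
# Illustrative sketch of integrating heterogeneous startup features:
# numeric funding/founder features plus an embedded industry category.
# All dimensions below are assumptions chosen for illustration.
import torch
import torch.nn as nn

class StartupSuccessNet(nn.Module):
    def __init__(self, n_numeric, n_industries, emb_dim=8, hidden=64):
        super().__init__()
        self.industry_emb = nn.Embedding(n_industries, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(n_numeric + emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # logit for "successful"
        )

    def forward(self, numeric, industry_id):
        # Concatenate numeric features with the learned industry embedding.
        x = torch.cat([numeric, self.industry_emb(industry_id)], dim=-1)
        return self.mlp(x).squeeze(-1)
```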
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [65.57123249246358]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT. On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt. On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
- PRODIGY: Enabling In-context Learning Over Graphs [112.19056551153454]
In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks.
We develop PRODIGY, the first pretraining framework that enables in-context learning over graphs.
arXiv Detail & Related papers (2023-05-21T23:16:30Z)
- Using Deep Learning to Find the Next Unicorn: A Practical Synthesis [42.70427723009158]
Venture Capital (VC) strives to identify and invest in unicorn startups during their early stages, hoping to gain a high return.
Over the past two decades, the industry has gone through a paradigm shift, moving from conventional statistical approaches towards machine-learning-based ones.
In this work, we carry out a literature review and synthesis on DL-based approaches, covering the entire DL life cycle.
arXiv Detail & Related papers (2022-10-18T13:11:16Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
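A minimal sketch of the active-testing idea from the entry above: fit a Bayesian surrogate that predicts whether the model-under-test answers correctly, then average its posterior predictions over unlabeled data to estimate accuracy. Monte-Carlo dropout stands in here for full Bayesian inference, which is a substitution of mine; the paper's BNN setup may differ.

```python
# Sketch of estimating a model-under-test's accuracy with a Bayesian
# surrogate. MC dropout approximates the BNN posterior; this is an
# assumption, and the surrogate must first be fit on a few labeled
# test points marked correct/incorrect.
import torch
import torch.nn as nn

class DropoutSurrogate(nn.Module):
    """Predicts whether the model-under-test is correct on an input."""
    def __init__(self, n_features, hidden=32, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return torch.sigmoid(self.net(x)).squeeze(-1)

def estimate_accuracy(surrogate, X_unlabeled, n_samples=50):
    """Average MC-dropout passes to estimate accuracy on unlabeled data."""
    surrogate.train()  # keep dropout active at inference time
    with torch.no_grad():
        draws = torch.stack([surrogate(X_unlabeled) for _ in range(n_samples)])
    return draws.mean().item()  # estimated accuracy of the model-under-test
```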
- Token-wise Curriculum Learning for Neural Machine Translation [94.93133801641707]
Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.
We propose a novel token-wise curriculum learning approach that creates sufficient amounts of easy samples.
Our approach can consistently outperform baselines on 5 language pairs, especially for low-resource languages.
arXiv Detail & Related papers (2021-03-20T03:57:59Z)