Related papers: From Limited Data to Rare-event Prediction: LLM-powered Feature Engineering and Multi-model Learning in Venture Capital

From Limited Data to Rare-event Prediction: LLM-powered Feature Engineering and Multi-model Learning in Venture Capital

URL: http://arxiv.org/abs/2509.08140v1
Date: Tue, 09 Sep 2025 20:46:54 GMT
Title: From Limited Data to Rare-event Prediction: LLM-powered Feature Engineering and Multi-model Learning in Venture Capital
Authors: Mihir Kumar, Aaron Ontoyin Yin, Zakari Salifu, Kelvin Amoaba, Afriyie Kwesi Samuel, Fuat Alican, Yigit Ihlamur,
Abstract summary: This paper presents a framework for predicting rare, high-impact outcomes by integrating large language models (LLMs) with a multi-model machine learning (ML) architecture.<n>We use LLM-powered feature engineering to extract and synthesize complex signals from unstructured data.<n>We apply this framework to the domain of Venture Capital (VC), where investors must evaluate startups with limited and noisy early-stage data.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents a framework for predicting rare, high-impact outcomes by integrating large language models (LLMs) with a multi-model machine learning (ML) architecture. The approach combines the predictive strength of black-box models with the interpretability required for reliable decision-making. We use LLM-powered feature engineering to extract and synthesize complex signals from unstructured data, which are then processed within a layered ensemble of models including XGBoost, Random Forest, and Linear Regression. The ensemble first produces a continuous estimate of success likelihood, which is then thresholded to produce a binary rare-event prediction. We apply this framework to the domain of Venture Capital (VC), where investors must evaluate startups with limited and noisy early-stage data. The empirical results show strong performance: the model achieves precision between 9.8X and 11.1X the random classifier baseline in three independent test subsets. Feature sensitivity analysis further reveals interpretable success drivers: the startup's category list accounts for 15.6% of predictive influence, followed by the number of founders, while education level and domain expertise contribute smaller yet consistent effects.

Related papers

STAR : Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction [78.0692157478247]
We propose STAR, a framework that bridges data-driven STatistical expectations with knowledge-driven Agentic Reasoning.<n>We show that STAR consistently outperforms all baselines on both score-based and rank-based metrics.
arXiv Detail & Related papers (2026-02-12T16:30:07Z)
Linear Complexity Self-Supervised Learning for Music Understanding with Random Quantizer [0.0]
This paper focuses on the reduction of a foundation's model size when applied to music information retrieval (MIR) tasks.<n>Our research combines the Branchformer architecture with SummaryMixing, which were first applied in speech recognition, along with a random quantization process.<n>Our results show that our architecture achieves competitive performance when compared with other state-of-the-art models.
arXiv Detail & Related papers (2026-01-14T16:23:31Z)
Uncertainty-Aware Collaborative System of Large and Small Models for Multimodal Sentiment Analysis [17.98292973608615]
We propose a novel Uncertainty-Aware Collaborative System (U-ACS) that orchestrates a powerful MLLM and a lightweight baseline model for multimodal sentiment analysis.<n>Our proposed method achieves state-of-the-art performance, while requiring only a fraction of the computational resources compared to using a standalone MLLM.
arXiv Detail & Related papers (2025-08-27T16:01:58Z)
Large Language Models as Universal Predictors? An Empirical Study on Small Tabular Datasets [0.0]
Large Language Models (LLMs) can perform predictive tasks over structured inputs without explicit fine-tuning on downstream tasks.<n>We investigate the empirical function approximation capability of LLMs on small-scale structured datasets for classification, regression and clustering tasks.<n>Our findings suggest that LLMs can serve as general-purpose predictive engines for structured data, with clear strengths in classification and significant limitations in regression and clustering.
arXiv Detail & Related papers (2025-08-24T15:00:51Z)
SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL)<n>We propose textbfSPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
Truth in the Few: High-Value Data Selection for Efficient Multi-Modal Reasoning [71.3533541927459]
We propose a novel data selection paradigm termed Activation Reasoning Potential (RAP)<n>RAP identifies cognitive samples by estimating each sample's potential to stimulate genuine multi-modal reasoning.<n>Our RAP method consistently achieves superior performance using only 9.3% of the training data, while reducing computational costs by over 43%.
arXiv Detail & Related papers (2025-06-05T08:40:24Z)
Embedding Domain-Specific Knowledge from LLMs into the Feature Engineering Pipeline [0.0]
We propose using Large Language Models (LLMs) as an initial feature construction step to add knowledge to the dataset.<n>Our results show that the evolution can converge faster, saving us computational resources.
arXiv Detail & Related papers (2025-03-27T04:48:58Z)
Comprehensive Benchmarking of Machine Learning Methods for Risk Prediction Modelling from Large-Scale Survival Data: A UK Biobank Study [0.0]
Large-scale prospective cohort studies and a diverse toolkit of available machine learning (ML) algorithms have facilitated such survival task efforts.<n>We sought to benchmark eight distinct survival task implementations, ranging from linear to deep learning (DL) models.<n>We assessed how well different architectures scale with sample sizes ranging from n = 5,000 to n = 250,000 individuals.
arXiv Detail & Related papers (2025-03-11T20:27:20Z)
Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training [51.41246396610475]
This paper aims to predict performance in closed-book question answering (QA) without the help of external tools.<n>We conduct large-scale retrieval and semantic analysis across the pre-training corpora of 21 publicly available and 3 custom-trained large language models.<n>Building on these foundations, we propose Size-dependent Mutual Information (SMI), an information-theoretic metric that linearly correlates pre-training data characteristics.
arXiv Detail & Related papers (2025-02-06T13:23:53Z)
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks.<n>Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales.<n>We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z)
Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.<n>We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.<n>We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
Quantifying Prediction Consistency Under Fine-Tuning Multiplicity in Tabular LLMs [10.494477811252034]
Fine-tuning multiplicity can arise in Tabular LLMs on classification tasks.<n>Our work formalizes this unique challenge of fine-tuning multiplicity in Tabular LLMs.<n>We propose a novel measure to quantify consistency of individual predictions without expensive model retraining.
arXiv Detail & Related papers (2024-07-04T22:22:09Z)
SSFF: Investigating LLM Predictive Capabilities for Startup Success through a Multi-Agent Framework with Enhanced Explainability and Performance [0.16385815610837165]
The Startup Success Forecasting Framework is an autonomous system that emulates the reasoning of venture capital analysts.<n>By leveraging founder segmentation, startups led by L5 founders are 3.79 times more likely to succeed than those led by L1 founders.<n>Our framework significantly enhances prediction accuracy, yielding a 108.3 percent relative improvement over GPT 4o mini and a 30.8 percent relative improvement over GPT 4o.
arXiv Detail & Related papers (2024-05-29T19:07:42Z)
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length. This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering. The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch. The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level. The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.