CrunchLLM: Multitask LLMs for Structured Business Reasoning and Outcome Prediction
- URL: http://arxiv.org/abs/2509.10698v2
- Date: Sat, 11 Oct 2025 21:54:51 GMT
- Title: CrunchLLM: Multitask LLMs for Structured Business Reasoning and Outcome Prediction
- Authors: Rabeya Tus Sadia, Qiang Cheng
- Abstract summary: We present CrunchLLM, a domain-adapted LLM framework for startup success prediction. Our approach achieves accuracy exceeding 80% on Crunchbase startup success prediction. This work demonstrates how adapting LLMs with domain-aware fine-tuning and structured-unstructured data fusion can advance predictive modeling of entrepreneurial outcomes.
- Score: 2.124023760378586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting the success of start-up companies, defined as achieving an exit through acquisition or IPO, is a critical problem in entrepreneurship and innovation research. Datasets such as Crunchbase provide both structured information (e.g., funding rounds, industries, investor networks) and unstructured text (e.g., company descriptions), but effectively leveraging this heterogeneous data for prediction remains challenging. Traditional machine learning approaches often rely only on structured features and achieve moderate accuracy, while large language models (LLMs) offer rich reasoning abilities but struggle to adapt directly to domain-specific business data. We present CrunchLLM, a domain-adapted LLM framework for startup success prediction. CrunchLLM integrates structured company attributes with unstructured textual narratives and applies parameter-efficient fine-tuning strategies alongside prompt optimization to specialize foundation models for entrepreneurship data. Our approach achieves accuracy exceeding 80% on Crunchbase startup success prediction, significantly outperforming traditional classifiers and baseline LLMs. Beyond predictive performance, CrunchLLM provides interpretable reasoning traces that justify its predictions, enhancing transparency and trustworthiness for financial and policy decision makers. This work demonstrates how adapting LLMs with domain-aware fine-tuning and structured-unstructured data fusion can advance predictive modeling of entrepreneurial outcomes. CrunchLLM contributes a methodological framework and a practical tool for data-driven decision making in venture capital and innovation policy.
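The structured-unstructured fusion the abstract describes can be sketched as a serialization step: structured Crunchbase-style attributes and the free-text company description are combined into a single prompt for the fine-tuned model. This is a minimal illustrative sketch; the field names, prompt template, and `build_fusion_prompt` helper are assumptions for illustration, not the authors' actual format.

```python
# Minimal sketch (assumed, not the paper's exact pipeline): serialize structured
# attributes plus the narrative description into one prompt for a fine-tuned LLM.

def build_fusion_prompt(company: dict) -> str:
    """Combine structured fields and the free-text description into one prompt."""
    # All fields except the description are treated as structured attributes.
    structured = "\n".join(
        f"- {key}: {value}"
        for key, value in company.items()
        if key != "description"
    )
    return (
        "Predict whether this startup achieves a successful exit "
        "(acquisition or IPO). Answer 'success' or 'failure' with a brief "
        "justification.\n"
        f"Structured attributes:\n{structured}\n"
        f"Description: {company['description']}"
    )

# Hypothetical example record with Crunchbase-like fields.
example = {
    "industry": "fintech",
    "funding_rounds": 3,
    "total_raised_usd": 12_000_000,
    "investor_count": 7,
    "description": "A payments platform for cross-border SMB transactions.",
}
prompt = build_fusion_prompt(example)
```

In the paper's setting, prompts of this shape would then be used for parameter-efficient fine-tuning; the sketch only shows the data-fusion step, not the training loop.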
Related papers
- From Parameters to Performance: A Data-Driven Study on LLM Structure and Development [73.67759647072519]
Large language models (LLMs) have achieved remarkable success across various domains. Despite the rapid growth in model scale and capability, systematic, data-driven research on how structural configurations affect performance remains scarce. We present a large-scale dataset encompassing diverse open-source LLM structures and their performance across multiple benchmarks.
arXiv Detail & Related papers (2025-09-14T12:20:39Z) - Beyond Naïve Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs [57.82819770709032]
Large language models (LLMs) can be effective context-aided forecasters via naïve direct prompting. ReDP improves interpretability by eliciting explicit reasoning traces, allowing us to assess the model's reasoning over the context. CorDP leverages LLMs solely to refine existing forecasts with context, enhancing their applicability in real-world forecasting pipelines. IC-DP proposes embedding historical examples of context-aided forecasting tasks in the prompt, substantially improving accuracy even for the largest models.
arXiv Detail & Related papers (2025-08-13T16:02:55Z) - Policy Induction: Predicting Startup Success via Explainable Memory-Augmented In-Context Learning [0.0]
We propose a transparent and data-efficient investment decision framework powered by memory-augmented large language models. We introduce a lightweight training process that combines few-shot learning with an in-context learning loop. Our system predicts startup success far more accurately than existing benchmarks.
arXiv Detail & Related papers (2025-05-27T16:57:07Z) - A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings, such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Integration and Adaptation, which ...
arXiv Detail & Related papers (2025-03-08T05:41:42Z) - Demystifying Domain-adaptive Post-training for Financial LLMs [79.581577578952]
FINDAP is a systematic and fine-grained investigation into domain-adaptive post-training of large language models (LLMs). Our approach consists of four key components: FinCap, FinRec, FinTrain and FinEval. The resulting model, Llama-Fin, achieves state-of-the-art performance across a wide range of financial tasks.
arXiv Detail & Related papers (2025-01-09T04:26:15Z) - Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters. We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z) - Automating Venture Capital: Founder assessment using LLM-powered segmentation, feature engineering and automated labeling techniques [0.0]
This study explores the application of large language models (LLMs) in venture capital (VC) decision-making.
We utilize LLM prompting techniques, like chain-of-thought, to generate features from limited data, then extract insights through statistics and machine learning.
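The feature-generation step described above, prompting an LLM with chain-of-thought to score founder characteristics and parsing the scores into numeric features, can be sketched as follows. The trait list, prompt wording, and the mocked model response are all assumptions for illustration, not the study's actual setup.

```python
# Illustrative sketch (assumed, not the study's exact method): a chain-of-thought
# prompt asks the model to rate founder traits, and 'trait: score' lines in the
# response are parsed into numeric features for a downstream classifier.

TRAITS = ["technical_depth", "prior_exits", "domain_expertise"]

def make_cot_prompt(founder_bio: str) -> str:
    """Build a chain-of-thought prompt that requests one 'trait: score' per line."""
    return (
        "Think step by step about the founder's background, then rate each "
        f"trait from 0 to 10, one per line as 'trait: score'. "
        f"Traits: {', '.join(TRAITS)}.\n"
        f"Bio: {founder_bio}"
    )

def parse_scores(llm_output: str) -> dict:
    """Extract 'trait: score' lines into a numeric feature dictionary."""
    features = {}
    for line in llm_output.splitlines():
        if ":" in line:
            name, _, value = line.partition(":")
            name = name.strip().lower()
            if name in TRAITS:
                features[name] = float(value.strip())
    return features

# Stubbed model response, standing in for a real LLM call.
mock_output = "technical_depth: 8\nprior_exits: 2\ndomain_expertise: 9"
features = parse_scores(mock_output)
```

The resulting feature dictionary can then feed a standard classifier, which is the "extract insights through statistics and machine learning" step the summary mentions.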
Our results reveal potential relationships between certain founder characteristics and success, as well as demonstrate the effectiveness of these characteristics in prediction.
arXiv Detail & Related papers (2024-07-05T22:54:13Z) - Hybrid LLM/Rule-based Approaches to Business Insights Generation from Structured Data [0.0]
The ability to extract actionable insights from vast and varied datasets is essential for informed decision-making and maintaining a competitive edge.
Traditional rule-based systems, while reliable, often fall short when faced with the complexity and dynamism of modern business data.
This paper explores the efficacy of hybrid approaches that integrate the robustness of rule-based systems with the adaptive power of Large Language Models.
arXiv Detail & Related papers (2024-04-24T02:42:24Z) - FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets [9.714447724811842]
This paper introduces a distinctive approach anchored in the Instruction Tuning paradigm for open-source large language models.
We capitalize on the interoperability of open-source models, ensuring a seamless and transparent integration.
The paper presents a benchmarking scheme designed for end-to-end training and testing, employing a cost-effective progression.
arXiv Detail & Related papers (2023-10-07T12:52:58Z) - Startup success prediction and VC portfolio simulation using CrunchBase data [1.7897779505837144]
This paper focuses on startups at their Series B and Series C investment stages, aiming to predict key success milestones.
We introduce a novel deep learning model for predicting startup success, integrating a variety of factors such as funding metrics, founder features, and industry category.
Our work demonstrates the considerable promise of deep learning models and alternative unstructured data in predicting startup success.
arXiv Detail & Related papers (2023-09-27T10:22:37Z) - Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.