Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks
- URL: http://arxiv.org/abs/2602.19591v2
- Date: Fri, 27 Feb 2026 06:35:57 GMT
- Title: Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks
- Authors: Yijiashun Qi, Hanzhe Guo, Yijiazhen Qi
- Abstract summary: Small and Medium Enterprises (SMEs) constitute 99.9% of U.S. businesses and generate 44% of economic activity. We introduce SME-HGT, a Heterogeneous Graph Transformer framework that predicts which Phase I awardees will advance to Phase II funding using exclusively public data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Small and Medium Enterprises (SMEs) constitute 99.9% of U.S. businesses and generate 44% of economic activity, yet systematically identifying high-potential SMEs remains an open challenge. We introduce SME-HGT, a Heterogeneous Graph Transformer framework that predicts which SBIR Phase I awardees will advance to Phase II funding using exclusively public data. We construct a heterogeneous graph with 32,268 company nodes, 124 research topic nodes, and 13 government agency nodes connected by approximately 99,000 edges across three semantic relation types. SME-HGT achieves an AUPRC of 0.621 ± 0.003 on a temporally-split test set, outperforming an MLP baseline (0.590 ± 0.002) and R-GCN (0.608 ± 0.013) across five random seeds. At a screening depth of 100 companies, SME-HGT attains 89.6% precision with a 2.14× lift over random selection. Our temporal evaluation protocol prevents information leakage, and our reliance on public data ensures reproducibility. These results demonstrate that relational structure among firms, research topics, and funding agencies provides meaningful signal for SME potential assessment, with implications for policymakers and early-stage investors.
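The screening metrics quoted in the abstract (precision at a depth of 100 companies, and lift over random selection) can be illustrated with a minimal sketch. It assumes that lift is defined as precision at depth k divided by the overall positive rate, which the abstract does not state explicitly. The function names `precision_at_k` and `lift_at_k` and the toy scores and labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of positives among the k highest-scored items."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    # Rank items from highest to lowest score and keep the top k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def lift_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Precision at k divided by the base rate (assumed definition of lift)."""
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    return precision_at_k(scores, labels, k) / base_rate


# Toy data (hypothetical): 10 companies, 4 of which advanced to Phase II.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

p_at_4 = precision_at_k(toy_scores, toy_labels, 4)  # 3 of the top 4 are positive
lift_4 = lift_at_k(toy_scores, toy_labels, 4)       # 0.75 divided by base rate 0.4
print(f"precision@4 = {p_at_4:.3f}, lift@4 = {lift_4:.3f}")
```

Under the same assumed definition, the reported 89.6% precision and 2.14× lift would imply a Phase II base rate of roughly 0.896 / 2.14 ≈ 0.42 in the test set. This is a back-of-the-envelope reading, not a figure given in the abstract.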
Related papers
- Coverage-Aware Web Crawling for Domain-Specific Supplier Discovery via a Web--Knowledge--Web Pipeline [0.0]
Existing business databases suffer from substantial coverage gaps. We propose a Web--Knowledge--Web (W→K→W) pipeline. It crawls domain-specific web sources to discover candidate supplier entities. It consolidates structured knowledge into a heterogeneous knowledge graph.
arXiv Detail & Related papers (2026-02-27T18:31:42Z) - Soft Clustering Anchors for Self-Supervised Speech Representation Learning in Joint Embedding Prediction Architectures [45.74430728311433]
Joint Embedding Predictive Architectures (JEPA) offer a promising approach to self-supervised speech representation learning, but suffer from representation collapse without explicit grounding. We propose GMM-Anchored JEPA, which fits a Gaussian Mixture Model once on log-mel spectrograms and uses its frozen soft posteriors as auxiliary targets throughout training. On 50k hours of speech, GMM anchoring improves ASR (28.68% vs. 33.22% WER), emotion recognition (67.76% vs. 65.46%), and slot filling (64.7% vs. 59.1% F1) compared to a WavLM-style
arXiv Detail & Related papers (2026-01-30T20:51:37Z) - FUGC: Benchmarking Semi-Supervised Learning Methods for Cervical Segmentation [63.7829089874007]
This paper introduces the Fetal Ultrasound Grand Challenge (FUGC), the first benchmark for semi-supervised learning in cervical segmentation. FUGC provides a dataset of 890 TVS images, including 500 training images, 90 validation images, and 300 test images. Methods were evaluated using the Dice Similarity Coefficient (DSC), Hausdorff Distance (HD), and runtime (RT), with a weighted combination of 0.4/0.4/0.2.
arXiv Detail & Related papers (2026-01-22T01:34:39Z) - SmallML: Bayesian Transfer Learning for Small-Data Predictive Analytics [0.0]
SmallML achieves enterprise-level prediction accuracy with datasets as small as 50-200 observations. Validation on customer churn data demonstrates 96.7% +/- 4.2% AUC with 100 observations per business. By enabling enterprise-grade predictions for 33 million U.S. SMEs, SmallML addresses a critical gap in AI democratization.
arXiv Detail & Related papers (2025-11-18T02:00:55Z) - Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning [70.56067503630486]
We argue that sixth-generation (6G) intelligence is not fluent token prediction but the calibrated capacity to imagine and choose. We show that WM-MS3M cuts mean absolute error (MAE) by 1.69% versus MS3M with 32% fewer parameters and similar latency, and achieves 35-80% lower root mean squared error (RMSE) than attention/hybrid baselines with 2.3-4.1x faster inference.
arXiv Detail & Related papers (2025-11-04T17:22:22Z) - Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People [81.63702981397408]
Given limited resources, to what extent do agents based on language models (LMs) act rationally? We develop methods to benchmark and enhance agentic information-seeking, drawing on insights from human behavior. For Spotter agents, our approach boosts accuracy by up to 14.7% absolute over LM-only baselines; for Captain agents, it raises expected information gain (EIG) by up to 0.227 bits (94.2% of the achievable noise ceiling).
arXiv Detail & Related papers (2025-10-23T17:57:28Z) - Advanced spectral clustering for heterogeneous data in credit risk monitoring systems [8.92280593592798]
We propose Advanced Spectral Clustering (ASC) to identify meaningful clusters in heterogeneous data. By bridging spectral clustering theory with heterogeneous data applications, ASC enables the identification of meaningful clusters, such as recruitment-focused SMEs exhibiting a 30% lower default risk.
arXiv Detail & Related papers (2025-08-30T16:06:00Z) - Credit Risk Analysis for SMEs Using Graph Neural Networks in Supply Chain [2.060688901523233]
This paper introduces a Graph Neural Network (GNN)-based framework to map spatial dependencies and predict loan default risks. Tests on real-world datasets from Discover and Ant Credit show the GNN surpasses traditional and other GNN baselines. It also helps regulators model supply chain disruption impacts on banks, accurately forecasting loan defaults from material shortages, and offers Federal Reserve stress testers key data for CCAR risk buffers.
arXiv Detail & Related papers (2025-07-10T15:33:53Z) - Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait [70.00430652562012]
FarSight is an end-to-end system for person recognition that integrates biometric cues across face, gait, and body shape modalities. FarSight incorporates novel algorithms across four core modules: multi-subject detection and tracking, recognition-aware video restoration, modality-specific biometric feature encoding, and quality-guided multi-modal fusion.
arXiv Detail & Related papers (2025-05-07T17:58:25Z) - A Cross-Country Analysis of GDPR Cookie Banners and Flexible Methods for Scraping Them [6.533686617147407]
We examine the top 10,000 websites across 31 countries under the ePrivacy Directive and consent-observatory.eu. We show that 67% of websites use consent interfaces, but only 15% are minimally compliant, mostly because they lack a reject option. There is little evidence that regulators' guidance and fines have impacted compliance rates, but 18% of compliance variance is explained by CMPs.
arXiv Detail & Related papers (2025-03-25T13:44:26Z) - Heterogeneous Federated Learning via Grouped Sequential-to-Parallel Training [60.892342868936865]
Federated learning (FL) is a rapidly growing privacy-preserving collaborative machine learning paradigm.
We propose a data heterogeneous-robust FL approach, FedGSP, to address this challenge.
We show that FedGSP improves the accuracy by 3.7% on average compared with seven state-of-the-art approaches.
arXiv Detail & Related papers (2022-01-31T03:15:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.