SmallML: Bayesian Transfer Learning for Small-Data Predictive Analytics
- URL: http://arxiv.org/abs/2511.14049v1
- Date: Tue, 18 Nov 2025 02:00:55 GMT
- Title: SmallML: Bayesian Transfer Learning for Small-Data Predictive Analytics
- Authors: Semen Leontev
- Abstract summary: SmallML achieves enterprise-level prediction accuracy with datasets as small as 50-200 observations. Validation on customer churn data demonstrates 96.7% +/- 4.2% AUC with 100 observations per business. By enabling enterprise-grade predictions for 33 million U.S. SMEs, SmallML addresses a critical gap in AI democratization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Small and medium-sized enterprises (SMEs) represent 99.9% of U.S. businesses yet remain systematically excluded from AI due to a mismatch between their operational scale and modern machine learning's data requirements. This paper introduces SmallML, a Bayesian transfer learning framework achieving enterprise-level prediction accuracy with datasets as small as 50-200 observations. We develop a three-layer architecture integrating transfer learning, hierarchical Bayesian modeling, and conformal prediction. Layer 1 extracts informative priors from 22,673 public records using a SHAP-based procedure transferring knowledge from gradient boosting to logistic regression. Layer 2 implements hierarchical pooling across J=5-50 SMEs with adaptive shrinkage, balancing population patterns with entity-specific characteristics. Layer 3 provides conformal sets with finite-sample coverage guarantees P(y in C(x)) >= 1-alpha for distribution-free uncertainty quantification. Validation on customer churn data demonstrates 96.7% +/- 4.2% AUC with 100 observations per business -- a +24.2 point improvement over independent logistic regression (72.5% +/- 8.1%), with p < 0.000001. Conformal prediction achieves 92% empirical coverage at 90% target. Training completes in 33 minutes on standard CPU hardware. By enabling enterprise-grade predictions for 33 million U.S. SMEs previously excluded from machine learning, SmallML addresses a critical gap in AI democratization. Keywords: Bayesian transfer learning, hierarchical models, conformal prediction, small-data analytics, SME machine learning
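The Layer 3 guarantee quoted in the abstract, P(y in C(x)) >= 1-alpha, is the standard split conformal property, and the reported 90% target corresponds to alpha = 0.10. Below is a minimal sketch of how such prediction sets could be produced for a binary churn label; the function name, nonconformity score, and synthetic inputs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.10):
    """Split conformal prediction sets for a binary label.

    cal_probs  : (n, 2) class probabilities on a held-out calibration split
                 (e.g. from the fitted churn classifier).
    cal_labels : (n,) true labels in {0, 1} for the calibration split.
    test_probs : (m, 2) class probabilities for new observations.
    alpha      : target miscoverage; sets satisfy P(y in C(x)) >= 1 - alpha.
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected quantile level, clipped to 1 for very small n.
    level = min(np.ceil((n + 1) * (1.0 - alpha)) / n, 1.0)
    q_hat = np.quantile(scores, level, method="higher")
    # A label enters the prediction set when its score is within the threshold.
    return [{y for y in (0, 1) if 1.0 - p[y] <= q_hat} for p in test_probs]

# Toy usage at the paper's 90% target coverage (alpha = 0.10), with random
# probabilities standing in for the hierarchical model's outputs.
rng = np.random.default_rng(0)
cal_p = rng.dirichlet([2.0, 2.0], size=100)
cal_y = rng.integers(0, 2, size=100)
test_p = rng.dirichlet([2.0, 2.0], size=5)
print(conformal_sets(cal_p, cal_y, test_p))
```

The (n+1)/n quantile correction is what yields the finite-sample >= 1-alpha guarantee; mild over-coverage, such as the 92% empirical coverage reported at the 90% target, is consistent with how these sets behave.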
Related papers
- Binary Token-Level Classification with DeBERTa for All-Type MWE Identification: A Lightweight Approach with Linguistic Enhancement [1.8429656136522097]
We present a comprehensive approach for multiword expression (MWE) identification that combines binary token-level classification, linguistic feature integration, and data augmentation. Our DeBERTa-v3-large model achieves 69.8% F1 on the CoAM dataset, surpassing the best results (Qwen-72B, 57.8% F1) on this dataset by 12 points while using 165x fewer parameters.
arXiv Detail & Related papers (2026-01-27T08:42:54Z)
- Explainable Deep Learning for Brain Tumor Classification: Comprehensive Benchmarking with Dual Interpretability and Lightweight Deployment [4.259927630334864]
This study provides a full deep learning system for automated classification of brain tumors from MRI images. Inception-ResNet V2 reached state-of-the-art performance, achieving a 99.53% accuracy on testing. This end-to-end solution considers accuracy, interpretability, and deployability of trustworthy AI.
arXiv Detail & Related papers (2025-11-20T17:21:40Z)
- DataDecide: How to Predict Best Pretraining Data with Small Experiments [67.95896457895404]
We release models, data, and evaluations in DataDecide -- the most extensive open suite of models over differences in data and scale. We conduct controlled pretraining experiments across 25 corpora with differing sources, deduplication, and filtering, up to 100B tokens, model sizes up to 1B parameters, and 3 random seeds.
arXiv Detail & Related papers (2025-04-15T17:02:15Z)
- Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models [81.62767292169225]
We investigate knowledge forgetting in large language models with a focus on its generalisation. We propose PerMU, a novel probability perturbation-based unlearning paradigm. Experiments are conducted on a diverse range of datasets, including TOFU, Harry Potter, ZsRE, WMDP, and MUSE.
arXiv Detail & Related papers (2025-02-27T11:03:33Z)
- Forecasting Frontier Language Model Agent Capabilities [0.7499722271664147]
We evaluate six forecasting methods that predict downstream capabilities of Language Models (LMs). We use "one-step" approaches that predict benchmark scores directly from input metrics like compute or model release date, or "two-step" approaches that first predict an intermediate metric like the principal component of cross-benchmark performance (PC-1) and human-evaluated competitive Elo ratings. Our forecast predicts that by the beginning of 2026, non-specialized LM agents with low capability elicitation will reach a success rate of 54% on SWE-Bench Verified, while state-of-the-art LM agents will reach an 87% success rate.
arXiv Detail & Related papers (2025-02-21T02:34:17Z)
- Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training [68.94373533768501]
We model knowledge retention, the capacity of a pre-trained language model to memorize factual information from its corpus, and introduce a principled method to estimate it prior to training. We propose Size-dependent Mutual Information (SMI), an information-theoretic predictor that integrates knowledge frequency, knowledge specificity, and model size to forecast closed-book question answering (QA) accuracy.
arXiv Detail & Related papers (2025-02-06T13:23:53Z)
- Training Compute-Optimal Protein Language Models [48.79416103951816]
Most protein language models are trained with extensive compute resources until performance gains plateau.
Our investigation is grounded in a massive dataset consisting of 939 million protein sequences.
We trained over 300 models ranging from 3.5 million to 10.7 billion parameters on 5 to 200 billion unique tokens.
arXiv Detail & Related papers (2024-11-04T14:58:37Z)
- FedCSD: A Federated Learning Based Approach for Code-Smell Detection [7.026278088747708]
This paper proposes a Federated Learning Code Smell Detection approach that allows organizations to collaboratively train ML models.
Three experiments leveraged three manually validated datasets to detect and examine different code smell scenarios.
An accuracy of 98.34% was achieved by the global model that has been trained using 10 companies for 100 training rounds.
arXiv Detail & Related papers (2023-05-31T09:51:45Z)
- A Meta-Learning Approach to Predicting Performance and Data Requirements [163.4412093478316]
We propose an approach to estimate the number of samples required for a model to reach a target performance.
We find that the power law, the de facto principle to estimate model performance, leads to large error when using a small dataset.
We introduce a novel piecewise power law (PPL) that handles the two data regimes differently.
arXiv Detail & Related papers (2023-03-02T21:48:22Z)
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, leveraging only 20-30 labeled samples per class for each task for training and validation, can perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z)