Fugu-MT Paper Translation (Summary): Using Synthetic Data for Machine Learning-based Childhood Vaccination Prediction in Narok, Kenya

Paper summary: Using Synthetic Data for Machine Learning-based Childhood Vaccination Prediction in Narok, Kenya

arxiv url: http://arxiv.org/abs/2604.08902v1
Date: Fri, 10 Apr 2026 03:12:08 GMT
Status: Translation complete
Last updated in system: 2026-04-13 17:57:53.657105
Title: Using Synthetic Data for Machine Learning-based Childhood Vaccination Prediction in Narok, Kenya
Authors: Jimmy Bach, Yang Li, Yaqi Liu, John Sankok, Rose Kimani, Carrie B. Dolan, Julius N. Odhiambo, Haipeng Chen
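Abstract summary: In nomadic populations, individuals face an increased risk of missing crucial vaccinations as children. One such population is the Maasai in Narok County, Kenya, where the absence of high-volume, high-quality data hampers accurate coverage estimates. We aim to identify children at risk of missing key vaccines across a large population so that timely, evidence-based interventions can be delivered.
Reference score (in-house attention score): 8.32817820047995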
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Background: Limited use of data in low-resource settings is a barrier to the vaccine delivery ecosystem and undermines efforts to achieve equitable immunization coverage. In nomadic populations, individuals face an increased risk of missing crucial vaccine doses as children. One such population is the Maasai in Narok County, Kenya, where the absence of high-volume, high-quality data hampers accurate coverage estimates, impedes efficient resource allocation, and weakens the ability to deliver timely interventions. Data privacy concerns are also heightened in groups for which only limited sensitive data exist. Objectives: First, we aim to identify children at risk of missing key vaccines across a large population, so that timely, evidence-based interventions can support increased vaccination coverage. Second, we aim to better protect the privacy of sensitive health data in a vulnerable population. Methods: We digitized 8 years of child vaccination records from the MOH 510 registry (n=6,913) and applied machine learning models (Logistic Regression and XGBoost) to identify children at risk. We also used a novel approach to tabular diffusion-based synthetic data generation (TabSyn) to protect patient privacy within the models. Results: Our findings show that classification techniques can reliably predict which children are at risk of missing a vaccine, with recall, precision, and F1-scores exceeding 90% for some of the vaccines modeled. Moreover, training these models on synthetic rather than real data, which preserves the privacy of individuals in the original dataset, does not reduce predictive performance. Conclusion: These results support incorporating synthetic data into health informatics strategies for clinics with limited digital infrastructure, enabling privacy-preserving, scalable forecasting of childhood immunization coverage.
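The abstract reports recall, precision and F1 for the at-risk class, and claims that models trained on synthetic data match models trained on real data. A common way to set up that comparison is "train on synthetic, test on real": fit the same model once on real training records and once on synthetic records, then score both on the same held-out real records. The sketch below is a minimal, stdlib-only illustration of that evaluation under stated assumptions; it is not the authors' pipeline. `threshold_fit` is a hypothetical one-feature stand-in for Logistic Regression or XGBoost, the synthetic generator (TabSyn in the paper) is not shown, and the toy numbers in any usage are illustrative only.

```python
from __future__ import annotations

from typing import Callable, Sequence

# A fitted model maps one (single-feature) record to a 0/1 "at risk" label.
Predictor = Callable[[float], int]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> dict[str, float]:
    """Precision, recall and F1 for the positive ("at risk") class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def threshold_fit(X: Sequence[float], y: Sequence[int]) -> Predictor:
    """HYPOTHETICAL stand-in model: threshold at the midpoint of the two class means.

    Used only to keep the sketch dependency-free; the paper used
    Logistic Regression and XGBoost instead.
    """
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    if not pos or not neg:
        raise ValueError("training data must contain both classes")
    mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    cut = (mu_pos + mu_neg) / 2
    # Predict "at risk" on whichever side of the cut the positive class mean lies.
    if mu_pos >= mu_neg:
        return lambda x: int(x >= cut)
    return lambda x: int(x <= cut)


def train_synthetic_test_real(
    fit: Callable[[Sequence[float], Sequence[int]], Predictor],
    real_train: tuple[Sequence[float], Sequence[int]],
    synth_train: tuple[Sequence[float], Sequence[int]],
    real_test: tuple[Sequence[float], Sequence[int]],
) -> dict[str, dict[str, float]]:
    """Fit the same model on real and on synthetic data; score both on real held-out data."""
    X_test, y_test = real_test
    scores = {}
    for name, (X, y) in (("real", real_train), ("synthetic", synth_train)):
        predict = fit(X, y)
        scores[name] = binary_metrics(y_test, [predict(x) for x in X_test])
    return scores
```

In a fuller setup, `fit` would presumably wrap a real classifier over many features, and the synthetic training set would be generated from the real training split only, so that no held-out test record influences the generator; both of these wiring details are assumptions here rather than details taken from the paper. Similar scores in the `"real"` and `"synthetic"` entries are what "no loss in predictive performance" would look like in this framing.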
