Risk prediction models using genetic data have seen increasing traction in
genomics. However, most of the polygenic risk models were developed using data
from participants with similar (mostly European) ancestry. This can lead to
biases in the risk predictors resulting in poor generalization when applied to
minority populations and admixed individuals such as African Americans. To
address this bias, largely due to the prediction models being confounded by the
underlying population structure, we propose a novel deep-learning framework
that leverages data from diverse populations and disentangles ancestry from the
phenotype-relevant information in its representation. The ancestry disentangled
representation can be used to build risk predictors that perform better across
minority populations. We applied the proposed method to the analysis of
Alzheimer's disease genetics. Compared with standard linear and nonlinear risk
prediction methods, the proposed method substantially improves risk prediction
in minority populations, particularly for admixed individuals.
Improving genetic risk prediction across diverse population by disentangling ancestry representations
Prashnna K Gyawali1, Yann Le Guen1,2, Xiaoxia Liu1, Hua Tang3, James Zou4*, Zihuai He1,5*
1 Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
2 Institut du Cerveau - Paris Brain Institute - ICM, Paris, France
3 Department of Genetics, Stanford University, Stanford, CA, USA
4 Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
5 Quantitative Sciences Unit, Department of Medicine (Biomedical Informatics Research), Stanford University, Stanford, CA, USA
* co-corresponding authors
One Sentence Summary: Deep learning model for disentangling ancestry representations and related population bias improves genetic risk prediction across diverse population.
Keywords: Deep learning, disentangled representation, polygenic risk score, algorithmic bias
INTRODUCTION
Prediction of complex phenotypic traits, particularly for complex diseases like Alzheimer’s disease (AD) in humans, has seen increased traction in genomics research1.
Different approaches2–4, such as the polygenic risk score (PRS) and a wide range of linear models, have been proposed for risk prediction of complex diseases based on the genotype-phenotype associations for variants identified by genome-wide association studies (GWAS).
Most genetic studies of complex human traits have been undertaken in homogeneous populations from the same ancestry group7, with the majority of studies focusing on European ancestry.
For instance, despite African American individuals being twice as likely to develop AD compared to Europeans, genetic studies in African Americans are scarce8.
Moreover, the genomic studies for admixed individuals are limited, although they make up more than one-third of the US population, with the population becoming increasingly mixed over time9.
This lack of representation for minority populations and admixed individuals, if not mitigated, will limit our understanding of true genotype-phenotype associations and subsequently the development of genetic risk predictions.
This will eventually hinder the long-awaited promise to develop precision medicine.
The discussion for mitigating the limited diversity in genetic studies has been more pronounced recently due to the consistent observation that the existing models have far greater predictive value in individuals of European descent than of other ancestries10.
Although the non-linear models via deep learning produce impressive results across domains, they are more prone to overfitting – failing to generalize even with minor shifts in the training paradigm13.
For instance, the limited generalization of PRS for blood pressure, as reviewed above, improved substantially when GWAS of Europeans were combined with GWAS of Hispanics/Latinos11.
pointed out that the misdiagnoses of multiple individuals with African ancestry would have been corrected with the inclusion of even a small number of African Americans14.
Although efforts to increase the non-European proportion of GWAS participants are being implemented, the proportion of individuals with African and Hispanic/Latino ancestry in GWAS has remained essentially unchanged15.
As such, increasing training participants of other ancestries, particularly for admixed individuals, is not likely to occur any time soon without a dramatic priority shift, given the current imbalance and stalled diversifying progress over the recent years10.
Previous studies for learning robust and unbiased genotype-phenotype relationships against population bias have mostly been carried out with the PRS and its variants16,17.
In addition, PRS won’t capture the complex genotype-phenotype relationship.
Alternatively, deep learning has recently shown improved predictive performance across domains (e.g., vision, language, etc.) due to its ability to capture complex input-output relationships.
A major line of work for learning robust and unbiased representations with deep learning involves learning domain-invariant representations, where participants from different domains share common traits (e.g., genotyped participants from different ancestries with a common phenotype)18,19.
Here, we introduce DisPred, a novel deep-learning-based framework that can integrate data from diverse populations to improve the generalizability of genetic risk prediction.
The proposed method combines a disentangling approach to separate the effect of ancestry from the phenotype-specific representation, and an ensemble modeling approach to combine the predictions from the disentangled latent representation and the original data.
Moreover, DisPred does not require self-reported ancestry information for predicting future individuals, making it suitable for practical use because human genetics literature has often questioned the definition and the use of an individual’s ancestry27,28, and at the minimum, the ancestry information may not be available during the test time.
We evaluate the performance of DisPred for AD risk prediction in a multi-ethnic cohort composed of AD cases and controls and show that DisPred performs better than existing models in minority populations, particularly for admixed individuals.
RESULTS
Overview of the proposed workflow with DisPred
DisPred is a three-stage method to improve the phenotype prediction from the genotype dosage data (each feature has a value between 0 and 2).
First, as shown in Figure 1 (A), we built a disentangling autoencoder, a deep-learning-based autoencoder.
The proposed deep learning architecture involves separating the latent representation into an ancestry-specific representation and a phenotype-specific representation.
In this way, this stage explicitly separates the ancestral effect from phenotype-specific representation.
Second, as shown in Figure 1 (B), we used the learned phenotype-specific representation extracted from the disentangling autoencoder to train the prediction model.
We consider a linear model for the phenotype prediction model in our case.
Since the phenotype-specific representation is nonlinear, the prediction model, despite being linear, will still capture the nonlinear genotype-phenotype relationship.
Finally, as shown in Figure 1 (C), we create ensemble models by combining the predictions from the learned representations, i.e., the result of our second stage, with the predictions from the original data, i.e., the existing approach of building prediction models.
The second stage builds the prediction model from the disentangled representations and the ensemble in the third stage aims to enhance the prediction accuracy.
(B) The disentangled phenotype-specific representation is then used for the phenotype prediction. A separate linear prediction model is trained on the obtained representation for the phenotype prediction.
(C) To increase prediction power, we use the ensemble modeling approach, where the parameters can be obtained either from grid search or gradient-based search.
We consider a training dataset $\mathcal{D} = \{\mathbf{x}_i, \mathbf{y}_i, \mathbf{a}_i\}_{i=1}^{N}$ with 𝑁 participants, where 𝐲 represents the disease label (e.g., case and control for binarized data), and 𝒂 represents the environment label for the data (e.g., categorical ancestral label).
An encoder function ℱθ(𝐱) decomposes the original data 𝐱 into an ancestry-specific representation 𝐳𝐚 and a phenotype-specific representation 𝐳𝐝, and a decoder function 𝒢θ′(𝐳𝐚, 𝐳𝐝) reconstructs the original data as 𝐱̂ using 𝐳𝐚 and 𝐳𝐝.
Both the encoder function (θ) and decoder function (θ′) are parameterized by deep neural networks, and together represent the disentangling autoencoder.
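As a rough illustration of the encoder/decoder structure described above, the following sketch implements only the forward pass of a small multi-layer perceptron autoencoder whose bottleneck is split into 𝐳𝐚 and 𝐳𝐝. The class name, layer sizes, random initialization, the choice to slice a single bottleneck vector into two parts, and the mean-squared-error reconstruction loss are all assumptions made for illustration; training is omitted and this should not be read as the authors' implementation.

```python
import random


def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in v]


def linear(weights, bias, v):
    """Affine map: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Random weights (hypothetical initialization) and zero biases."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class DisentanglingAutoencoder:
    """Forward pass only: x -> (z_a, z_d) -> x_hat."""

    def __init__(self, n_features, n_hidden, dim_a, dim_d, seed=0):
        rng = random.Random(seed)
        self.dim_a = dim_a
        self.enc1 = init_layer(n_features, n_hidden, rng)
        self.enc2 = init_layer(n_hidden, dim_a + dim_d, rng)
        self.dec1 = init_layer(dim_a + dim_d, n_hidden, rng)
        self.dec2 = init_layer(n_hidden, n_features, rng)

    def encode(self, x):
        """Return the ancestry-specific part z_a and the phenotype-specific part z_d."""
        hidden = relu(linear(*self.enc1, x))
        z = linear(*self.enc2, hidden)
        return z[:self.dim_a], z[self.dim_a:]

    def decode(self, z_a, z_d):
        """Reconstruct the input from the concatenated latent parts."""
        hidden = relu(linear(*self.dec1, z_a + z_d))
        return linear(*self.dec2, hidden)


def reconstruction_loss(x, x_hat):
    """Mean squared error between the input and its reconstruction (assumed loss)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# Usage on one participant with six genotype dosages in [0, 2].
model = DisentanglingAutoencoder(n_features=6, n_hidden=8, dim_a=2, dim_d=3)
dosages = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0]
z_a, z_d = model.encode(dosages)
x_hat = model.decode(z_a, z_d)
```

Only 𝐳𝐝 would be passed on to the downstream prediction model; 𝐳𝐚 is used during autoencoder training and reconstruction.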
To disentangle the latent bottleneck representation, we propose a latent loss based on the following assumption: for any pair of data points originating from the same environment 𝒂, the corresponding latent variables 𝐳𝐚 should be similar, and they should differ if the pair belongs to different environments.
Similarly, the latent variables 𝐳𝐝 should be similar if the pair shares the same disease label 𝒚 and different otherwise.
Overall, the objective function for training the disentangled autoencoder takes the following form:
$$\mathcal{L}_{\text{Disentgl-AE}} = \mathcal{L}_{\text{Recon}} + \alpha_d \cdot \mathcal{L}_{z_d}^{SC} + \alpha_a \cdot \mathcal{L}_{z_a}^{SC}$$
where each 𝛼∗ (𝛼𝑑 and 𝛼𝑎) is the hyperparameter weighting the corresponding latent loss.
For each latent variable, the contrastive loss forces the encoder to give closely aligned representations to all entries with the same label in a given batch, encouraging disentanglement of disease and environment features into two separate latent variables.
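The latent losses can be illustrated with a supervised contrastive loss computed over a batch. The sketch below assumes a SupCon-style formulation with cosine similarity and a temperature parameter; this section does not give the exact form of the contrastive terms, so the formulation, the temperature value, and all function names are assumptions rather than the authors' definition.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supervised_contrastive_loss(latents, labels, temperature=0.5):
    """Pull latents with the same label together and push the others apart.

    For each anchor, the loss is the average negative log-probability of its
    same-label batch members under a softmax over all other batch members.
    """
    z = [_normalize(v) for v in latents]
    n = len(z)
    total, n_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner contributes nothing
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n) if j != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[j] - log_denominator for j in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


def disentangling_objective(recon_loss, z_d, y, z_a, a, alpha_d=1.0, alpha_a=1.0):
    """Reconstruction loss plus the two weighted latent contrastive terms."""
    return (recon_loss
            + alpha_d * supervised_contrastive_loss(z_d, y)
            + alpha_a * supervised_contrastive_loss(z_a, a))
```

With this formulation, a batch in which same-label latents already point in similar directions yields a lower loss than the same latents paired with shuffled labels, which is the behavior the two assumptions above ask for.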
In the second stage, we use the trained autoencoder to extract the phenotype-specific representation 𝐳𝐝 and train a prediction model for the given phenotype.
Our prediction models are trained with linear regression, regressing 𝐳𝐝 to the corresponding disease label 𝒚.
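The second stage can be sketched as an ordinary linear regression from 𝐳𝐝 to the disease label. The example below fits the weights by plain gradient descent on the mean squared error; the optimizer, learning rate, number of steps, and function names are assumptions chosen to keep the sketch dependency-free, and a closed-form or library least-squares solver would serve equally well.

```python
def fit_linear_regression(features, targets, lr=0.05, n_steps=5000):
    """Fit y ~ w . z_d + b by gradient descent on the mean squared error."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for k in range(d):
                grad_w[k] += 2.0 * err * x[k] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_linear(features, w, b):
    """Continuous risk scores from the fitted linear model."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in features]


# Toy usage: two-dimensional z_d in which only the first coordinate carries signal.
z_d = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [0.0, 1.0, 0.0, 1.0]
weights, intercept = fit_linear_regression(z_d, labels)
scores = predict_linear(z_d, weights, intercept)
```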
Finally, in the third stage, we create ensemble models by combining 𝑝𝑧, the predictions from the linear model using the phenotype-specific representation 𝐳𝐝, with 𝑝𝑥, the predictions from the linear model using the original data 𝐱:
$$p_e = \alpha \cdot p_z + \beta \cdot p_x$$
where 𝛼 and 𝛽 are the weighting parameters determined by gradient-based search on the validation set.
Here 𝑝𝑒 represents the prediction from the additive ensemble model, which combines the predictions resulting from the linear and nonlinear relationships between the original data and the phenotype target.
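A minimal sketch of the gradient-based search for the ensemble weights follows. It assumes the weights are chosen to minimize the mean squared error between the ensemble prediction and the validation labels; the loss, learning rate, starting values, and function names are illustrative assumptions, since this section does not specify them.

```python
def fit_ensemble_weights(p_z, p_x, y, lr=0.1, n_steps=2000):
    """Gradient descent on the validation MSE of alpha * p_z + beta * p_x."""
    alpha, beta = 0.5, 0.5  # hypothetical starting point
    n = len(y)
    for _ in range(n_steps):
        grad_alpha = grad_beta = 0.0
        for pz, px, target in zip(p_z, p_x, y):
            err = alpha * pz + beta * px - target
            grad_alpha += 2.0 * err * pz / n
            grad_beta += 2.0 * err * px / n
        alpha -= lr * grad_alpha
        beta -= lr * grad_beta
    return alpha, beta


def ensemble_predict(p_z, p_x, alpha, beta):
    """Additive ensemble of the two sets of predictions."""
    return [alpha * pz + beta * px for pz, px in zip(p_z, p_x)]


# Usage: p_z from the model on the latent representation, p_x from the model on raw data.
val_p_z = [0.1, 0.9, 0.4, 0.7]
val_p_x = [0.8, 0.2, 0.5, 0.3]
val_y = [0.3 * a + 0.7 * b for a, b in zip(val_p_z, val_p_x)]
alpha, beta = fit_ensemble_weights(val_p_z, val_p_x, val_y)
```

A grid search over candidate (𝛼, 𝛽) pairs scored on the validation set would be a drop-in alternative when a non-differentiable criterion such as AUC is preferred.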
The aim is to evaluate the prediction accuracy (via Area under the curve: AUC) of the proposed DisPred and Disentangling Autoencoder (Disentgl-AE) in comparison to other conventional methods in genomics trained on participants from European ancestry, including Polygenic Risk Score (PRS), linear models (Lasso), and the supervised Neural Network (NN).
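For reference, AUC can be computed directly from its definition as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The sketch below uses this pairwise definition for clarity; it is quadratic in the number of participants, so a rank-based implementation would be preferable for a cohort the size of the UKB. The function name is illustrative.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    cases = [s for s, label in zip(scores, labels) if label == 1]
    controls = [s for s, label in zip(scores, labels) if label == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))
```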
Although unconventional, we also consider models trained on non-European ancestries, such as the African American population (AFR), to understand the effect of training on participants other than those of European ancestry.
We considered genetic variants identified by existing GWAS as features, including 5,014 variants associated with AD from Jansen et al., 201923 (variants with p < 1e-5) and 78 variants from Andrews et al., 202024.
The outcome or phenotype label for the Alzheimer's Disease Sequencing Project (ADSP) is the clinical diagnosis of the presence or absence of AD, and for the UK Biobank (UKB) it is a dichotomized version of a continuous proxy phenotype derived from family history (first-degree relative with reported AD or dementia).
After filtering for participants and variants, the final dataset includes 11,640 participants with 3,892 variants for the ADSP cohort and 461,579 participants with 4,967 variants for the UKB.
We split the data into training and test sets and further divided the training set into training and validation sets, with stratification based on the phenotype labels.
We use self-reported ancestry labels in the training dataset to train the disentangling autoencoder and other ancestry-specific linear and non-linear models.
The independent test data consists of 2,101 participants for the ADSP and 138,474 participants for the UKB.
To evaluate the performance of the proposed method to predict AD status in minority populations or admixed individuals, we estimated ancestry percentages from genome-wide data using SNPWeight v2.125.
This is to ensure that the partitions accurately reflect the actual genetic-ancestry background and enable a more rigorous evaluation of the methods.
Details for dataset preparation are explained in the Methods section.
Representation learned via disentangling autoencoder
Figure 2 UMAP plots of ancestry-specific representation (left column) and phenotype-specific representation (right column) learned from Disentgl-AE for the ADSP (top row) and the UKB (bottom row).
We colored the points with self-reported ancestry labels (EUR: European, MIX: admixed, ASN: Asian, and AFR: African), demonstrating that phenotype-specific representation is invariant to the ancestry background.
We analyzed the representations learned from the proposed DisPred framework.
In Figure 2, we present the Uniform Manifold Approximation and Projection (UMAP)26 plots for the ancestry-specific representation 𝐳𝐚 and the phenotype-specific representation 𝐳𝐝 to assess whether the proposed method can separate the ancestry effect from the GWAS variants.
First, we note that the representation captured the ancestry-related information on the left plots.
The three training ancestry groups for the ADSP cohort were clearly separated and three out of four training ancestry groups for the UKB cohort were also separated.
The remaining UKB group, MIX, was not cleanly separated; this is a limitation of the proposed regularization, or the proposed architecture, which may not be able to completely separate the MIX group from other ancestries.
On the right side of both figures, when ancestry labels are applied to the phenotype-specific representation 𝐳𝐝, we note that the ancestry labels are scattered without forming clear clusters.
This indicates that the proposed method identified a latent representation 𝐳𝐝 that is invariant to ancestry background, which we later use to build the prediction models.
DisPred improved AD risk prediction for the minority populations and for admixed individuals
Figure 3 Test data distribution for the ADSP and the UKB cohorts (A).
In the pie chart, we present the proportion of admixed participants compared to the European participants and other dominant non-European participants, i.e., African American (AFR) for the ADSP and South Asian (SAS) for the UKB.
Prediction accuracy (AUC) for different models (PRS trained on European (PRS-EUR), Lasso trained on African American (Lasso-AFR), Lasso trained on European (Lasso-EUR), Neural Network trained on African American (NN-AFR), Neural Network trained on European (NN-EUR), Adversarial Neural Network trained on all data (Adv), Disentangling Autoencoder (Disentgl-AE), and DisPred trained on all data) on different subsets of the test dataset: all participants (B), EUR participants (C), admixed participants (D), and participants from non-European ancestries (E) in the test dataset for the ADSP and the UKB cohorts.
We report the main results in Figure 3.
First, test data distribution for different ancestries for the two cohorts is presented in panel A. We considered 90% and 65% as estimated ancestry cut-offs, respectively, for the ADSP and the UKB to stratify test participants into five super populations: South Asians (SAS), East Asians (EAS), Americans (AMR), Africans (AFR), and Europeans (EUR), and an admixed group composed of individuals not passing cut-off in any ancestry.
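The cut-off rule described above can be sketched as follows: a participant is assigned to the super population with the largest estimated ancestry fraction if that fraction reaches the cohort's cut-off, and to the admixed group otherwise. The function name, the dictionary input format, and the use of "greater than or equal to" at the boundary are assumptions for illustration.

```python
ADSP_CUTOFF = 0.90  # cut-off stated for the ADSP
UKB_CUTOFF = 0.65   # cut-off stated for the UKB


def assign_ancestry_group(ancestry_fractions, cutoff):
    """Return the dominant super population, or 'admixed' if none passes the cut-off.

    ancestry_fractions maps a super-population code (e.g. 'EUR', 'AFR') to its
    estimated fraction for one participant.
    """
    dominant = max(ancestry_fractions, key=ancestry_fractions.get)
    if ancestry_fractions[dominant] >= cutoff:
        return dominant
    return "admixed"


# Usage with hypothetical ancestry estimates for two participants.
group_1 = assign_ancestry_group({"EUR": 0.95, "AFR": 0.05}, ADSP_CUTOFF)
group_2 = assign_ancestry_group({"EUR": 0.55, "AFR": 0.45}, ADSP_CUTOFF)
```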
We evaluated the models on four subsets of the test data: i) all participants, ii) European participants, iii) admixed participants, and iv) non-European participants, primarily AFR for the ADSP and primarily SAS for the UKB.
Panel B presents the result for all the test participants, and the proposed DisPred leads to the best result for ADSP and the joint best result for UKB (Lasso-EUR being marginally better than DisPred).
The analysis restricted to European participants is shown in panel C. In this case, Lasso-EUR achieves the best result, and the proposed method produces the second-best result for both datasets.
In UKB, the performance on admixed individuals (best AUC: 0.6692 by DisPred) is equivalent to the one obtained in the EUR individuals (best AUC: 0.6695 by Lasso-EUR), but for the ADSP, the predictive ability is better in EUR (best AUC: 0.6532 by Lasso-EUR) compared to admixed individuals (best AUC: 0.6157 by DisPred).
Increased heterogeneity in the composition of admixed individuals in the ADSP cohort may explain this difference.
The top two ancestral components of the ADSP’s admixed group are 45.31% EUR and 40.72% AFR, compared to 50.51% EUR and 28.50% AFR for the UKB.
However, for the SAS in UKB, the adversarially trained NN (Adv) leads to the best AUC.
Except for this group, in general, Adv demonstrates poor generalizability.
Similarly, the predictive ability of PRS is substantially lower in non-European participants.
On the other hand, Lasso-based models, especially those trained on EUR data, performed competitively, particularly in the groups dominated by European ancestry.
Compared to these linear and non-linear models, the proposed Disentgl-AE utilizes all the available training participants to separate the effect of ancestry from the phenotype representation.
As such, the obtained phenotype representation demonstrated the best predictive abilities.
DisPred performs better in the presence of ancestral mismatch and dataset shift
Figure 4 Prediction accuracy (AUC) for different models (PRS trained on European (PRS-EUR), Lasso trained on European (Lasso-EUR), Lasso trained on African American (Lasso-AFR), Neural Network trained on European (NN-EUR), Neural Network trained on African American (NN-AFR), and DisPred) in the practical setting where there is alignment or mismatch between the ancestry (left side of dotted lines) or dataset shift (right side of dotted lines) of training and test participants.
All prediction models are trained on ADSP (EUR participants on the left and AFR participants on the right).
In a practical setting, since existing methods are often ancestry specific, ancestry information is required for predicting future patients or individuals.
However, human genetics literature has often questioned the definition and the use of an individual’s ancestry27,28, and at the minimum, the ancestry information may not be available during the test time.
Without accurate ancestry information, the participants could either align or mismatch with the corresponding ancestry of training populations.
Similarly, dataset shift (or distribution shift) might exist between training and test participants, i.e., when models trained on one cohort are applied to participants of a different cohort.
For such conditions, we present the results in Figure 4.
We consider four different test scenarios: i) ADSP EUR (n=1,014), ii) ADSP AFR (n=161), iii) UKB EUR (n=126,958), and iv) UKB SAS (n=2,443), and compare the performance of DisPred against models trained with ADSP EUR participants (left panel) and models trained with ADSP AFR participants (right panel).
We use the same ancestry percentage cut-off, as used in the previous section, to stratify the test participants into EUR and AFR for ADSP and EUR and SAS for UKB.
We note that, among all the analyzed cases, DisPred obtains lower predictive accuracy than Lasso only when the EUR test participants align with the EUR training population.
In all other cases, DisPred achieved better results: when the test participants did not align with the training population, in both the AFR and EUR analyses, and, for AFR, even when they did align.
Furthermore, in the case of dataset shift, DisPred performs better in all the cases, including when models trained on EUR participants (ADSP) are applied to the EUR participants (UKB).
The background is color-coded with the proportion of self-reported ancestry (EUR: European, AFR: African American, AMR: American, EAS: East Asian, and SAS: South Asian) for each data subset.
DisPred, compared to existing methods, improved predictability for the minority population, particularly for admixed individuals.
Since an admixed individual’s genome is a mixture of genomes from more than one ancestral population, we found that different models struggle as individual heterogeneity increases.
This section presents an in-depth evaluation of this issue for the ADSP cohort.
Estimated ancestral percentages were used to calculate the heterogeneity.
For each sample in the test set, the variance of ancestral proportion was computed, and we sorted these in decreasing order, obtaining a sequence from homogeneity to heterogeneity.
For this ordering, Figure 5 shows the results, where we created numerous data subsets using a sliding window with a window size of 750 participants and a stride of 50 participants.
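The ordering and windowing procedure can be sketched as below. A participant whose ancestry is concentrated in one population has a high variance across ancestry proportions, while an evenly admixed participant has a low variance, so sorting by decreasing variance runs from homogeneous to heterogeneous. The use of the population variance (rather than the sample variance) and the function names are assumptions.

```python
from statistics import pvariance


def order_by_heterogeneity(ancestry_proportions):
    """Indices sorted by decreasing variance of ancestry proportions.

    The first indices are the most homogeneous participants, the last the most admixed.
    """
    variances = [pvariance(proportions) for proportions in ancestry_proportions]
    return sorted(range(len(variances)), key=lambda i: variances[i], reverse=True)


def sliding_windows(ordered_indices, window_size=750, stride=50):
    """Overlapping subsets of the ordered participants."""
    last_start = len(ordered_indices) - window_size
    return [ordered_indices[start:start + window_size]
            for start in range(0, last_start + 1, stride)]


# Usage with three hypothetical participants and three ancestry components each.
proportions = [[1.0, 0.0, 0.0], [0.4, 0.3, 0.3], [0.8, 0.1, 0.1]]
order = order_by_heterogeneity(proportions)
subsets = sliding_windows(order, window_size=2, stride=1)
```

Each window would then be scored separately (for example with AUC) to trace performance as heterogeneity increases.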
This way, the data subset heterogeneity increases from left to right.
The background is color-coded with the proportion of self-reported ancestry for each data subset.
All methods considered in this work start to degrade as we move toward heterogeneity or admixed individuals.
The proposed DisPred performs well when there is a sharp decline in the proportion of European ancestry (in the middle region of the graph) and slowly decreases towards the end.
DISCUSSION
As the population becomes increasingly mixed over time, understanding how to analyze and interpret admixed genomes will be critical for enabling transethnic and multiethnic medical genetic studies and ensuring that genetics research findings are broadly applicable.
We thus urge a joint research effort to confront the existing approach for ancestry-specific AI frameworks and focus on building unbiased alternatives.
This study introduced DisPred, a deep learning-based framework for improving AD risk prediction in minority populations, particularly for admixed individuals.
First, we developed a disentangling autoencoder to disentangle genotype into ancestry-specific and phenotype-specific representations.
We then built predictive models using only the phenotype-specific representation.
Finally, we used ensemble modeling to combine the prediction model built using the disentangled latent representation with the model built using the original data.
In this study, we also demonstrated that, unlike existing practices where models built for particular ancestry are applied to the individuals of the same ancestry requiring ancestry information at the test time, DisPred predicts the risk of test individuals without any assumption on their ancestry information.
We believe this is an appealing feature of DisPred as there has been a long-standing debate on the use of self-reported race/ethnicity/genetic ancestry in biomedical research.
The scientific, social, and cultural considerations make it challenging to provide an optimal label for race or ethnicity, which could result in ambiguity, contributing to misdiagnosis.
Similarly, dataset shift, i.e., different training and test data distributions (or cohorts), although a common practical scenario, makes generalization challenging for machine learning models.
We showed that compared to existing methods, DisPred demonstrated better predictive abilities in the presence of dataset shift.
Overall, existing methods typically improve predictions by leveraging data from the target population and, in turn, make the model more sensitive to the ancestry background and thus require relevant ancestry information at prediction stages.
Unlike them, DisPred is novel in that it learns an invariant representation robust to ancestry background and performs well in diverse test environments.
However, certain limitations are not well addressed in this study.
First, we did not perform any feature importance analysis to understand whether specific variants were more pronounced for admixed groups than for the homogeneous European ancestry group.
Due to the multistage framework involving different training and optimization processes, unlike other AI approaches, it is not straightforward to relate the predicted phenotype to the original data level.
A detailed study is required to trace the flow of information from the genetic variants to the disentangled representations and then to the predicted phenotype.
The second limitation concerns engineering effort.
The deep learning architecture proposed in this work, constructed using a multi-layer perceptron with ReLU activation, potentially has room for improvement.
It is worth mentioning that our proposed framework is not confined to AD risk prediction and can be extended to other phenotypes.
MATERIALS AND METHODS
In this section, we first describe how we constructed the dataset and then provide details of our proposed method and implementations for collaboratively training the AI model.
Dataset preparation
In this study, we consider two datasets: the ADSP and the UKB.
We use the p < 1e-5 threshold to obtain the candidate regions of 5,014 GWAS variants, obtained from Jansen et al., 201923 and Andrews et al., 202024.
We remove SNPs with more than a 10% missing rate to ensure marker quality.
We then remove participants with absent AD phenotype or ancestral information.
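As an illustration, the two filters described above (SNP missing rate and participant completeness) could be sketched as follows. This is a minimal sketch, not the pipeline used in the study: the function name `qc_filter`, the array layout, and the use of `np.nan` / `None` to mark missing entries are all assumptions made for the example.

```python
import numpy as np

def qc_filter(genotypes, phenotype, ancestry, max_missing=0.10):
    """Drop SNPs whose missing rate exceeds max_missing, then drop
    participants lacking a phenotype or an ancestry label.

    genotypes: (n_participants, n_snps) float array, np.nan = missing call
    phenotype: (n_participants,) float array, np.nan = missing
    ancestry:  (n_participants,) object array, None = missing
    (This data layout is an assumption made for the sketch.)
    """
    # Per-SNP missing rate across participants.
    snp_missing = np.isnan(genotypes).mean(axis=0)
    keep_snps = snp_missing <= max_missing

    # A participant is kept only if both labels are present.
    has_ancestry = np.array([a is not None for a in ancestry])
    keep_rows = ~np.isnan(phenotype) & has_ancestry

    return (genotypes[keep_rows][:, keep_snps],
            phenotype[keep_rows],
            ancestry[keep_rows],
            keep_snps)

# Toy data: SNP 1 is missing in 2 of 4 participants; participant 2 has no
# phenotype and participant 3 has no ancestry label.
genotypes = np.array([[0, np.nan, 2],
                      [1, 1, 0],
                      [2, np.nan, 1],
                      [0, 0, 1]], dtype=float)
phenotype = np.array([1.0, 0.0, np.nan, 1.0])
ancestry = np.array(["EUR", "AFR", "EUR", None], dtype=object)

geno_qc, pheno_qc, anc_qc, kept_snps = qc_filter(genotypes, phenotype, ancestry)
```

In this toy example the middle SNP is dropped and only the first two participants remain.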
For the ADSP, the AD phenotype is a dichotomous case/control label.
For the UKB, we use the AD-proxy score defined in Jansen et al., 2019 (23), which combines the self-reported parental AD status and the individual AD status.
Since AD is an age-related disease, we removed control participants below age 65 (age-at-last-visit).
For the ADSP, since cases and controls are well defined, we perform this step on the whole dataset; for the UKB, we first split the data into training and test sets and then remove participants below age 65 with an AD-proxy score less than 2.0.
Based on self-reported ancestry labels, the training data consists of 8,403 participants for the ADSP (2,061 AFR, 4,780 EUR, and 1,562 HIS) and 323,105 participants for the UKB (4,493 AFR, 306,169 EUR, 5,406 HIS, and 7,037 ASN).
These self-reported ancestry labels are incorporated into the proposed method to learn the latent representation.
Further, we held out 1,000 participants for the ADSP and 10,000 participants for the UKB from the training data as the validation data.
The division of the data into training, test, and validation sets is stratified by phenotype label, i.e., each set contains approximately the same percentage of each phenotype label as the complete set.
For the UKB, we dichotomized the AD-proxy score with a threshold of 2.0.
Overall, the numbers of training, test, and validation participants are 7,403, 2,101, and 1,000 for the ADSP and 313,105, 138,474, and 10,000 for the UKB, respectively.
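The dichotomization and the stratified hold-out described above can be illustrated with a short NumPy sketch. The function names are hypothetical, and whether a score exactly equal to 2.0 counts as a case is an assumption (the sketch uses `>=`); the actual splitting code of the study is not given in the text.

```python
import numpy as np

def dichotomize(ad_proxy, threshold=2.0):
    """Binary label from the AD-proxy score.
    Assumption: a score equal to the threshold is treated as a case."""
    return (np.asarray(ad_proxy) >= threshold).astype(int)

def stratified_holdout(labels, n_holdout, seed=0):
    """Return (train_idx, holdout_idx) so that class proportions in the
    hold-out set approximately match those of the full set."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    parts = []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Share of the hold-out budget proportional to class frequency;
        # rounding can shift the total by a participant or two.
        n_cls = int(round(n_holdout * idx.size / labels.size))
        parts.append(idx[:n_cls])
    holdout_idx = np.sort(np.concatenate(parts))
    train_idx = np.setdiff1d(np.arange(labels.size), holdout_idx)
    return train_idx, holdout_idx

# Toy example: 1,000 participants with 20% cases, 100 held out.
toy_labels = np.array([1] * 200 + [0] * 800)
train_idx, val_idx = stratified_holdout(toy_labels, n_holdout=100)
toy_binary = dichotomize([1.9, 2.0, 3.1])
```

Here the hold-out set keeps the 20% case fraction of the full toy set.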
Ancestry determination
For each cohort included in our analysis, we first determined the ancestry of each individual with SNPWeight v2.1 (25) using reference populations from the 1000 Genomes Consortium (31).
Prior to ancestry determination, variants were filtered based on genotyping rate (< 95%), minor allele frequency (MAF < 1%), and Hardy-Weinberg equilibrium (HWE) in controls (p < 1e-5).
Using an ancestry percentage cut-off of > 90% (for ADSP) and > 65% (for UKB), the participants were stratified into five super populations (South Asians, East Asians, Americans, Africans, and Europeans) and an admixed group composed of individuals not passing the cut-off in any ancestry.
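The cut-off rule above can be made concrete with a small sketch. It assumes that the ancestry estimates are available as a matrix of per-individual fractions; the function name, the column order, and the short codes (borrowed from the 1000 Genomes super-population abbreviations) are illustrative assumptions.

```python
import numpy as np

# Assumed column order of the ancestry-fraction matrix.
SUPER_POPS = ["SAS", "EAS", "AMR", "AFR", "EUR"]

def assign_ancestry(proportions, cutoff):
    """proportions: (n, 5) array of estimated ancestry fractions.
    An individual is assigned to a super population only if its largest
    fraction exceeds the cut-off; otherwise it is labeled 'ADMIXED'."""
    proportions = np.asarray(proportions, dtype=float)
    top = proportions.argmax(axis=1)
    top_frac = proportions.max(axis=1)
    return [SUPER_POPS[k] if f > cutoff else "ADMIXED"
            for k, f in zip(top, top_frac)]

toy_props = [[0.02, 0.01, 0.02, 0.03, 0.92],
             [0.00, 0.00, 0.05, 0.95, 0.00],
             [0.00, 0.00, 0.10, 0.20, 0.70]]

labels_90 = assign_ancestry(toy_props, cutoff=0.90)  # ADSP-style cut-off
labels_65 = assign_ancestry(toy_props, cutoff=0.65)  # UKB-style cut-off
```

The third toy individual is admixed under the 90% cut-off but assigned to the European group under the 65% cut-off, which shows how the choice of cut-off changes the size of the admixed group.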
The encoder decomposes the original data $\mathbf{x}$ into an ancestry-specific representation $\mathbf{z}_a$ and a phenotype-specific representation $\mathbf{z}_d$, and the decoder reconstructs the original data as $\hat{\mathbf{x}}$ from the concatenation of $\mathbf{z}_a$ and $\mathbf{z}_d$.
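A minimal PyTorch sketch of such an encoder-decoder is given below for orientation. It is not the authors' implementation: the class name, the single hidden layer of width 256, the choice to split one encoder output into the two latent parts, and the mean-squared-error reconstruction loss are all assumptions. The input dimension 3892 is the ADSP input dimension reported in the architecture figure caption.

```python
import torch
import torch.nn as nn

class DisentangledAE(nn.Module):
    """Autoencoder whose latent code is split into z_a (ancestry) and
    z_d (phenotype). Layer sizes are illustrative assumptions."""

    def __init__(self, input_dim, hidden_dim=256, za_dim=40, zd_dim=40):
        super().__init__()
        self.za_dim = za_dim
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, za_dim + zd_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(za_dim + zd_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        # First za_dim units form z_a, the remaining units form z_d.
        z_a, z_d = z[:, :self.za_dim], z[:, self.za_dim:]
        # The decoder sees the concatenation of both latent parts.
        x_hat = self.decoder(torch.cat([z_a, z_d], dim=1))
        return x_hat, z_a, z_d

model = DisentangledAE(input_dim=3892)
x = torch.randn(8, 3892)                      # toy batch, not real genotypes
x_hat, z_a, z_d = model(x)
recon_loss = nn.functional.mse_loss(x_hat, x)  # assumed reconstruction loss
```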
To achieve disentanglement, we propose a contrastive loss (20–22) to enforce the similarities between the latent representations obtained from the data pair in the latent space.
For a batch with $N$ randomly sampled pairs, $\{\mathbf{x}_k, \mathbf{y}_k\}_{k=1}^{N}$, the contrastive learning algorithm uses two random augmentations (commonly referred to as "views") to create $2N$ pairs $\{\tilde{\mathbf{x}}_l, \tilde{\mathbf{y}}_l\}_{l=1}^{2N}$, such that $\tilde{\mathbf{y}}_{[1:N]} = \tilde{\mathbf{y}}_{[N+1:2N]} = \mathbf{y}_{[1:N]}$.
In our case, we simply replicate the batch to create $2N$ pairs.
Using this multi-viewed batch, with index $i \in I \equiv \{1, \dots, 2N\}$ and its augmented pair $j(i)$, the supervised contrastive (SC) loss (32) takes the following form:
$$\mathcal{L}^{SC} = -\sum_{i \in I} \frac{1}{|S(i)|} \sum_{s \in S(i)} \log \frac{\exp(\mathbf{z}_i \cdot \mathbf{z}_s / \tau)}{\sum_{r \in I \setminus i} \exp(\mathbf{z}_i \cdot \mathbf{z}_r / \tau)}$$
where $S(i) \equiv \{s \in I \setminus i : \tilde{\mathbf{y}}_s = \tilde{\mathbf{y}}_i\}$ is the set of indices of all the positives in the multi-viewed batch distinct from $i$, and $\tau$ is a temperature parameter.
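For readers who prefer code, the SC loss can be sketched in PyTorch as follows. This is a hedged illustration rather than the implementation used in the study: the function name is hypothetical, and the L2 normalization of the latent vectors follows common practice for supervised contrastive learning and is an assumption here.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z, labels, tau=0.05):
    """z: (M, d) latent vectors of the multi-viewed batch (M = 2N).
    labels: (M,) integer labels. Returns the SC loss summed over anchors."""
    z = F.normalize(z, dim=1)               # assumption: unit-norm latents
    sim = z @ z.T / tau                     # z_i . z_r / tau for all pairs
    m = z.shape[0]
    not_self = ~torch.eye(m, dtype=torch.bool, device=z.device)

    # Log of the denominator: sum over r in I \ i (self-similarity masked).
    log_denom = torch.logsumexp(
        sim.masked_fill(~not_self, float("-inf")), dim=1, keepdim=True)
    log_prob = sim - log_denom

    # Positives S(i): same label as anchor i, excluding i itself.
    pos = (labels[:, None] == labels[None, :]) & not_self
    n_pos = pos.sum(dim=1).clamp(min=1)     # guards against an empty S(i)
    per_anchor = -(log_prob * pos).sum(dim=1) / n_pos
    return per_anchor.sum()

# Toy check: latents aligned with the labels give a much lower loss than
# the same latents paired with mismatched labels.
z_toy = torch.tensor([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
loss_aligned = supervised_contrastive_loss(z_toy, torch.tensor([0, 0, 1, 1]))
loss_mismatch = supervised_contrastive_loss(z_toy, torch.tensor([0, 1, 0, 1]))

# Replicated batch, mirroring the "replicate the batch" construction.
z0 = torch.randn(4, 8)
loss_rep = supervised_contrastive_loss(
    torch.cat([z0, z0]), torch.tensor([0, 0, 1, 1]).repeat(2))
```

Using `logsumexp` for the denominator keeps the computation numerically stable for small temperatures such as 0.03 or 0.05.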
Overall, the objective function for training the disentangled autoencoder takes the following form:
$$\mathcal{L}_{Disentgl\text{-}AE} = \mathcal{L}_{Recon} + \alpha_d \cdot \mathcal{L}^{SC}_{z_d} + \alpha_a \cdot \mathcal{L}^{SC}_{z_a}$$
where $\alpha_*$ represents the hyperparameter for the corresponding latent loss.
We set these hyperparameters through grid search using a held-out validation set.
For each latent variable, the contrastive loss forces the encoder to give closely aligned representations to all entries with the same label in the given batch, encouraging disentanglement of disease and environment (ancestry) features into two separate latent variables.
Model architecture and training setting
We used the Python package scikit-learn (33) to implement all the linear models and PyTorch (34) to implement all the non-linear models, including the proposed Disentangling Autoencoder.
The encoder layers are designed to learn low-dimensional features, and the decoder layers to upscale the low-dimensional features into high-dimensional
We tune the following hyperparameters, selecting the model that results in the best validation AUC: the dimension of $\mathbf{z}_d$ (30, 40, 50), the dimension of $\mathbf{z}_a$ (30, 40, 50), $\tau$ (0.03, 0.05), and $\alpha_*$ (0.0001, 0.0003).
For the ADSP, the selected values of $\mathbf{z}_d$, $\mathbf{z}_a$, $\tau$, and $\alpha_*$ are 40, 40, 0.03, and 0.0001, respectively; for the UKB, they are 40, 30, 0.05, and 0.0001.
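The search space above is small enough to enumerate exhaustively. The sketch below shows one way to do so; `select_best` and the `evaluate` callback are hypothetical names, a single shared value for $\alpha_*$ is assumed, and the scoring function in the usage example is a dummy stand-in, not a real validation AUC.

```python
from itertools import product

# Candidate values as listed in the text.
grid = {
    "zd_dim": [30, 40, 50],
    "za_dim": [30, 40, 50],
    "tau": [0.03, 0.05],
    "alpha": [0.0001, 0.0003],   # assumption: shared by both latent losses
}

def select_best(grid, evaluate):
    """Return (best_config, best_score) over the Cartesian product of grid.
    evaluate(config) should train a model and return its validation AUC."""
    keys = list(grid)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

n_configs = len(list(product(*grid.values())))

# Dummy scorer used only to exercise the loop (not a real AUC).
def dummy_score(cfg):
    return 1.0 / (1.0 + abs(cfg["zd_dim"] - 40) + abs(cfg["za_dim"] - 30)
                  + cfg["tau"] + cfg["alpha"])

best_cfg, best_score = select_best(grid, dummy_score)
```

The grid contains 3 x 3 x 2 x 2 = 36 configurations, each of which requires one full training run when a real `evaluate` function is used.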
For the UKB, N = 100, N1 = 10, N2 = 70, and batch size = 256.
We then use ordinary least squares linear regression to minimize the residual sum of squares between $\mathbf{y}$, the phenotype labels, and $\mathbf{p}_z$, the predictions produced from $\mathbf{z}_d$.
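The regression step can be sketched with NumPy's least-squares solver. The helper names and the inclusion of an intercept term are assumptions, and the toy data are synthetic.

```python
import numpy as np

def fit_ols(z_d, y):
    """Ordinary least squares fit of y on z_d with an intercept.
    Returns the coefficient vector with the intercept first."""
    design = np.column_stack([np.ones(len(z_d)), z_d])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, z_d):
    """Predictions p_z for latent features z_d."""
    return coef[0] + np.asarray(z_d) @ coef[1:]

# Synthetic check: a noiseless linear target is recovered exactly.
rng = np.random.default_rng(0)
z_toy = rng.normal(size=(50, 3))
true_coef = np.array([0.5, 1.0, -2.0, 0.5])
y_toy = true_coef[0] + z_toy @ true_coef[1:]

coef = fit_ols(z_toy, y_toy)
p_z_toy = predict_ols(coef, z_toy)
```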
For the ensemble modeling, we combine $\mathbf{p}_z$, the predictions from the learned representations, and $\mathbf{p}_x$, the predictions from the original data.
For the predictions from the original data, we use a Lasso linear model to allow a fair comparison with existing models.
To combine these predictions, we conduct both grid-search and gradient-based search.
For the grid search, we test all values from 0.1 to 1.5 in steps of 0.1 for both $\alpha$ and $\beta$, and select the pair that produces the best AUC on the validation set: $\alpha = 1.4$ and $\beta = 0.4$ for the ADSP, and $\alpha = 1.2$ and $\beta = 0.6$ for the UKB.
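A sketch of the grid search over the two ensemble weights follows. The form of the ensemble function is not spelled out in this section, so the linear combination $\alpha \cdot p_z + \beta \cdot p_x$ used here is an assumption, as are the function names. The AUC is computed with a small pairwise helper to keep the example dependency-free; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
import numpy as np
from itertools import product

def auc(y_true, score):
    """AUC as the fraction of (case, control) pairs ranked correctly,
    counting ties as one half."""
    y_true, score = np.asarray(y_true), np.asarray(score)
    pos, neg = score[y_true == 1], score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def grid_search_weights(p_z, p_x, y_val):
    """Try alpha, beta in {0.1, 0.2, ..., 1.5} and keep the best AUC.
    Assumed ensemble: p_e = alpha * p_z + beta * p_x."""
    weights = np.arange(1, 16) / 10.0
    best = (None, None, -np.inf)
    for a, b in product(weights, weights):
        score = auc(y_val, a * p_z + b * p_x)
        if score > best[2]:
            best = (a, b, score)
    return best

# Synthetic validation data (not from the study).
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=200)
p_z = y_val + rng.normal(0.0, 1.0, size=200)
p_x = y_val + rng.normal(0.0, 1.5, size=200)

best_alpha, best_beta, best_auc = grid_search_weights(p_z, p_x, y_val)
equal_weight_auc = auc(y_val, p_z + p_x)
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because the AUC depends only on the ranking of the scores, it is unchanged when both weights are multiplied by the same positive constant, so under this assumed linear form only the ratio of the two weights affects the result.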
For the gradient-based search, we treat $\alpha$ and $\beta$ as parameters, initialize them to 1.1 and 0.9, and train the ensemble function ($p_e$) with an SGD optimizer for 5000 epochs.
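The gradient-based alternative can be illustrated as below. Several details are assumptions because the text does not specify them: the ensemble is taken to be $\alpha \cdot p_z + \beta \cdot p_x$, the training loss is taken to be binary cross-entropy on a sigmoid of that score, the learning rate is arbitrary, and plain full-batch gradient descent in NumPy stands in for the SGD optimizer. Only the initial values (1.1 and 0.9) and the number of epochs (5000) come from the text.

```python
import numpy as np

def bce(alpha, beta, p_z, p_x, y):
    """Binary cross-entropy of sigmoid(alpha * p_z + beta * p_x)."""
    prob = 1.0 / (1.0 + np.exp(-(alpha * p_z + beta * p_x)))
    return -np.mean(y * np.log(prob) + (1 - y) * np.log(1 - prob))

def fit_ensemble_weights(p_z, p_x, y, lr=0.01, epochs=5000):
    """Gradient descent on (alpha, beta), starting from 1.1 and 0.9."""
    alpha, beta = 1.1, 0.9
    for _ in range(epochs):
        prob = 1.0 / (1.0 + np.exp(-(alpha * p_z + beta * p_x)))
        err = prob - y                    # gradient of BCE w.r.t. the logit
        alpha -= lr * np.mean(err * p_z)
        beta -= lr * np.mean(err * p_x)
    return alpha, beta

# Synthetic data (not from the study).
rng = np.random.default_rng(0)
y_fit = rng.integers(0, 2, size=200).astype(float)
pz_fit = y_fit + rng.normal(0.0, 1.0, size=200)
px_fit = y_fit + rng.normal(0.0, 1.5, size=200)

loss_before = bce(1.1, 0.9, pz_fit, px_fit, y_fit)
alpha_hat, beta_hat = fit_ensemble_weights(pz_fit, px_fit, y_fit)
loss_after = bce(alpha_hat, beta_hat, pz_fit, px_fit, y_fit)
```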
For training the NN, we use the following parameters for the ADSP: number of epochs = 200, learning rate = 5e-3, batch size = 64; and for the UKB: number of epochs = 100, learning rate = 5e-3, batch size = 256.
For Adv, we use the Wasserstein distance to capture a domain-invariant representation and follow the experimental setup of Shen et al., 2017 (36).
The Lasso linear models are fitted iteratively along a regularization path, and the best model is selected by 5-fold cross-validation on the whole training set.
We set the alphas automatically and use $N_{alpha}$ (number of alphas along the regularization path) = 10, maximum number of iterations = 5000, and tolerance for the optimization = 1e-3.
For the polygenic risk score (PRS), we follow the standard approach of computing the sum of risk alleles corresponding to the AD phenotype for each sample, weighted by the effect size estimate from the most powerful GWAS on the phenotype. We obtain the effect sizes from the genetic variants identified by Jansen et al., 2019 (23) and Andrews et al., 2020 (24).
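The weighted sum that defines the PRS is a single matrix-vector product. The sketch below uses made-up dosages and effect sizes; the function name and the encoding of genotypes as risk-allele counts in {0, 1, 2} are assumptions.

```python
import numpy as np

def polygenic_risk_score(dosages, effect_sizes):
    """dosages: (n_samples, n_variants) risk-allele counts (0, 1, or 2).
    effect_sizes: (n_variants,) GWAS effect size estimates.
    Returns one weighted sum per sample."""
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    return dosages @ effect_sizes

# Toy values for illustration only.
toy_dosages = [[0, 1, 2],
               [2, 0, 1]]
toy_effects = [0.1, -0.2, 0.3]
prs = polygenic_risk_score(toy_dosages, toy_effects)
```

For the first toy sample the score is 0 x 0.1 + 1 x (-0.2) + 2 x 0.3 = 0.4, and for the second it is 0.5.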
References and Notes
1. Zhang, Q. et al. Risk prediction of late-onset Alzheimer’s disease implies an oligogenic architecture. Nat. Commun. 11, 1–11 (2020).
2. Escott-Price, V., Shoai, M., Pither, R., Williams, J. & Hardy, J. Polygenic score prediction captures nearly all common genetic risk for Alzheimer’s disease. Neurobiol. Aging 49, 214.e7 (2017).
3. Leonenko, G. et al. Identifying individuals with high risk of Alzheimer’s disease using polygenic risk scores. Nat. Commun. 12, (2021).
4. Squillario, M. et al. A telescope GWAS analysis strategy, based on SNPs-genes-pathways ensamble and on multivariate algorithms, to characterize late onset Alzheimer’s disease. Sci. Rep. 10, 1–12 (2020).
5. Jo, T., Nho, K., Bice, P. & Saykin, A. J. Deep learning-based identification of genetic variants: Application to Alzheimer’s disease classification. medRxiv 2021.07.19.21260789 (2021).
6. Peng, J. et al. A Deep Learning-based Genome-wide Polygenic Risk Score for Common Diseases Identifies Individuals with Risk. medRxiv (2021).
7. Cook, J. P. & Morris, A. P. Multi-ethnic genome-wide association study identifies novel locus for type 2 diabetes susceptibility. Eur. J. Hum. Genet. 24, 1175–1180 (2016).
8. N’songo, A. et al. African American exome sequencing identifies potential risk variants at Alzheimer disease loci. Neurol. Genet. 3, (2017).
9. Atkinson, E. G. et al. Tractor uses local ancestry to enable the inclusion of admixed individuals in GWAS and to boost power. Nat. Genet. 53, 195–204 (2021).
10. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
11. Grinde, K. E. et al. Generalizing polygenic risk scores from Europeans to Hispanics/Latinos. Genet. Epidemiol. 43, 50–62 (2019).
12. Carlson, C. S. et al. Generalization and Dilution of Association Results from European GWAS in Populations of Non-European Ancestry: The PAGE Study. PLoS Biol. 11, (2013).
13. Shen, Z. et al. Towards Out-Of-Distribution Generalization: A Survey. 14, 1–22 (2021).
14. Martin, A. R. et al. Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. Am. J. Hum. Genet. 100, 635–649 (2017).
15. Popejoy, A. & Fullerton, S. Genomics is failing on diversity. Nature 538, 161–164 (2016).
16. Bitarello, B. D. & Mathieson, I. Polygenic scores for height in admixed populations. G3 Genes, Genomes, Genet. 10, 4027–4036 (2020).
17. Marnetto, D. et al. Ancestry deconvolution and partial polygenic score can improve susceptibility predictions in recently admixed individuals. Nat. Commun. 11, 1–9 (2020).
18. Tzeng, E., Hoffman, J., Saenko, K. & Darrell, T. Adversarial discriminative domain adaptation. in Proceedings of the IEEE conference on computer vision and pattern recognition 7167–7176 (2017).
19. Ganin, Y. et al. Domain-adversarial training of neural networks. Adv. Comput. Vis. Pattern Recognit. 17, 189–209 (2017).
20. Chopra, S., Hadsell, R. & LeCun, Y. Learning a similarity metric discriminatively, with application to face verification. Proc. - 2005 IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognition, CVPR 2005 I, 539–546 (2005).
21. Oord, A. van den, Li, Y. & Vinyals, O. Representation Learning with Contrastive Predictive Coding. (2018).
22. Gyawali, P. K., Horacek, B. M., Sapp, J. L. & Wang, L. Sequential Factorized Autoencoder for Localizing the Origin of Ventricular Activation from 12-Lead Electrocardiograms. IEEE Trans. Biomed. Eng. 67, 1505–1516 (2020).
23. Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).
24. Andrews, S. J., Fulton-Howard, B. & Goate, A. Interpretation of risk loci from genome-wide association studies of Alzheimer’s disease. Lancet Neurol. 19, 326–335 (2020).
25. Chen, C. Y. et al. Improved ancestry inference using weights from external reference panels. Bioinformatics 29, 1399–1406 (2013).
26. McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 3, (2018).
27. Fang, H. et al. Harmonizing Genetic Ancestry and Self-identified Race/Ethnicity in Genome-wide Association Studies. Am. J. Hum. Genet. 105, 763–772 (2019).
28. Borrell, L. N. et al. Race and Genetic Ancestry in Medicine - A Time for Reckoning with Racism. Obstet. Gynecol. Surv. 76, 395–397 (2021).
29. Hua, T. et al. On Feature Decorrelation in Self-Supervised Learning. 9598–9608 (2021).
30. Zbontar, J., Jing, L., Misra, I., LeCun, Y. & Deny, S. Barlow Twins: Self-Supervised Learning via Redundancy Reduction. (2021).
31. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
32. Khosla, P. et al. Supervised contrastive learning. Adv. Neural Inf. Process. Syst. 2020-December, 1–23 (2020).
33. Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
34. Paszke, A. et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 32, 8024–8035 (2019).
35. Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. (2014) doi:10.1063/1.4902458.
36. Shen, J., Qu, Y., Zhang, W. & Yu, Y. Wasserstein distance guided representation learning for domain adaptation. 32nd AAAI Conf. Artif. Intell. AAAI 2018 4058–4065 (2018).
Acknowledgements
Funding: This work is supported by NIH/NIA award AG066206 (ZH).
Author Contributions: P.K.G., J.Z., and Z.H. developed the concepts for the manuscript and proposed the method.
P.K.G., Y.L.G., X.L., J.Z., and Z.H. designed the analyses and applications and discussed the results.
Competing interests: The authors declare no competing interests.
Data and materials availability: The datasets used in this paper, i.e., the Alzheimer’s Disease Sequencing Project (ADSP) and the UK Biobank (UKB), are publicly available data cohorts.
The top row represents architecture for Disentangling autoencoder (DisentglAE), the middle row represents architecture for supervised Neural Network (NN), and the bottom row represents Adversarial learning (Adv).
For all the networks except Adv Critic (bottom row, right), the genotype data is presented to the network as the first input data with dimension (input dim) of 3892 for the ADSP and 4967 for the UKB.
For Adv Critic, the input dimension is the output of the FC2 layer of Adv Backbone.
Linear Layer represents multilayer perceptron, and ReLU represents Rectified Linear Unit, a nonlinear activation function.
For DisentglAE, $\mathbf{z}_d$ dim and $\mathbf{z}_a$ dim represent the latent dimensions of the phenotype-specific and ancestry-specific representations, respectively.