TY - JOUR
T1 - An evaluation of the MiDCoP method for imputing allele frequency in genome wide association studies
AU - Gautam, Yadu
AU - Lee, Carl
AU - Cheng, Chin I.
AU - Langefeld, Carl
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - Genome-wide association studies require genotyping the DNA sequences of a large sample of individuals with and without the specific disease of interest. Current genotyping technologies cover only a limited portion of each individual's DNA sequence. As a result, a large fraction of Single Nucleotide Polymorphisms (SNPs) are not genotyped. Existing imputation methods are based on individual-level data, which are often time-consuming and costly. A new method, the Minimum Deviation of Conditional Probability (MiDCoP), was recently developed to impute the allele frequencies of the missing SNPs from the allele frequencies of neighboring SNPs, without using individual-level SNP information. This article evaluates the performance of the MiDCoP approach through association analysis based on the imputed allele frequencies, using the GAIN Schizophrenia data. The results indicate that the choice of reference sets has a strong impact on performance. Imputation accuracy improves when the case and control data sets are each imputed using a separate, better-matched reference set.
AB - Genome-wide association studies require genotyping the DNA sequences of a large sample of individuals with and without the specific disease of interest. Current genotyping technologies cover only a limited portion of each individual's DNA sequence. As a result, a large fraction of Single Nucleotide Polymorphisms (SNPs) are not genotyped. Existing imputation methods are based on individual-level data, which are often time-consuming and costly. A new method, the Minimum Deviation of Conditional Probability (MiDCoP), was recently developed to impute the allele frequencies of the missing SNPs from the allele frequencies of neighboring SNPs, without using individual-level SNP information. This article evaluates the performance of the MiDCoP approach through association analysis based on the imputed allele frequencies, using the GAIN Schizophrenia data. The results indicate that the choice of reference sets has a strong impact on performance. Imputation accuracy improves when the case and control data sets are each imputed using a separate, better-matched reference set.
KW - Association Tests
KW - Conditional Probability
KW - Imputation
KW - Minimum Deviation
KW - Multilocus Information Measure
KW - Single Nucleotide Polymorphisms
UR - http://www.scopus.com/inward/record.url?scp=84921486321&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-10389-1_5
DO - 10.1007/978-3-319-10389-1_5
M3 - Article
AN - SCOPUS:84921486321
VL - 569
SP - 57
EP - 67
JO - Studies in Computational Intelligence
JF - Studies in Computational Intelligence
SN - 1860-949X
ER -