TY - JOUR
T1 - TopQA: a topological representation for single-model protein quality assessment with machine learning
AU - Smith, John
AU - Eickholt, Jesse
N1 - Funding Information:
The authors thank the students and faculty at Pacific Lutheran University for their support, and the anonymous referees for their valuable comments and helpful suggestions. This work is supported by the Natural Sciences Undergraduate Research Program at Pacific Lutheran University, the Karen Hille Phillips Regency Advancement Award to Prof. Cao, and the Division of Natural Science at Pacific Lutheran University.
Publisher Copyright:
© 2020 Inderscience Enterprises Ltd.
PY - 2020/2/7
Y1 - 2020/2/7
N2 - Correctly predicting the complex three-dimensional structure of a protein from its sequence would allow a better understanding of the function of specific proteins, with many applications. We propose a novel method that tackles a crucial step in the protein structure prediction problem: assessing the quality of generated predictions. Unlike traditional methods, our method is, to the best of our knowledge, the first to analyse the topology of the predicted structure. We found that our new representation provided accurate information regarding the location of the protein's backbone. Using this information, we implemented a novel algorithm based on a convolutional neural network (CNN) to predict the GDT_TS score for given protein models. Our method has shown promising results, with an overall correlation of 0.41 on the CASP12 dataset. Future work will aim to implement additional features into our representation. The software is freely available on GitHub: https://github.com/caorenzhi/TopQA.
UR - https://www.inderscienceonline.com/doi/abs/10.1504/IJCBDD.2020.105095
M3 - Article
SN - 1756-0756
VL - 13
SP - 144
EP - 153
JO - International Journal of Computational Biology and Drug Design
JF - International Journal of Computational Biology and Drug Design
IS - 1
ER -